TY - JOUR
T1 - Factors affecting inter-rater agreement in human classification of eye movements
T2 - a comparison of three datasets
AU - Friedman, Lee
AU - Prokopenko, Vladyslav
AU - Djanian, Shagen
AU - Katrychuk, Dmytro
AU - Komogortsev, Oleg V
N1 - © 2022. The Psychonomic Society, Inc.
PY - 2023/1
Y1 - 2023/1
AB - Manual classification of eye-movements is used in research and as a basis for comparison with automatic algorithms in the development phase. However, human classification will not be useful if it is unreliable and unrepeatable. Therefore, it is important to know what factors might influence and enhance the accuracy and reliability of human classification of eye-movements. In this report we compare three datasets of human manual classification, two from earlier datasets and one, our own dataset, which we present here for the first time. For inter-rater reliability, we assess both the event-level F1-score and sample-level Cohen's κ, across groups of raters. The report points to several possible influences on human classification reliability: eye-tracker quality, use of head restraint, characteristics of the recorded subjects, the availability of detailed scoring rules, and the characteristics and training of the raters.
KW - Cohen’s Kappa
KW - Event-level agreement
KW - Eye-movements
KW - F1-score
KW - Manual classification
KW - Sample-level agreement
UR - http://www.scopus.com/inward/record.url?scp=85128064202&partnerID=8YFLogxK
U2 - 10.3758/s13428-021-01782-4
DO - 10.3758/s13428-021-01782-4
M3 - Journal article
C2 - 35411475
SN - 1554-351X
VL - 55
SP - 417
EP - 427
JO - Behavior Research Methods
JF - Behavior Research Methods
IS - 1
ER -