Checkbox grading of large-scale mathematics exams with multiple assessors: Field study on assessors’ inter-rater reliability, time investment and usage experience

Moons, Filip; Vandervieren, Ellen; Colpaert, Jozef

doi:https://doi.org/10.1016/j.stueduc.2024.101443

Files

1-s2.0-S0191491X24001299-main.pdf (4.92 MB)

Publication date

2025-06

Authors

Moons, Filip

Vandervieren, Ellen

Colpaert, Jozef

DOI

https://doi.org/10.1016/j.stueduc.2024.101443

Document Type

Article

Collections

Utrecht University Repository

License

cc_by

Abstract

Assessing exams with multiple assessors is challenging regarding inter-rater reliability and feedback. This paper presents ‘checkbox grading,’ a digital method where exam designers have predefined checkboxes with both feedback and associated partial grades. Assessors then tick the checkboxes relevant to a student solution. Dependencies between checkboxes ensure consistency among assessors in following the grading scheme. Moreover, the approach supports ‘blind grading’ by hiding the grades associated with the checkboxes, thus focusing assessors on the criteria rather than the scores. The approach was studied during a large-scale mathematics state exam. Results show that assessors perceived checkbox grading as very useful. However, compared to traditional grading—where assessors follow a correction scheme and communicate the resulting grade—more time is spent on checkbox grading, while both approaches are equally reliable. Blind grading improved inter-rater reliability for some tasks. Overall, checkbox grading might lead to a smoother process where feedback, not solely grades, is communicated to students.
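The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Checkbox` fields, the `requires`/`excludes` dependency types, the clamping of the total, and the example task are all assumptions made for illustration. Only the general idea comes from the abstract: checkboxes predefined with feedback and partial grades, dependencies between them, and a blind mode that hides the grades.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkbox:
    """One predefined grading criterion (hypothetical structure)."""
    key: str
    feedback: str                    # text communicated to the student
    points: float                    # partial grade, may be negative
    requires: tuple[str, ...] = ()   # boxes that must also be ticked
    excludes: tuple[str, ...] = ()   # boxes that may not be ticked together


def validate(scheme: dict[str, Checkbox], ticked: set[str]) -> list[str]:
    """Return the dependency violations for an assessor's selection."""
    problems = [f"unknown checkbox {k}" for k in sorted(ticked - scheme.keys())]
    for key, box in scheme.items():  # scheme order keeps output deterministic
        if key not in ticked:
            continue
        problems += [f"{key} requires {r}" for r in box.requires if r not in ticked]
        problems += [f"{key} excludes {e}" for e in box.excludes if e in ticked]
    return problems


def grade(scheme: dict[str, Checkbox], ticked: set[str], max_points: float):
    """Turn a valid selection into (partial grade, feedback lines)."""
    problems = validate(scheme, ticked)
    if problems:
        raise ValueError("; ".join(problems))
    total = sum(scheme[k].points for k in ticked)
    total = min(max(total, 0.0), max_points)  # assumption: clamp to [0, max]
    feedback = [box.feedback for key, box in scheme.items() if key in ticked]
    return total, feedback


def assessor_view(scheme: dict[str, Checkbox], blind: bool = True) -> list[str]:
    """Labels shown to the assessor; blind mode hides the points."""
    return [
        box.feedback if blind else f"{box.feedback} ({box.points:+g})"
        for box in scheme.values()
    ]


# Hypothetical scheme for a 2-point task: "Solve 2x + 3 = 7".
SCHEME = {
    box.key: box
    for box in [
        Checkbox("setup", "Isolates the term in x correctly", 1.0),
        Checkbox("solution", "States x = 2", 1.0, requires=("setup",)),
        Checkbox("slip", "Arithmetic slip in the final step", -0.5,
                 requires=("setup",), excludes=("solution",)),
    ]
}
```

In this sketch, two assessors who tick the same boxes necessarily arrive at the same grade and feedback, and a selection that contradicts the scheme (for example ticking `solution` without `setup`) is rejected before any grade is computed.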

Keywords

Assessment, Computer-assisted assessment, Feedback, Inter-rater reliability, State examinations, Education

Citation

Moons, F, Vandervieren, E & Colpaert, J 2025, 'Checkbox grading of large-scale mathematics exams with multiple assessors: Field study on assessors’ inter-rater reliability, time investment and usage experience', Studies in Educational Evaluation, vol. 85, 101443. https://doi.org/10.1016/j.stueduc.2024.101443

URI

https://dspace.library.uu.nl/handle/1874/475045
