Checkbox grading of large-scale mathematics exams with multiple assessors: Field study on assessors’ inter-rater reliability, time investment and usage experience

Publication date

2025-06

Authors

Moons, Filip (ORCID 0000-0002-5368-3429; ISNI 0000000512553144)
Vandervieren, Ellen
Colpaert, Jozef

Document Type

Article
Open Access

License

CC BY

Abstract

Assessing exams with multiple assessors is challenging with regard to inter-rater reliability and feedback. This paper presents ‘checkbox grading,’ a digital method in which exam designers predefine checkboxes, each carrying feedback and an associated partial grade. Assessors then tick the checkboxes relevant to a student's solution. Dependencies between checkboxes ensure that assessors apply the grading scheme consistently. Moreover, the approach supports ‘blind grading’ by hiding the grades associated with the checkboxes, thus focusing assessors on the criteria rather than the scores. The approach was studied during a large-scale mathematics state exam. Results show that assessors perceived checkbox grading as very useful. However, compared to traditional grading (where assessors follow a correction scheme and communicate the resulting grade), checkbox grading takes more time, while both approaches are equally reliable. Blind grading improved inter-rater reliability for some tasks. Overall, checkbox grading might lead to a smoother process in which feedback, not solely grades, is communicated to students.
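The abstract describes checkbox grading only at a high level. The Python sketch below illustrates one possible reading of the mechanism: each checkbox pairs feedback with a partial grade, a selection is rejected if it violates a dependency, the grade is the sum of the ticked boxes' points, and a "blind" view shows assessors the criteria without the points. The names `Checkbox`, `GradingScheme`, the `requires` dependency rule (a box may be ticked only if its prerequisites are ticked), and the example criteria are all hypothetical assumptions, not the tool or scheme used in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkbox:
    """One predefined criterion: feedback text plus an associated partial grade."""
    key: str
    feedback: str
    points: float
    # Hypothetical dependency rule (assumption): this box may only be
    # ticked if every box listed in `requires` is ticked as well.
    requires: frozenset[str] = frozenset()


class GradingScheme:
    """Hypothetical container for the checkboxes of one exam task."""

    def __init__(self, boxes: list[Checkbox]) -> None:
        # Insertion order is kept, so feedback is returned in scheme order.
        self.boxes = {b.key: b for b in boxes}

    def blind_view(self) -> list[tuple[str, str]]:
        """What an assessor sees under blind grading: criteria only, no points."""
        return [(b.key, b.feedback) for b in self.boxes.values()]

    def validate(self, ticked: set[str]) -> None:
        """Raise ValueError if the selection breaks the scheme's dependencies."""
        for key in ticked:
            if key not in self.boxes:
                raise ValueError(f"unknown checkbox: {key}")
            missing = self.boxes[key].requires - ticked
            if missing:
                raise ValueError(f"{key} requires {sorted(missing)}")

    def grade(self, ticked: set[str]) -> tuple[float, list[str]]:
        """Return the partial grade and the feedback to communicate to the student."""
        self.validate(ticked)
        chosen = [b for k, b in self.boxes.items() if k in ticked]
        return sum(b.points for b in chosen), [b.feedback for b in chosen]


if __name__ == "__main__":
    scheme = GradingScheme([
        Checkbox("setup", "Equation set up correctly", 1.0),
        Checkbox("solve", "Equation solved correctly", 1.0, frozenset({"setup"})),
        Checkbox("interpret", "Answer interpreted in context", 0.5, frozenset({"solve"})),
    ])
    print(scheme.blind_view())            # criteria without points
    print(scheme.grade({"setup", "solve"}))
```

Because the grade is derived from the ticked criteria rather than entered directly, every score comes with its feedback, and the dependency check blocks selections that would skip a step of the scheme. How the study's tool actually encodes dependencies is not stated in the abstract.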

Keywords

Assessment, Computer-assisted assessment, Feedback, Inter-rater reliability, State examinations, Education

Citation

Moons, F, Vandervieren, E & Colpaert, J 2025, 'Checkbox grading of large-scale mathematics exams with multiple assessors: Field study on assessors’ inter-rater reliability, time investment and usage experience', Studies in Educational Evaluation, vol. 85, 101443. https://doi.org/10.1016/j.stueduc.2024.101443