How a simple increase in the number of items can enhance the reliability of linguistic judgments: The case of island experiments

Publication date

2025-10-31

Authors

Schoenmakers, Gert-Jan (ORCID 0000-0002-0666-6001; ISNI 0000000506843479)

Document Type

Article
Open Access

License

CC BY

Abstract

Replication is an important aspect of experimental research, and it is therefore crucial that participant-level measures (e.g., judgment scores) are reliable. Reliability refers to the precision of measurement and thus informs the replicability of experiments: more precise measurements are more dependable for future reference. Formally defined as the ratio of true score variance to total variance, reliability can be improved by fine-tuning the measurement instrument or by collecting a sufficiently large number of observations per participant, as averaging over more items reduces the influence of random item-specific noise and yields a more precise estimate of participants’ true scores. The present paper uses Generalizability Theory to estimate the reliability of participant scores in 52 distinct datasets from studies that used comparable experimental designs to investigate different types of island effects in different languages. Effect sizes (DD-scores) are commonly reported and used for comparative purposes in discussions of island effects. The present paper argues that caution is warranted when island effect sizes are compared: the analyses reveal that participant-level reliability in island experiments is moderate, but that increasing the number of items to six per condition enhances measurement precision.
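
The relationship between item count and reliability described in the abstract can be illustrated with a minimal sketch. It assumes a simplified persons-by-items design in which reliability is the person (true score) variance divided by that variance plus the residual variance averaged over the number of items. The function name and the variance values are hypothetical and are not taken from the paper, whose actual analyses may use a more elaborate model.

```python
def reliability(var_person: float, var_residual: float, n_items: int) -> float:
    """Ratio of true-score variance to the total variance of a mean over n_items.

    Assumes a simplified persons-by-items design: averaging over n_items
    divides the residual (item-specific noise) variance by n_items.
    """
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return var_person / (var_person + var_residual / n_items)


# Hypothetical variance components, chosen only for illustration.
VAR_PERSON = 0.25
VAR_RESIDUAL = 1.0

if __name__ == "__main__":
    for n in (2, 4, 6):
        print(n, round(reliability(VAR_PERSON, VAR_RESIDUAL, n), 3))
```

Under these assumed values, the coefficient rises as items are added, because the noise term shrinks while the person variance stays fixed. This is the mechanism the abstract describes, not a reproduction of its results.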

Keywords

experimental syntax, island effects, judgment data, reliability, replication

Citation

Schoenmakers, G-J 2025, 'How a simple increase in the number of items can enhance the reliability of linguistic judgments: The case of island experiments', Languages, vol. 10, no. 11, 277. https://doi.org/10.3390/languages10110277