Challenges in Reproducing Human Evaluation Results for Role-Oriented Dialogue Summarization

Ito, Takumi; Fang, Qixiang; Mosteiro Romero, Pablo; Gatt, Albert; van Deemter, Kees

Files

2023.humeval-1.9.pdf (301.86 KB)

Publication date

2023-08-15

Authors

Ito, Takumi

Fang, Qixiang

Mosteiro Romero, Pablo

Gatt, Albert

van Deemter, Kees

Document Type

Part of book

Collections

Utrecht University Repository

License

CC BY

Abstract

There is a growing concern regarding the reproducibility of human evaluation studies in NLP. As part of the ReproHum campaign, we conducted a study to assess the reproducibility of a recent human evaluation study in NLP. Specifically, we attempted to reproduce a human evaluation of a novel approach to enhance Role-Oriented Dialogue Summarization by considering the influence of role interactions. Despite our best efforts to adhere to the reported setup, we were unable to reproduce the statistical results as presented in the original paper. While no contradictory evidence was found, our study raises questions about the validity of the reported statistical significance results, and/or the comprehensiveness with which the original study was reported. In this paper, we provide a comprehensive account of our reproduction study, detailing the methodologies employed, data collection, and analysis procedures. We discuss the implications of our findings for the broader issue of reproducibility in NLP research. Our findings serve as a cautionary reminder of the challenges in conducting reproducible human evaluations and prompt further discussions within the NLP community.

Citation

Ito, T, Fang, Q, Mosteiro Romero, P, Gatt, A & van Deemter, K 2023, Challenges in Reproducing Human Evaluation Results for Role-Oriented Dialogue Summarization. in The 3rd Workshop on Human Evaluation of NLP Systems (HumEval’23). Association for Computational Linguistics. <https://aclanthology.org/2023.humeval-1.9>

URI

http://hdl.handle.net/1874/436320
