The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation

Publication date

2017-12

Authors

Tjong Kim Sang, Erik
Bollmann, Marcel
Boschker, Remko
Casacuberta, Francisco
Dietz, Feike (ISNI 0000000398613253)
Dipper, Stefanie
Domingo, Miguel
van der Goot, Rob
Van Koppen, Marjo (ISNI 000000011038355X)
Ljubešić, Nikola

Document Type

Article
Open Access

Abstract

The CLIN27 shared task evaluates the effect of translating historical text to modern text with the goal of improving the quality of the output of contemporary natural language processing tools applied to the text. We focus on improving part-of-speech tagging analysis of seventeenth-century Dutch. Eight teams took part in the shared task. The best results were obtained by teams employing character-based machine translation. The best system obtained an error reduction of 51% in comparison with the baseline of tagging unmodified text. This is close to the error reduction obtained by human translation (57%).

Keywords

historical text, text normalization, neural networks, machine translation, Dutch language

Citation

Tjong Kim Sang, E, Bollmann, M, Boschker, R, Casacuberta, F, Dietz, F M, Dipper, S, Domingo, M, van der Goot, R, van Koppen, J M, Ljubešić, N, Östling, R, Petran, F, Pettersson, E, Scherrer, Y, Schraagen, M P, Sevens, L, Tiedemann, J, Vanallemeersch, T & Zervanou, K 2017, 'The CLIN27 Shared Task: Translating Historical Text to Contemporary Language for Improving Automatic Linguistic Annotation', Computational Linguistics in the Netherlands Journal, vol. 7, pp. 53-64. <https://clinjournal.org/clinj/article/view/68>