Parallel Corpus Research and Target Language Representativeness: The Contrastive, Typological, and Translation Mining Traditions

Le Bruyn, Bert; Fuchs, Martin; van der Klis, Martijn; Liu, Jianan; Mo, Chou; Tellings, Jos; de Swart, Henriette

doi:https://doi.org/10.3390/languages7030176

Files

languages_07_00176_v3.pdf (1.37 MB)

Publication date

2022-09

Authors

Le Bruyn, B.S.W.

Fuchs, Martin

van der Klis, Martijn

Liu, Jianan

Mo, Chou

Tellings, Jos

de Swart, Henriette

DOI

https://doi.org/10.3390/languages7030176

Document Type

Article

Collections

Utrecht University Repository

License

CC BY

Abstract

This paper surveys the strategies that the Contrastive, Typological, and Translation Mining parallel corpus traditions rely on to deal with the issue of target language representativeness of translations. On the basis of a comparison of the corpus architectures and research designs of the three traditions, we argue that they have each developed their own representativeness strategies: (i) monolingual control corpora (Contrastive tradition), (ii) limits on the scope of research questions (Typological tradition), and (iii) parallel control corpora (Translation Mining tradition). We introduce normalized pointwise mutual information (NPMI) as a bi-directional measure of cross-linguistic association, allowing for an easy comparison of the outcomes of different traditions and the impact of the monolingual and parallel control corpus representativeness strategies. We further argue that corpus size has a major impact on the reliability of the monolingual control corpus strategy and that a sequential parallel control corpus strategy is preferable for smaller corpora.
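As an illustration of the NPMI measure mentioned in the abstract, the following is a minimal sketch based on the standard definition of normalized pointwise mutual information, NPMI(x, y) = PMI(x, y) / -log p(x, y), with PMI(x, y) = log(p(x, y) / (p(x) p(y))). The function name, the count-based interface, and the example counts are assumptions made for illustration; they are not taken from the paper, whose exact formulation may differ.

```python
import math


def npmi(joint_count, source_count, target_count, total):
    """Normalized pointwise mutual information between a source-language
    form and a target-language form, estimated from aligned-pair counts.

    Hypothetical interface (not from the paper):
      joint_count  -- aligned pairs containing both forms
      source_count -- aligned pairs containing the source form
      target_count -- aligned pairs containing the target form
      total        -- all aligned pairs in the corpus

    Returns a value in [-1, 1]: 1 for forms that always co-occur,
    0 for independent forms, -1 for forms that never co-occur.
    """
    if joint_count == 0:
        return -1.0  # limiting value when the forms never co-occur
    p_xy = joint_count / total
    if p_xy == 1.0:
        return 1.0  # both forms occur in every pair; avoids division by zero
    p_x = source_count / total
    p_y = target_count / total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


# Made-up counts for illustration only.
score_forward = npmi(40, 50, 60, 200)
score_backward = npmi(40, 60, 50, 200)
print(score_forward == score_backward)
```

Swapping the source and target counts leaves the score unchanged, which is the sense in which the measure is bi-directional: it does not privilege either translation direction.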

Keywords

cross-linguistic variation, parallel corpora, translation, Language and Linguistics, Linguistics and Language

Citation

Le Bruyn, B, Fuchs, M, van der Klis, M, Liu, J, Mo, C, Tellings, J & de Swart, H 2022, 'Parallel Corpus Research and Target Language Representativeness: The Contrastive, Typological, and Translation Mining Traditions', Languages, vol. 7, no. 3, 176. https://doi.org/10.3390/languages7030176

URI

https://dspace.library.uu.nl/handle/1874/426568
