DiscoNaija: a discourse-annotated parallel Nigerian Pidgin-English corpus

Publication date

2025-12

Authors

Scholman, Merel C. J. (ORCID 0000-0002-0223-8464; ISNI 0000000526456599)
Marchal, Marian (ISNI 0000000512552512)
Brown, AriaRay
Demberg, Vera

Document Type

Article
Open Access

License

CC BY

Abstract

This article presents DiscoNaija, a parallel English-Nigerian Pidgin corpus of PDTB 3.0-style discourse relation annotations. We explain the corpus design criteria and report inter-annotator agreement as well as alignment and projection evaluations. We also present NaijaLex 2.0, an update to a Nigerian Pidgin connective lexicon. An exploratory corpus analysis compares the distributions found in DiscoNaija to those found in PDTB 3.0 and in DiscoSPICE, a comparable corpus of English. We identify three features of Nigerian Pidgin discourse coherence: (i) relations tend to be expressed implicitly more often in Nigerian Pidgin in general; (ii) anti-chronological temporal relations tend to occur less often and are more likely to be expressed explicitly in Nigerian Pidgin; and (iii) coordinating conjunctions occur less frequently in Nigerian Pidgin than in English. The DiscoNaija corpus can support a range of applications and research purposes: for example, it can serve as training data to improve the performance of discourse relation parsers for Nigerian Pidgin, and it can facilitate research into discourse features of creole languages.

Keywords

Cross-linguistic comparison, Discourse relations, Nigerian Pidgin, Parallel corpus, Language and Linguistics, Education, Linguistics and Language, Library and Information Sciences

Citation

Scholman, M C J, Marchal, M, Brown, A & Demberg, V 2025, 'DiscoNaija: a discourse-annotated parallel Nigerian Pidgin-English corpus', Language Resources and Evaluation, vol. 59, no. 4, pp. 3597-3633. https://doi.org/10.1007/s10579-025-09850-3