DiscoNaija: a discourse-annotated parallel Nigerian Pidgin-English corpus
Publication date
2025-12
Document Type
Article
License
CC BY
Abstract
This article presents DiscoNaija, a parallel English-Nigerian Pidgin corpus annotated with PDTB 3.0-style discourse relations. We explain the corpus design criteria and report inter-annotator agreement as well as evaluations of alignment and projection. We also present NaijaLex 2.0, an update to a Nigerian Pidgin connective lexicon. An exploratory corpus analysis compares the distributions found in DiscoNaija to those found in PDTB 3.0 and in DiscoSPICE, a comparable corpus of English. We identify three features of Nigerian Pidgin discourse coherence: (i) relations tend to be expressed implicitly more often in Nigerian Pidgin in general; (ii) anti-chronological temporal relations tend to be expressed less often, and are more likely to be expressed explicitly, in Nigerian Pidgin; and (iii) coordinating conjunctions occur less frequently in Nigerian Pidgin than in English. The DiscoNaija corpus can support a range of applications and research purposes, for example serving as training data to improve discourse relation parsers for Nigerian Pidgin and facilitating research into discourse features of creole languages.
Keywords
Cross-linguistic comparison, Discourse relations, Nigerian Pidgin, Parallel corpus, Language and Linguistics, Education, Linguistics and Language, Library and Information Sciences
Citation
Scholman, M C J, Marchal, M, Brown, A & Demberg, V 2025, 'DiscoNaija: a discourse-annotated parallel Nigerian Pidgin-English corpus', Language Resources and Evaluation, vol. 59, no. 4, pp. 3597-3633. https://doi.org/10.1007/s10579-025-09850-3