Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin

Lin, Pin-Jie; Saeed, Muhammed; Chang, Ernie; Scholman, Merel

doi:https://doi.org/10.21437/Interspeech.2023-466

Files

Wed-P6.12.pdf (402.74 KB)

Publication date

2023-09

Document Type

Part of book

Collections

Utrecht University Repository

License

taverne

Abstract

Developing effective spoken language processing systems for low-resource languages poses several challenges due to the lack of parallel data and limited resources for fine-tuning models. In this work, we aim to improve both text classification and translation of Nigerian Pidgin (Naija) by collecting a large-scale parallel English-Pidgin corpus, and we further propose a framework of cross-lingual adaptive training that includes both continual and task adaptive training so as to adapt a base pre-trained model to low-resource languages. Our studies show that English pre-trained language models serve as a stronger prior than multilingual language models on English-Pidgin tasks, with improvements of up to 2.38 BLEU, and demonstrate that augmenting orthographic data and using task adaptive training with back-translation can have a significant impact on model performance.
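
The abstract mentions back-translation as a source of extra training data. As a rough illustration of the general idea only, and not the authors' pipeline, the sketch below turns Pidgin-only sentences into synthetic (English, Pidgin) pairs and merges them with real parallel data. The names back_translate, augment, and toy_reverse_model are hypothetical, and the tiny lexicon is a toy stand-in for a trained Pidgin-to-English model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (english_source, pidgin_target)


def back_translate(
    monolingual_pidgin: List[str],
    pidgin_to_english: Callable[[str], str],
) -> List[Pair]:
    """Build synthetic (English, Pidgin) pairs from Pidgin-only text.

    The English side is machine-generated; the Pidgin side is the
    original, human-written sentence.
    """
    pairs: List[Pair] = []
    for sentence in monolingual_pidgin:
        synthetic_english = pidgin_to_english(sentence)
        if synthetic_english.strip():  # skip empty model outputs
            pairs.append((synthetic_english, sentence))
    return pairs


def augment(real_pairs: List[Pair], synthetic_pairs: List[Pair]) -> List[Pair]:
    """Concatenate real and synthetic pairs, dropping exact duplicates
    while preserving order."""
    seen = set()
    merged: List[Pair] = []
    for pair in real_pairs + synthetic_pairs:
        if pair not in seen:
            seen.add(pair)
            merged.append(pair)
    return merged


def toy_reverse_model(sentence: str) -> str:
    """Hypothetical word-by-word lookup standing in for a real
    Pidgin-to-English translation model (assumption, illustration only)."""
    lexicon = {"wetin": "what", "dey": "is", "happen": "happening"}
    return " ".join(lexicon.get(word, word) for word in sentence.lower().split())


if __name__ == "__main__":
    real = [("how are you", "how you dey")]
    synthetic = back_translate(["Wetin dey happen"], toy_reverse_model)
    for english, pidgin in augment(real, synthetic):
        print(f"{english}\t{pidgin}")
```

In practice the reverse model would be a trained translation system, and the merged pairs would feed a further round of fine-tuning; how the synthetic and real data are weighted is a design choice this sketch leaves open.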

Keywords

low-resource language, low-resource machine translation, spoken language understanding, Taverne, Software, Signal Processing, Language and Linguistics, Human-Computer Interaction, Modelling and Simulation

Citation

Lin, P-J, Saeed, M, Chang, E & Scholman, M 2023, Low-Resource Cross-Lingual Adaptive Training for Nigerian Pidgin. in Proceedings of the 24th INTERSPEECH conference. vol. 2023-August, Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, pp. 3954-3958. https://doi.org/10.21437/Interspeech.2023-466

URI

http://hdl.handle.net/1874/436353
