Finding Dutch multiword expressions

Publication date

2023-10-16

Authors

Odijk, Jan (ISNI 0000000397024625)
Kroon, Martin (ISNI 0000000502677127)
Baarda, Tijmen C. (ORCID 0000-0002-2577-4948, ISNI 0000000493371926)
Bonfil, Ben
Spoel, Sheean

Editors

Lindén, Krister
Niemi, Jyrki
Kontino, Thalassia

Document Type

Part of book
Open Access

License

Taverne

Abstract

We present MWE-Finder, which enables a user to search for occurrences of multiword expressions (MWEs) in large Dutch text corpora. The components of many Dutch MWEs can occur in multiple forms, need not be adjacent, and can occur in multiple orders; such MWEs are called flexible. Searching for occurrences of flexible MWEs is difficult and cannot be done reliably with most search applications. What is needed is a search engine that takes the grammatical configuration of the MWE into account. MWE-Finder is therefore embedded in GrETEL, a treebank search application for Dutch. A user can enter an example of an MWE in a specific canonical form, after which the system searches for sentences in which the MWE occurs, using queries generated automatically from the canonical form. The MWE can also be selected from a list of more than 11,000 canonical forms for Dutch MWEs that MWE-Finder offers. We show that MWE-Finder also offers facilities for finding examples with unexpected modifiers or determiners on components of the MWE.

Keywords

Multiword Expressions, GrETEL, linguistic research infrastructure, Dutch, treebanks, Language and Linguistics, Artificial Intelligence

Citation

Odijk, J, Kroon, M, Baarda, T C, Bonfil, B & Spoel, S 2023, Finding Dutch multiword expressions. in K Lindén, J Niemi & T Kontino (eds), CLARIN Annual Conference Proceedings 2023. CLARIN Annual Conference Proceedings, vol. 2023, CLARIN ERIC, Utrecht, pp. 85-89.