Improving the lexical coverage of English compound adjectives in syntactic parsing

Publication date

2008-11

Authors

Oostdijk, Nelleke

Document Type

Part of book or chapter of book

Abstract

This paper addresses the question of how the coverage of words in previously unseen text may be improved in syntactic parsing, taking English adjectives as a case study. Working on the assumption that most new words entering the language are constructed from existing words through word-formation processes, we investigate the role that different word-formation processes play in the formation of English adjectives. An analysis of adjectives in the British National Corpus (BNC) shows that compounding is the most productive word-formation process for adjectives. Moreover, compound adjectives are not formed by combining bases at will; rather, a limited set of fairly simple rules restricts the co-occurrence of bases. This makes it feasible to develop a rather effective approach for handling compound adjectives: in a first implementation, 88.68% of a set of 30,561 compound adjectives derived from the BNC were correctly identified as such. Incorporating the rules into the grammar underlying the Pelican parser accounts for a 7.65% increase in the parser’s coverage of a subset of 10,123 sentences taken from the Leipzig corpus.
