Improving the lexical coverage of English compound adjectives in syntactic parsing
Publication date
2008-11
Authors
Oostdijk, Nelleke
Document Type
Part of book or chapter of book
Abstract
The present paper addresses the question of how the coverage of words in previously unseen
text may be improved in syntactic parsing. English adjectives are presented here as
a case study. Working on the assumption that most new words that are introduced into the
language are constructed on the basis of already existing words through the application of
word-formation processes, we investigate the role that different word-formation processes
play, specifically in the formation of adjectives in English. An analysis of adjectives
in the British National Corpus (BNC) shows that compounding is the most productive
word-formation process for adjectives. Moreover, compound adjectives are not formed by
combining bases at will; rather, a limited set of fairly simple rules applies that restricts
the co-occurrence of bases. This makes it feasible to develop a rather effective approach
for handling compound adjectives, as is evident from the results of a first implementation:
of a set of 30,561 compound adjectives derived from the BNC, 88.68% were correctly
identified as such. Incorporating the rules into the grammar underlying the Pelican parser
accounts for a 7.65% increase in the parser’s coverage of a subset of 10,123 sentences
taken from the Leipzig corpus.