Similarity Rules! Exploring Methods for Ad-Hoc Rule Detection
Publication date
2008-11
Authors
Dickinson, Markus
Foster, Jennifer
Document Type
Part of book or chapter of book
Abstract
"One problem facing the extraction of treebank grammars is that of ad hoc rules,
rules used for constructions specific to one data set and unlikely to be used on new
data (Dickinson, 2008). These rules can be erroneous, cover ungrammatical text, or
reveal issues with the treebank’s annotation scheme. These are significant problems
since training on erroneous data can be detrimental to parsing performance (e.g.,
Dickinson and Meurers, 2005; Hogan, 2007), and the use of precision grammars in
grammar checking and generation requires distinctions between grammatical and
ungrammatical sentences (e.g., Bender et al., 2004). Ad hoc rules are especially
problematic when they point to inconsistent aspects of the annotation scheme, as
the scheme forms the basis of any analysis using it."