Comparison of applying Pair HMMs and DBN models in Transliteration Identification
Publication date
2010-11
Authors
Nabende, Peter
Document Type
Part of book or chapter of book
Abstract
Transliteration is used to handle unknown words in Cross Language Information Retrieval
(CLIR) and Machine Translation (MT). Most transliteration tasks depend on a
similarity estimation stage, in which a model is used to identify a transliteration
match for a given source word. In this paper, we evaluate the application of two
related frameworks to transliteration identification. Both frameworks model string similarity
as the cost incurred through a series of edit operations. One framework implements Pair
Hidden Markov Models (Pair HMMs) (Mackay and Kondrak 2005) while the other implements
classes of Dynamic Bayesian Network (DBN) models (Filali and Bilmes 2005). For
each Pair HMM, we adapt different algorithms for computing transliteration similarity estimates.
For the DBN framework, we modify the DBN classes of Filali and Bilmes (2005)
and specify models from the classes to represent factorizations that we hypothesize could
affect the value of a transliteration similarity estimate. Separate tests applying models from
the two frameworks result in high transliteration identification accuracy on an experimental
setup of Russian-English transliteration. An inspection of the output of models associated
with the two frameworks suggests that transliteration identification accuracy
could be improved by combining models.
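
The idea of scoring string similarity as the cost of a series of edit operations, and then picking the best-scoring candidate as the transliteration match, can be illustrated with a minimal sketch. This is not the Pair HMM or DBN model evaluated in the paper: it is a plain dynamic-programming edit distance with fixed, hand-chosen costs, whereas the paper's models learn probabilistic parameters for the edit operations. The function names `edit_cost` and `identify_transliteration`, the cost values, and the assumption that the source word has already been romanized are all illustrative choices, not taken from the paper.

```python
def edit_cost(source, target, sub_cost=1.0, indel_cost=1.0):
    """Minimum total cost of edit operations that turn source into target.

    Costs are fixed constants here (an assumption for illustration);
    a Pair HMM or DBN would instead use learned probabilities.
    """
    m, n = len(source), len(target)
    # dp[i][j] holds the cheapest cost of aligning source[:i] with target[:j].
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * indel_cost  # delete every character of source[:i]
    for j in range(1, n + 1):
        dp[0][j] = j * indel_cost  # insert every character of target[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = source[i - 1] == target[j - 1]
            dp[i][j] = min(
                dp[i - 1][j - 1] + (0.0 if same else sub_cost),  # match or substitute
                dp[i - 1][j] + indel_cost,                       # delete from source
                dp[i][j - 1] + indel_cost,                       # insert into source
            )
    return dp[m][n]


def identify_transliteration(source, candidates):
    """Return the candidate with the lowest edit cost to the source word."""
    return min(candidates, key=lambda cand: edit_cost(source, cand))


if __name__ == "__main__":
    # Hypothetical romanized source word and candidate list.
    print(identify_transliteration("gorbachyov", ["khrushchev", "gorbachev", "brezhnev"]))
```

In this sketch a lower cost means higher similarity. A probabilistic model would typically rank candidates by a likelihood-based score instead, but the identification step (score every candidate, keep the best) has the same shape.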