Last update: 9 February 2010
Article details
Hybrid models for sense guessing of Chinese unknown words
Xiaofei Lu, The Pennsylvania State University
This paper addresses the problem of classifying Chinese unknown words into fine-grained semantic categories defined in a Chinese thesaurus, Cilin (Mei et al. 1984). We present three novel knowledge-based models that capture the relationship between the semantic categories of an unknown word and those of its component characters in three different ways, and combine two of them with a corpus-based model that uses contextual information to classify unknown words. Experiments show that the combined knowledge-based model outperforms previous methods on the same task, but the use of contextual information does not further improve performance.
Keywords: Chinese unknown words, corpus annotation, corpus-based models, knowledge-based models, lexical acquisition, sense tagging
In: International Journal of Corpus Linguistics 13:1. 2008. 144 pp. (pp. 99–128)