Last update:
9 February 2010

© John Benjamins
Article details

Hybrid models for sense guessing of Chinese unknown words

Xiaofei Lu, The Pennsylvania State University

This paper addresses the problem of classifying Chinese unknown words into fine-grained semantic categories defined in a Chinese thesaurus, Cilin (Mei et al. 1984). We present three novel knowledge-based models that capture the relationship between the semantic categories of an unknown word and those of its component characters in three different ways, and combine two of them with a corpus-based model that uses contextual information to classify unknown words. Experiments show that the combined knowledge-based model outperforms previous methods on the same task, but the use of contextual information does not further improve performance.
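
As a rough illustration of the general idea of guessing an unknown word's category from its component characters, the sketch below assigns the category that occurs most often among the characters' known categories. The toy lexicon, the category labels, and the `guess_category` function are all assumptions made for this example. They are not taken from the paper or from Cilin, and this simple vote should not be read as any of the paper's three models.

```python
from collections import Counter

# Hypothetical toy lexicon: character -> semantic category labels.
# These labels are invented for illustration and are NOT real Cilin categories.
CHAR_CATEGORIES = {
    "花": ["plant"],
    "草": ["plant", "material"],
    "刀": ["tool"],
}


def guess_category(word, char_categories):
    """Return the category most frequent among the word's characters.

    Returns None when no character of the word is in the lexicon.
    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter()
    for ch in word:
        for cat in char_categories.get(ch, []):
            counts[cat] += 1
    if not counts:
        return None
    # Highest count first; alphabetical order among equal counts.
    return min(counts, key=lambda c: (-counts[c], c))


if __name__ == "__main__":
    print(guess_category("花草", CHAR_CATEGORIES))
```

In this toy setting, both characters of 花草 vote for the invented label "plant", so that label wins. A word whose characters are all missing from the lexicon yields `None`, which is the case where contextual (corpus-based) evidence would be the only remaining signal.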

Keywords: Chinese unknown words, corpus annotation, corpus-based models, knowledge-based models, lexical acquisition, sense tagging

In: International Journal of Corpus Linguistics 13:1. 2008. 144 pp. (pp. 99–128)