Analysing linguistic information about word combinations for a Spanish-Basque rule-based machine translation system
This paper presents an in-depth analysis of noun + verb combinations in Spanish-Basque translations. First, we
examined noun + verb constructions in the dictionary and confirmed that this kind of multiword unit (MWU) varies
considerably from language to language, which justifies treating MWUs specifically in machine translation (MT)
systems. We then searched for those combinations in a parallel corpus and selected the most frequent ones for further
analysis, classifying them according to their level of syntactic fixedness and semantic compositionality. We tested
whether adding linguistic data relevant to MWUs improved the detection of Spanish combinations and found that the
number of MWUs identified increased by 30.30%, with a precision of 97.61%. Finally, we evaluated how a rule-based
machine translation (RBMT) system translated the MWUs we analysed and concluded that at least 44.44% of them needed
to be corrected or improved.
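For readers unfamiliar with the two detection figures reported above, the following is a minimal sketch of how a relative increase in detected MWUs and a detection precision are commonly computed. The function names and all counts are hypothetical and chosen only for illustration; they are not the counts behind the figures in this paper, and the paper's exact evaluation procedure may differ.

```python
def relative_increase(baseline: int, improved: int) -> float:
    """Percentage growth in the number of detected MWUs over a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be a positive count")
    return 100.0 * (improved - baseline) / baseline


def precision(true_positives: int, false_positives: int) -> float:
    """Percentage of detected MWUs that are genuine MWUs."""
    detected = true_positives + false_positives
    if detected == 0:
        raise ValueError("no detections to evaluate")
    return 100.0 * true_positives / detected


# Hypothetical counts (assumptions for illustration, not data from the paper).
baseline_detected = 200   # MWUs found without the added linguistic data
improved_detected = 250   # MWUs found with the added linguistic data
correct_detections = 240  # detections judged correct by a human evaluator
wrong_detections = improved_detected - correct_detections

print(f"increase:  {relative_increase(baseline_detected, improved_detected):.2f}%")
print(f"precision: {precision(correct_detections, wrong_detections):.2f}%")
```

Note that precision alone says nothing about the MWUs that were missed; measuring those would require a recall figure, which the abstract does not report.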
Article outline
- 1. Introduction
- 2. Definitions, challenges and treatment of MWUs in MT
- 3. Linguistic analysis of Basque and Spanish noun + verb combinations
- 3.1 Noun + verb combinations in bilingual dictionaries
- 3.1.1 Basque and Spanish noun + verb combinations in the dictionary
- 3.1.2 Translations of noun + verb combinations in the dictionary
- 3.1.3 Equivalences of noun + verb constructions in translations
- 3.2 Contrasting information with parallel corpora
- 3.3 Classification of the Spanish MWUs
- 3.3.1 Syntactic flexibility
- 3.3.2 Semantic compositionality
- 4. Evaluation of MWU detection and translation adequacy
- 4.1 Evaluation of MWU detection
- 4.2 Evaluation of MWU translation quality in an RBMT system
- 5. Conclusions and future work
- Acknowledgements
- Notes
- Bibliography