Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 81–100
We present several methods for integrating Verb + Noun collocations, using linguistic information, with the aim of improving the results of a French-Romanian factored statistical machine translation (FSMT) system. The system uses lemmatised, tagged and sentence-aligned legal parallel corpora. Verb + Noun collocations are frequent word associations, sometimes discontinuous, whose members are related by syntactic links and whose sense is non-compositional (Gledhill, 2007). The first method extracts collocations from monolingual corpora using a hybrid approach that combines morphosyntactic properties with frequency criteria. The second method identifies collocations using a bilingual collocation dictionary. Both methods transform collocations into single tokens before alignment. The third method applies a dedicated alignment algorithm for collocations. We evaluate the influence of these collocation integration methods on the results of the lexical alignment and of the FSMT system.
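As an illustration of the step in which collocations are transformed into single tokens before alignment, the following minimal Python sketch fuses known Verb + Noun pairs in a lemmatised, tagged sentence. It is not the chapter's implementation: the function name `merge_collocations`, the tag labels `VERB` and `NOUN`, the underscore-joined token, the `max_gap` window for discontinuous pairs, and the choice to leave intervening words in place are all assumptions made for the example.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (lemma, part-of-speech tag); tag names are assumed


def merge_collocations(
    sentence: List[Token],
    collocations: Set[Tuple[str, str]],
    max_gap: int = 3,
) -> List[Token]:
    """Fuse each known (verb lemma, noun lemma) pair into one token.

    The noun may follow the verb with at most `max_gap` intervening
    tokens, which stay in the sentence after the fused token.
    """
    merged: List[Token] = []
    consumed: Set[int] = set()  # indices of nouns already fused into a verb
    for i, (lemma, tag) in enumerate(sentence):
        if i in consumed:
            continue
        if tag == "VERB":
            # Search a bounded window to the right for a matching noun.
            window_end = min(i + 2 + max_gap, len(sentence))
            for j in range(i + 1, window_end):
                noun_lemma, noun_tag = sentence[j]
                if (
                    j not in consumed
                    and noun_tag == "NOUN"
                    and (lemma, noun_lemma) in collocations
                ):
                    merged.append((f"{lemma}_{noun_lemma}", "VERB"))
                    consumed.add(j)
                    break
            else:
                # No collocate found: keep the verb unchanged.
                merged.append((lemma, tag))
        else:
            merged.append((lemma, tag))
    return merged


# Hypothetical dictionary entry and lemmatised French sentence.
collocation_dict = {("prendre", "décision")}
example = [
    ("le", "DET"),
    ("commission", "NOUN"),
    ("prendre", "VERB"),
    ("un", "DET"),
    ("décision", "NOUN"),
]
result = merge_collocations(example, collocation_dict)
```

In this sketch the fused token `prendre_décision` replaces the verb and the noun is removed, so a word aligner would see the collocation as a single unit. The set of pairs could come either from a monolingual extraction step or from a bilingual dictionary; the sketch is agnostic about that source.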