Multiword units in machine translation and translation technology

Monti, Johanna; Seretan, Violeta; Corpas Pastor, Gloria; Mitkov, Ruslan

doi:10.1075/cilt.341.01mon

Part of

Multiword Units in Machine Translation and Translation Technology
Edited by Ruslan Mitkov, Johanna Monti, Gloria Corpas Pastor and Violeta Seretan
[Current Issues in Linguistic Theory 341] 2018
pp. 1–38

The correct interpretation of Multiword Units (MWUs) is crucial to many applications in Natural Language Processing (NLP), but it is a challenging and complex task. In recent years, the computational treatment of MWUs has received considerable attention, but we believe that much more remains to be done before we can claim that NLP and Machine Translation (MT) systems process MWUs successfully. In this chapter, we present a survey of the field with particular reference to Machine Translation and Translation Technology.

Keywords: multiword units, multiword expressions, natural language processing, machine translation, translation technology

Article outline

1. Introduction
2. Multiword units in natural language processing
- 2.1 Historical notes
- 2.2 POS tagging and parsing
- 2.3 Word sense disambiguation
- 2.4 Information extraction and information retrieval
- 2.5 Other applications
3. Multiword unit processing in machine translation
- 3.1 Historical notes
- 3.2 Multiword unit processing in RBMT
- 3.3 Multiword unit processing in EBMT
- 3.4 Multiword unit processing in SMT
4. Multiword units in translation technology
Acknowledgements
References
Notes

Published online: 20 July 2018

https://doi.org/10.1075/cilt.341.01mon

References (192)

References

Abeillé, A., Clément, L., & Toussenel, F. (2003). Building a treebank for French. In Abeillé(Ed.) Treebanks (pp.165–187). Dordrecht: Kluwer.

Acosta, O., Villavicencio, A., & Moreira, V. (2011). Identification and treatment of multiword expressions applied to Information Retrieval. In Proceedings of the workshop on multiword expressions: From parsing and generation to the real world (pp.101–109). Portland, Oregon, USA.

Alegría, I., Ansa, O., Artola, X., Ezeiza, N., Gojenola, K., & Urizar, R. (2004). Representation and treatment of multiword expressions in Basque. Second ACL workshop on multiword expressions: Integrating processing (pp.48–55). Barcelona, Spain.

Anastasiou, D. (2009). Idiom treatment experiments in machine translation (Unpublished doctoral dissertation). Saarland University.

(2010). Idiom treatment experiments in machine translation. Newcastle upon Tyne: Cambridge Scholars Publishing.

Arranz, V., Atserias, J., & Castillo, M. (2005). Multiwords and word sense disambiguation. In Proceedings Computational linguistics and intelligent text processing: 6th international conference, CICLING 2005, Mexico City, Mexico, February 13–19, 2005. (pp.250–262). Mexico city, Mexico.

Arnold, I.V. 1973. The English Word Moscow: Higher School Publishing House

Aziz, W., Dymetman, M., Mirkin, S., Specia, L., Cancedda, N., & Dagan, I. (2010). Learning an expert from human annotations in statistical machine translation: The case of out-of-vocabulary words. In Proceedings of the 14th annual meeting of the European Association for Machine Translation (EAMT) (pp.28–35). Saint-Rapha, France.

Baldwin, T. (2011). MWEs and topic modelling: Enhancing machine learning with linguistics. In Proceedings of the workshop on multiword expressions: From parsing and generation to the real world (p.1). Portland, Oregon, USA.

Baldwin, T., & Kim, S. N. (2010). Multiword expressions. In N. Indurkhya & F. J. Damerau, (Eds.), Handbook of Natural Language Processing, Second Edition (pp.267–292). Boca Raton, USA: Chapman and Hall/CRC (2010).

Bar-Hillel, Y. (1952). “The Treatment of ‘idioms’ by a Translating Machine”, presented at the Conference on Mechanical Translation at Massachusetts Institute of Technology, June 1952.

Barreiro, A., & Batista, F. (2016). Machine translation of non-contiguous multiword units. In Proceedings of Workshop on Discontinuous Structures in Natural Language Processing (DiscoNLP) (pp.22–30). San Diego, California, USA.

Barreiro, A., Monti, J., Orliac, B., Preuß, S., Arrieta, K., Ling, W., Batista, F. & Trancoso, I. (2014). Linguistic evaluation of support verb constructions by OpenLogos and Google Translate. In Proceedings of Ninth International Conference on Language Resources and Evaluation (LREC2014) (pp.35–40). Reykjavik, Island.

Barreiro, A., Raposo, F., & Luís, T. (2016). CLUE-Aligner: An alignment tool to annotate pairs of paraphrastic and translation units. In Proceedings of the LREC 2016 Workshop “Translation Evaluation: From Fragmented Tools and Data Sets to an Integrated Ecosystem” (pp.7–13). Portorož, Slovenia

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Grammar of spoken and written English. Edimburgh: Pearson Education Limited.

Boonthum, C., Toida, S., & Levinstein, I. (2005). Sense disambiguation for preposition with . In Proceedings of the second ACL–SIGSEM workshop on the linguistic dimensions of prepositions and their use in computational linguistic formalisms and applications (pp.153–162). Colchester, United Kingdom.

Bouamor, D., Semmar, N., Zweigenbeaum, P., (2012), Automatic Construction of a MultiWord Expressions Bilingual Lexicon: A Statistical Machine Translation Evaluation Perspective, Proceedings of the 3rd Workshop on Cognitive Aspects of the Lexicon (CogALex-III), COLING 2012. (pp.95–108). Mumbai, India.

Bouamor, D., Semmar, N., & Zweigenbaum, P. (2011). Improved statistical machine translation using multiword expressions. In Proceedings of the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT 2011) (pp.15–20). Barcelona, Spain.

Boulaknadel, S., Daille, B., & Aboutajdine, D. (2008). A multi-word term extraction program for Arabic language. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08) (pp.1485–1488). Marrakech, Morocco.

Brooke, J., Hammond, A., Jacob, D., Tsang, V., Hirst, G., & Shein, F. (2015). Building a lexicon of formulaic language for language learners. In Proceedings of the 11th workshop on multiword expressions (pp.96–104). Denver, Colorado, USA.

Brown, P. F., Cocke, J., Pietra, S. A. D., Pietra, V. J. D., Jelinek, F., Lafferty, J. D., Mercer R. L. & Roossin, P. S. (1990). A statistical approach to machine translation. Computational linguistics, 16(2), 79–85.

Brown, P. F., Pietra, V. J. D., Pietra, S. A. D., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational linguistics, 19(2), 263–311.

Brown, P., Cocke, J., Pietra, S. D., Pietra, V. D., Jelinek, F., Mercer, R., & Roossin, P. (1988). A statistical approach to language translation. In Proceedings of the 12th conference on Computational linguistics, Volume 1, (pp.71–76). Budapest, Hungry.

Brun, C. (1998). Terminology finite-state preprocessing for computational LFG. In Proceedings of the 36th annual meeting of the association for computational linguistics and 17th international conference on computational linguistics (pp.196–200). Morristown, New Jersey, USA.

Burstein, J. (2013). The far reach of multiword expressions in educational technology. In Proceedings of the 9th workshop on multiword expressions (p.138). Atlanta, Georgia, USA.

Cacciari, C., & Tabossi, P. 1988. The comprehension of idioms. Journal of Memory and Language, 27(6), 668–683

Cap, F., Nirmal, M., Weller, M. & Schulte im Walde, S. (2015), How to Account for Idiomatic German Support Verb Constructions in Statistical Machine Translation. In Proceedings of the 11th Workshop on Multiword Expressions (MWE) at NAACL (pp.19–28). Denver, Colorado, USA.

Cap, F. (2014). Morphological processing of compounds for statistical machine translation (Unpublished doctoral dissertation). University of Stuttgart.

Carpuat, M., & Diab, M. (2010). Task-based evaluation of multiword expressions: A pilot study in statistical machine translation. Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp.242–245). Los Angeles, California, USA.

Chafe, W. 1968. Idiomaticity as an anomaly in the Chomskyan paradigm. Foundations of Language 4. 109–127.

Carter, R. 1998. Vocabulary: Applied Linguistics Perspectives (2nd ed.) London and New York: Routledge.

Chiang, D. (2005). A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics (pp.263–270). Ann Arbor, Michigan, USA

Cho, K. Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation: Encoder-decoder approaches. In Proceedings of Conference on Empirical Methods on Natural Language Processing (EMNLP 2014) (pp.1724–1734). Doha, Qatar.

Cho, K. (forthcoming) ‘Deep Learning’. In Mitkov, R. (Ed.) The Oxford Handbook of Computational Linguistics, 2nd ed. Oxford: Oxford University Press.

Chomsky, N. (1980). Rules and representations. Behavioral and brain sciences, 3(1), 1–15.

Choueka, Yaacov, S.T Klein & E. Neuwitz. 1983. “Automatic Retrieval of Frequent Idiomatic and Collocational Expressions in a Large Corpus”. Journal of the Association for Literary and Linguistic Computing 4 (1). 34–38.

Claveau, V. (2009). Translation of biomedical terms by inferring rewriting rules. In Prince, V. (Ed.). Information Retrieval in Biomedicine: Natural Language Processing for Knowledge Integration, IGI-Global (pp.106–123).

Colson, J. P. (forthcoming). “Computational phraseology and translation studies: from theoretical hypotheses to practical tools.” In Corpas Pastor, G. Colson, J. P. & Heid, U. (Eds.). (forthcoming). Computational Phraseology. Amsterdam & New York: John Benjamins

(2016) “Set phrases around globalization : an experiment in corpus-based computational phraseology. In Input a Word, Analyze the World. Selected Approaches to Corpus Linguistics, Ed. by F.A Almeida, I. Ortega Barrera, E. Quintana Toledo, and M.E. Sánchez Cuervo, 141–152. Newcastle: Cambridge Scholars Publishing.

Constant, M., & Sigogne, A. (2011). MWU-aware part-of-speech tagging with a CRF model and lexical resources. In Proceedings of the workshop on multiword expressions: From parsing and generation to the real world (pp.49–56). Portland, Oregon, USA.

Constant, M. Candito, M. & Seddah, D. (2013b) The LIGM-Alpage Architecture for the SPMRL 2013 Shared Task: Multiword Expression Analysis and Dependency Parsing. Shared task track of the EMNLP Workshop on Statistical Parsing of Morphologically Rich Languages (SPMRL’13) (pp.46–52). Seattle, Washington, USA.

Constant, M., Eryiğit, G., Monti, J., Van Der Plas, L., Ramisch, C., Rosner, M., & Todirascu, A. (2017). Multiword expression processing: a survey. Computational Linguistics, 43(4), 837–892.

Constant, M., Roux, J. L., & Sigogne, A. (2013a). Combining compound recognition and PCFG-LA parsing with word lattices and conditional random fields. ACM Transactions on Speech and Language Processing (TSLP), 10 (3), 8:1–8:24.

Cook, P., & Hirst, G. (2013). Automatically assessing whether a text is clichéd, with applications to literary analysis. In Proceedings of the 9th workshop on multiword expressions (pp.52–57). Atlanta, Georgia, USA.

Corpas Pastor, G. (2016). Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives (Full papers). Geneva: Tradulex. [[URL]].

Corpas Pastor, G., Colson, J. P. & Heid, U. (Eds.). (forthcoming). Computational Phraseology. Amsterdam & New York: John Benjamins.

Corpas Pastor, G., Monti, J., Seretan, V., & Mitkov, R. (Eds.). (2016). Workshop proceedings: Multi-word units in machine translation and translation technologies (MUMTTT 2015), Malaga, Spain. Geneva: Editions Tradulex.

Corpas Pastor, G. (ed.) (2016). Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives (Full papers). Geneva: Tradulex. [[URL]]

Costa-Jussà, M. R., & Farrús, M. (2014). Statistical machine translation enhancements through linguistic levels: A survey. ACM Computing Surveys (CSUR), 46(3), 42.

Cowie, A. P. 1981. The treatment of collocations and idioms in learners' dictionaries. Applied Linguistics 2 (3), 223–235.

Dagan, I., & Church, K. (1994). Termight: Identifying and translating technical terminology. In Proceedings of the fourth conference on Applied natural language processing (pp.34–40). Stuttgart, Germany.

Daille, B. (1994). Approche mixte pour l’extraction automatique de terminologie : statistiques lexicales et filtres linguistiques (Unpublished doctoral dissertation). Université Paris 7.

(2001). Extraction de collocation à partir de textes. Actes de la 8ème conférence sur le Traitement Automatique des Langues Naturelles (TALN’2001). (pp.3–8). Tours, France.

Diab, M. T., & Bhutada, P. (2009). Verb noun construction MWE token supervised classification. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (pp.17–22). Suntec, Singapore.

Dowdall, J., Rinaldi, F., Ibekwe-SanJuan, F., & SanJuan, E. (2003). Complex structuring of term variants for Question Answering. In Proceedings of the ACL 2003 workshop on multiword expressions: Analysis, acquisition and treatment (pp.1–8). Sapporo, Japan.

Evert, S. (2004). The statistics of word cooccurrences: Word pairs and collocations (Unpublished doctoral dissertation). University of Stuttgart.

Fazly, A., Cook, P., & Stevenson, S. (2009). Unsupervised type and token identification of idiomatic expressions. Computational Linguistics, 35(1):61–103.

Fazly, A. (2007). Automatic acquisition of lexical knowledge about multiword predicates (Unpublished doctoral dissertation). University of Toronto.

Fellbaum, C. (1993). ‘The Determiner in English Idioms’, in C. Cacciari & P. Tabossi (eds) Idioms: Processing, Structure, and Interpretation. Hillsdale, NJ: Erlbaum, 271–295.

(2007). Idioms and collocations: Corpus-based linguistic and lexicographic studies. Bloomsbury Academic.

Fernando, C. & Flavell R. (1981) On Idiom: Critical Views and Perspectives. Exeter Linguistic Studies vol. 5. Exeter: University of Exeter.

Fernández Parra, M. A. (2011). Formulaic Expressions in Computer-Assisted Translation. A specialised translation approach (Unpublished doctoral dissertation). Swansea University.

Finlayson, M., & Kulkarni, N. (2011). Detecting multi-word expressions improves Word Sense Disambiguation. In Proceedings of the workshop on multiword expressions: From parsing and generation to the real world (pp.20–24). Portland, Oregon, UAS

Firth, J. R. (1957). Papers in Linguistics 1934–1951. London: Oxford University Press.

Fraser, B. 1970. Idioms within a transformational grammar. Foundations of Language 6. 22–42

Franz, A., Horiguchi, K., Duan, L., Ecker, D., Koontz, E., & Uchida, K. (2000). An integrated architecture for example-based machine translation. In Proceedings of the 18th conference on Computational linguistics, Volume 2 (pp.1031–1035). Saarbrücken, Germany

Gangadharaia, R., & Balakrishanan, N. (2006). Application of linguistic rules to generalized example based Machine Translation for Indian languages. In Proceedings of first National symposium on modeling and shallow parsing of Indian languages (MSPIL). Mumbay, India

Geoffrey Leech, R. G., & Bryant, M. (1994). CLAWS4: The tagging of the British National Corpus. In Proceedings of the 15th International Conference on Computational Linguistics (COLING-94) (pp. 622–628). Kyoto, Japan.

(2011). CLAWS4: The tagging of the British National Corpus. In Proceedings of the 15th International Conference on Computational Linguistics (COLING-94) (pp.622–628). Kyoto, Japan.

Gibbs, R. and N. Nayak (1989) “Psycholinguistic Studies on the Syntactic Behavior of Idioms,” Cognitive Psychology 21, 100–138

Girju, R., Moldovan, D., Tatu, M., & Antohe, D. (2005). On the semantics of noun compounds. Journal of Computer Speech and Language - Special Issue on Multiword Expressions, 19 (4), 479–496.

Granger, S., & Meunier, F. (2008). Disentangling the phraseological web. In Granger, S., & Meunier, F. (Eds.), Phraseology. An interdisciplinary perspective. Amsterdam: John Benjamins publishers.

Grégoire, N., Evert, S., & Krenn, B. (Eds.). (2008). Proceedings of the LREC workshop towards a shared task for multiword expressions (MWE 2008). Marrakech, Morocco.

Groves, D., Hearne, M., & Way, A. (2004). Robust sub-sentential alignment of phrase-structure trees. In Proceedings of the 20th international conference on Computational Linguistics, (pp.1072–1078). Geneva, Switzerland.

Hazelbeck, G., & Saito, H. (2010). A hybrid approach for functional expression identification in a Japanese reading assistant. In Proceedings of the 2010 workshop on multiword expressions: From theory to applications (pp.81–84). Beijing, China.

Huet, S., & Langlais, Ph. (2011). Identifying the translations of idiomatic expressions using TransSearch. In Proceedings of the 8th International NLPCS Workshop (Human-Machine Interaction in Translation (pp.45–56). Copenhagen, Denmark.

(2012). Translation of idiomatic expressions across different languages: A study of the effectiveness of TransSearch. In Neustein, A. & Markowitz, J. A. (Eds.) Where Humans Meet Machines. Innovative Solutions for Knotty Natural-Language Problems (pp.185–209). New York: Springer.

Hurskainen, A. (2008). Multiword expressions and machine translation. Technical Reports in Language Technology, Report No 1.

Jackendoff, R. (1997). The Architecture of the Language Faculty, Cambridge, Mass., MIT Press.

Jian, J. Y., Chang, Y. C., & Chang, J. S. (2004). Collocational translation memory extraction based on statistical and linguistic information. ROCLING 2004, Conference on Computational Linguistics and Speech Processing (pp.329–346). Taipei, Taiwan.

Kalchbrenner, N., & Blunsom, P. (2013). Recurrent convolutional neural networks for discourse compositionality. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, (pp.119–126). Sofia, Bulgaria. arXiv preprint arXiv:1306.3584.

Katz, J., & Postal, P. (1963).The semantic interpretation of idioms and sentences containing them. MIT Research Laboratory of Electronic Quarterly Progress Report, 70, 275–282.

Katz, G., & Giesbrecht, E. (2006, July). Automatic identification of non-compositional multi-word expressions using latent semantic analysis. In Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties (pp. 12–19). Association for Computational Linguistics.

Kilgarriff, Adam, Jakubíček, Miloš, Kovář, Voytěch, Rychlý, P., & Suchomel, V. (2014). Finding terms in corpora for many languages with the Sketch Engine. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics (pp.53–56). Gothenburg, Sweden.

Klebanov, B. B., Burstein, J., & Madnani, N. (2013). Sentiment Profiles of multiword expressions in test-taker essays: The case of noun-noun compounds. ACM Transactions for Speech and Language Processing, Special Issue on Multiword Expressions: From Theory to Practice, 10 (3), 12:1–12:15.

Koehn, P., Och, F. J., Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1 1, (NAACL '03). (pp.48–54). Edmonton, Canada.

Korkontzelos, I., & Manandhar, S. (2010). Can recognising multiword expressions improve shallow parsing? In Human language technologies: The 2010 annual conference of the North American chapter of the Association for Computational Linguistics (pp.636–644). Los Angeles, California, USA.

Kovář, V., Baisa, V., & Jakubíček, M. (2016). Sketch Engine for bilingual lexicography. International Journal of Lexicography, 29 (3), 339–352.

Krenn, B. (2000). The usual suspects: Data-oriented models for identification and representation of lexical collocations (Vol. 7). Saarbrücken, Germany: German Research Center for Artificial Intelligence and Saarland University Dissertations in Computational Linguistics and Language Technology.

Lambert, P., & Banchs, R. (2006). Grouping multi-word expressions according to part-of-speech in statistical machine translation. In Proceedings of the EACL Workshop on Multi-word expressions in a multilingual context, (pp.9–16). Trento, Italy.

(2005). Data inferred multi-word expressions for statistical machine translation. In Proceedings of Machine Translation Summit X (pp.396–403). Phuket, Thailand.

Lau, J. H., Baldwin, T., & Newman, D. (2013). On collocations and topic models. ACM Transactions on Speech and Language Processing, 10 (3), 10:1–10:14.

Lewis, D. D., & Croft, W. B. (1990). Term clustering of syntactic phrases. In Proceedings of 13th international ACM-SIGIR conference on research and development in information retrieval (SIGIR’90) (pp.385–404). Brussels, Belgium.

Lin, D. (1998). Using collocation statistics in information extraction. In Proceedings of the seventh message understanding conference (MUC-7). Fairfax, Virginia, USA.

Luong, M. T., Sutskever, I., Le, Q. V., Vinyals, O., & Zaremba, W. (2014). Addressing the rare word problem in neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, (pp. 11–19). Beijing, China. arXiv preprint arXiv: 1410.8206.

Macken, L. (2009). In search of the recurrent units of translation. In Daelemans, W., & Hoste, V. (Eds.), Evaluation of Translation Technology (pp.195–212). Brussels: Academic and Scientific Publishers.

Makkai, A. 1972. Idiom structure in English (Janua Linguarum, series maior, 48). The Hague: Mouton.

Mandala, R., Tokunaga, T., & Tanaka, H. (2000). Query expansion using heterogeneous thesauri. Information Processing and Management, 36 (3), 361–378.

Manrique-Losada, B., Zapata-Jaramillo, C. M., & Burgos, D. A. (2013). Exploring MWEs for knowledge acquisition from corporate technical documents. In Proceedings of the 9th workshop on multiword expressions (pp.82–86). Atlanta, Georgia, USA.

Marcu, D., Wang, W., Echihabi, A., & Knight, K. (2006). SPMT: Statistical machine translation with syntactified target language phrases. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (pp.44–52). Sydney, Australia

Marvel, A., & Koenig, J.-P. (2015). Event categorization beyond verb senses. In Proceedings of the 11th workshop on multiword expressions (pp.77–86). Denver, Colorado, USA.

Melamed, I. D. (1997, July). A word-to-word model of translational equivalence. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics (pp. 490–497). Association for Computational Linguistics.

Mitkov, R. (Ed.). (forthcoming). The Oxford handbook of computational linguistics. Oxford University Press.

(2016). “Computational Phraseology light: automatic translation of multiword expressions without translation resources”. Yearbook of Phraseology, 26(7), 149–166.

Monti, J. (2013). Multi-word unit processing in Machine Translation: developing and using language resources for multi-word unit processing in Machine Translation. (Unpublished doctoral dissertation). University of Salerno, Italy.

Monti, J, Arhan, M. & Sangati F. (forthcoming). Translation asymmetries of Multiword Expressions in Machine Translation: an analysis of the TED-MWE corpus. In Corpas Pastor, G., Colson, J. P. & Heid, U. (Eds.). (forthcoming). Computational Phraseology. Amsterdam & New York: John Benjamins.

Monti, J., Elia, A., Postiglione, A., Monteleone, M., & Marano, F. (2012). In search of knowledge: text mining dedicated to technical translation. In Proceedings of ASLIB 2011 - Translating and the Computer Conference. London, United Kingdom.

Monti, J., Mitkov, R., Seretan V. & Corpas Pastor, G. (Eds.). (2018) Workshop proceedings Multi-word units in Machine Translation and Translation Technology (MUMTTT2017). London, United Kingdom. Geneva: Editions Tradulex.

Monti, J., Mitkov, R., Corpas Pastor, G., & Seretan, V. (Eds.). (2013). Workshop proceedings: Multi-word units in machine translation and translation technologies. Nice, France: The European Association for Machine Translation.

Moon, R. (1998). Fixed expressions and idioms in English: A corpus-based approach. Oxford: Claredon Press Oxford.

(1988). Fixed expressions and idioms in English: A corpus-based approach. (Oxford studies in lexicography and lexicology.) Oxford: Clarendon Press.

Moreno-Ortiz, A., Perez-Hernandez, C., & Del-Olmo, M. (2013). Managing multiword expressions in a lexicon-based sentiment analysis system for Spanish. In Proceedings of the 9th workshop on multiword expressions (pp.1–10). Atlanta, Georgia, USA.

Nivre, J., & Nilsson, J. (2004). Multiword units in syntactic parsing. In MEMURA 2004 – Workshop on Multi-word-expressions in a Multilingual Context held in conjunction with the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006) (pp.39–46). Trento, Italy.

Nagy, I. (2014). Detecting Multiword Expressions and Named Entities in Natural Language Texts, Doctoral dissertation, Ph. D. dissertation, University of Szeged.

Nokel, M., & Loukachevitch, N. (2015). A method of accounting bigrams in topic models. In Proceedings of the 11th workshop on multiword expressions (pp.1–9). Denver, Colorado, USA.

Nomiyama, H. (1992). Machine translation by case generalization. In Proceedings of the 14th conference on Computational linguistics–Volume 2, (pp.714–720). Nantes, France.

Nunberg, G., Sag, I.A., Wasow, T. 1994. Idioms. Language 70 (3). 491–538.

Och, F. J., & Marcu, D. (2003). Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1 (pp.48–54). Edmonton, Canada.

Okita, T., Guerra, M. A., Graham, Y., & Way, A. (2010). Multi-word expression-sensitive word alignment. In Proceedings of the 4th International Workshop on Cross Lingual Information Access at COLING 2010 (pp.26–34). Beijing, China.

Okuma, H., Yamamoto, H., & Sumita, E. (2008). Introducing a translation dictionary into phrase-based SMT. IEICE transactions on information and systems, 91(7), 2051–2057.

Orlandi, A. & Giacomini, L. (eds.) 2016. Defining collocations for lexicographic purposes: from linguistic theory to lexicographic practice (Series ‘Linguistic Insights’). Frankfurt: Peter Lang.

Ozdowska, S. (2006). ALIBI, un systeme d’ALIgnement BIlingue base de regles (Doctoral dissertation PhD thesis), Université de Toulouse 2.

Pal, S., Chakraborty, T., & Bandyopadhyay, S. (2011). Handling multiword expressions in phrase-based statistical machine translation. Machine Translation Summit XIII, (pp.215–224). Xiamen, China.

Pal, S., Kumar Naskar, S., Pecina, P., Bandyopadhyay, S., & Way, A. (2010). Handling named entities and compound verbs in phrase-based statistical machine translation. In Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications (pp.46–54). Beijing, China

Pawley, A. & Syder, F. H. (1983). Two puzzles for linguistic theory: Native like selection and native like fluency. In J. J. Richards, & R. R. W. Schmidt (eds.), Language and Communication (pp.191–225). Harlow: Longman.

Pearce, D. (2002). A Comparative Evaluation of Collocation Extraction Techniques. In Proceedings of Ninth International Conference on Language Resources and Evaluation (LREC2002) (pp. 1530–1536). Las Palmas, Spain.

Pecina, P. (2008). Lexical association measures: Collocation extraction (Unpublished doctoral dissertation). Charles University.

Ramisch, C. (2012). A generic and open framework for multiword expressions treatment: from acquisition to applications (Unpublished doctoral dissertation). University of Grenoble and Federal University of Rio Grande do Sul.

(2015). Multiword expressions acquisition: A generic and open framework (Vol. XIV). Springer.

Ramisch, C., Villavicencio, A. (forthcoming) Computational treatment of multiword expressions. In Mitkov, R. (Ed.). (forthcoming). The Oxford handbook of computational linguistics. Oxford University Press.

Ramisch, C., Villavicencio, A., & Kordoni, V. (2013). Introduction to the special issue on multiword expressions: From theory to practice and use. ACM Transactions on Speech and Language Processing, 10 (2), 3:1–3:10. (Special issue on Multiword Expressions.

Rapp, R., Sharoff, S. (2014). Extracting multiword translations from aligned comparable documents. Proceedings of the 3rd Workshop on Hybrid Approaches to Translation (HyTra) (pp.87–95). Gothenburg, Sweden

Rayson, P., Piao, S., Sharoff, S., Evert, S., & Moirón, B. V. (2010). Multiword expressions: hard going or plain sailing? Language Resources and Evaluation Special Issue on Multiword expressions: Hard going or plain sailing, 44 (1–2), 1–25. (Special issue on Multiword Expressions)

Ren, Z., Lü, Y., Cao, J., Liu, Q., & Huang, Y. (2009). Improving statistical machine translation using domain bilingual multiword expressions. In Proceedings of the Workshop on Multiword Expressions: Identification, Interpretation, Disambiguation and Applications (pp.47–54). Suntec, Singapore.

Rikters M., & Bojar O. (2017). Paying Attention to Multi-Word Expressions in Neural Machine Translation. In MT Summit XVI Proceedings Nagoya, Japan, September 18–22, 2017, vol. 1: Research Track, (pp. 86–95). Nagoya, Japan.

Riloff, E. (2005). Little words can make a big difference for text classification. In Proceedings of the 18th annual international ACM SIGIR conference on research and development in information retrieval (pp.130–136). Seattle, Washington, USA.

Rohanian, O., Taslimipoor, S., Yaneva, V. and L. A. Ha (2017). Using Gaze Data to Predict Multiword Expressions. In Proceedings of the 11th Conference on Advances in Natural Language Processing (RANLP 2017), Varna, Bulgaria.

Sag, I. A., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Proceedings of the third international conference on intelligent text processing and computational linguistics (CICLING 2002) (pp.1–15). Mexico City, Mexico.

Salehi, B. Mathur, N., Cook, P. & Baldwin, T. (2015). The impact of multiword expression compositionality on machine translation evaluation. In Proceedings of the 11th Workshop on MWEs (MWE 2015) (pp.54–59). Denver, Colorado, USA.

Salton, G., & Smith, M. (1989). On the application of syntactic methodologies in automatic text analysis. In Proceedings of the 12th annual international ACM SIGIR conference on research and development in information retrieval (pp.137–150). New York, USA.

Sanjuan, E., Dowdall, J., Ibekwe-Sanjuan, F., & Rinaldi, F. (2005). A symbolic approach to automatic multiword term structuring. Journal of Computer Speech and Language – Special Issue on Multiword Expressions, 19 (4), 524–542.

Savary, A., Ramisch, C., Cordeiro, S., Sangati, F., Vincze, V., Qasemi Zadeh, B., Candito, M., Cap, F., Giouli, V., Stoyanova, I & Doucet, A. (2017). The PARSEME shared task on automatic identification of verbal multiword expressions. In Proceedings of the 13th workshop on multiword expressions (MWE 2017) (pp.31–47). Valencia, Spain.

Schneider, N. (2014). Lexical Semantic Analysis in Natural Language Text. Doctoral dissertation, Ph. D. dissertation, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA. Carnegie Mellon University.

Schneider, N., Onuffer, S., Kazour, N., Danchik, E., Mordowanec, M. T., Conrad, H., & Smith, N. A. (2014). Comprehensive annotation of multiword expressions in a social web corpus. In Proceedings of the International Conference on Language Resources and Evaluation (LREC’14) (pp.455–461). Reykjavik, Iceland.

Schneider, N., Hovy, D., Johannsen, A., & Carpuat, M. (2016). SemEval-2016 task 10: Detecting minimal semantic units and their meanings (DiMSUM). In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) (pp.546–559).

Scott, B. (2003). The Logos model: An historical perspective. Machine Translation, 18 (1), 1–72.

Scott, B., & Barreiro, A. (2009). OpenLogos MT and the SAL representation language. In Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation (pp.19–26). Alacant, Spain.

Segura, J., & Prince, V. (2011). Using Alignment to detect associated multiword expressions in bilingual corpora. Tralogy. Paris, France.

Seretan, V. (2008). Collocation extraction based on syntactic parsing (Unpublished doctoral dissertation). University of Geneva.

Seretan, V. (2009). Extraction de collocations et leurs équivalents de traduction à partir de corpus parallèles. TAL, 50(1), 305–332.

Seretan, V. (2011). A collocation-driven approach to text summarization. In Actes de la 18e conférence sur le traitement automatique des langues naturelles (TALN 2011) (pp.9–14). Montpellier, France.

Seretan, V. (2011). Syntax-based collocation extraction (Vol. 44). Dordrecht: Springer.

Seretan, V., & Wehrli, E. (2007). Collocation translation based on sentence alignment and parsing. In Proceedings of Traitement Automatique des Langues Naturelles (TALN) (pp.401–410). Toulouse, France.

Shigeto, Y., Azuma, A., Hisamoto, S., Kondo, S., Kouse, T., Sakaguchi, K., Yoshimoto, A., Yung, F., & Matsumoto, Y. (2013). Construction of English MWE dictionary and its application to POS tagging. In Proceedings of the 9th workshop on multiword expressions (pp.139–144). Atlanta, Georgia, USA.

Sinclair, J. (1991). Corpus, concordance, collocation. Oxford: Oxford University Press.

Sinclair, J. McH. (1996). The search for units of meaning. Textus, 9(1), 75–106.

Sinclair, J. M. (2007). Collocation reviewed. (manuscript), Tuscan Word Centre, Italy.

Sinclair, J. (2008). Preface. In Granger, S., & Meunier, F. (Eds.), Phraseology: An interdisciplinary perspective. Amsterdam: John Benjamins.

Smadja, F. (1993). Retrieving collocations from text: Xtract. Computational linguistics, 19(1), 143–177.

Straňák, P. (2010). Annotation of multiword expressions in the Prague Dependency Treebank (Unpublished doctoral dissertation). Charles University.

Sumita, E., & Iida, H. (1991). Experiments and prospects of example-based machine translation. In Proceedings of the 29th annual meeting on Association for Computational Linguistics (pp.185–192). Berkeley, California, USA.

Sumita, E., Iida, H., & Kohyama, H. (1990). Translating with examples: a new approach to machine translation. In The Third International Conference on Theoretical and Methodological Issues in Machine Translation of Natural Language (pp.203–212). Austin, Texas, USA.

Tambouratzis, G., Troullinos, M., Sofianopoulos, S., & Vassiliou, M. (2012). Accurate phrase alignment in a bilingual corpus for EBMT systems. In Proceedings of the 5th BUCC Workshop, held within the International Conference on Language Resources and Evaluation (LREC 2012), Vol. 26 (pp.104–111). Istanbul, Turkey.

Tang, Y., Meng, F., Lu, Z., Li, H., & Yu, P. L. (2016). Neural machine translation with external phrase memory. arXiv preprint arXiv:1606.01792.

Taslimipoor, S., Rohanian, O., Mitkov, R., & Fazly, A. (2017). Investigating the opacity of verb-noun multiword expression usages in context. In Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) (pp.133–138). Valencia, Spain.

Taslimipoor, S., Mitkov, R., Corpas Pastor, G., & Fazly, A. (2016). Bilingual Contexts from Comparable Corpora to Mine for Translations of Collocations. In Proceedings of the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2016). Konya, Turkey.

Taslimipoor, S. (2015). Cross-lingual Extraction of Multiword Expressions. In Corpas Pastor, G. (Ed.) (2016), Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives (Full papers). Geneva: Tradulex.

Thurmair, G. (2004). Multilingual Content Processing. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004) (pp.XI–XVI). Lisbon, Portugal.

Tillmann, C., & Xia, F. (2003). A phrase-based unigram model for statistical machine translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003–short papers (pp.106–108). Edmonton, Canada.

Tomokiyo, T., & Hurst, M. (2003). A language model approach to keyphrase extraction. In Proceedings of the ACL 2003 workshop on multiword expressions: Analysis, acquisition and treatment (pp.33–40). Sapporo, Japan.

Tsvetkov, Y. (2010). Extraction of multi-word expressions from small parallel corpora (Unpublished doctoral dissertation). University of Haifa.

Ullman, E., & Nivre, J. (2014). Paraphrasing Swedish compound nouns in Machine Translation. In Proceedings of the 10th workshop on multiword expressions (MWE) (pp.99–103). Gothenburg, Sweden.

Váradi, T. (2006). Multiword Units in an MT Lexicon. In Proceedings of the EACL Workshop on Multi-Word Expressions in a Multilingual Context (pp.73–78). Trento, Italy.

Venkatapathy, S., & Joshi, A. K. (2006). Using information about multi-word expressions for the word-alignment task. In Proceedings of the workshop on multiword expressions: Identifying and exploiting underlying properties (pp.20–27). Sydney, Australia.

Venkatsubramanyan, S., & Perez-Carballo, J. (2004). Multiword expression filtering for building knowledge. In T. Tanaka, A. Villavicencio, F. Bond, & A. Korhonen (Eds.), Second ACL workshop on multiword expressions: Integrating processing (pp.40–47). Barcelona, Spain.

Villavicencio, A., Bond, F., Korhonen, A., & McCarthy, D. (2005). Introduction to the special issue on multiword expressions: Having a crack at a hard nut. Computer Speech & Language, 19 (4), 365–377. (Special issue on Multiword Expressions.)

Villavicencio, A., Kordoni, V., Zhang, Y., Idiart, M., & Ramisch, C. (2007). Validation and evaluation of automatically acquired multiword expressions for grammar engineering. In Proceedings of the 2007 joint conference on empirical methods in natural language processing and computational natural language learning (EMNLP-CONLL) (pp.1034–1043). Prague, Czech Republic.

Vintar, S., & Fiser, D. (2008). Harvesting Multi-Word Expressions from Parallel Corpora. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08) (pp.1091–1096). Marrakech, Morocco.

Wacholder, N., & Song, P. (2003). Toward a task-based gold standard for evaluation of NP chunks and technical terms. In Proceedings of the 2003 Human Language Technology conference of the North American Chapter of the Association for Computational Linguistics (pp.130–136). Edmonton, Canada.

Wang, L., & Yu, S. (2010). Construction of Chinese idiom knowledge-base and its applications. In Proceedings of the 2010 workshop on multiword expressions: From theory to applications (pp.11–18). Beijing, China.

Wehrli, E. (2014). The relevance of collocations for parsing. In Proceedings of the 10th workshop on multiword expressions (MWE 2014) (pp.26–32). Gothenburg, Sweden.

Wehrli, E., Seretan, V., & Nerima, L. (2010). Sentence analysis and collocation identification. In Proceedings of the workshop on multiword expressions: from theory to applications (MWE 2010) (pp.27–35). Beijing, China.

Widdows, D., & Dorow, B. (2005). Automatic extraction of idioms using graph analysis and asymmetric lexicosyntactic patterns. In Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition (pp.48–56). Association for Computational Linguistics.

Williams, L., Bannister, C., Arribas-Ayllon, M., Preece, A., & Spasić, I. (2015). The role of idioms in sentiment analysis. Expert Systems with Applications, 42 (21), 7375–7385.

Wu, C. C., & Chang, J. S. (2004). Bilingual Collocation Extraction Based on Syntactic and Statistical Analyses. Computational Linguistics and Chinese Language Processing, 9(1), 1–20.

Wu, H., Wang, H., & Zong, C. (2008). Domain adaptation for statistical machine translation with domain dictionary and monolingual corpora. In Proceedings of the 22nd International Conference on Computational Linguistics–Volume 1 (pp.993–1000). Manchester, United Kingdom.

Yaneva, V., Taslimipoor, S., Rohanian, O., & Ha, L. A. (2017). Cognitive Processing of Multiword Expressions in Native and Non-native Speakers of English: Evidence from Gaze Data. In Mitkov, R. (Ed.), Computational and Corpus-based Phraseology. Springer: Heidelberg, New York, London.

Yarowsky, D. (1993). One sense per collocation. In Proceedings of ARPA Human Language Technology workshop (pp.266–271). Princeton, New Jersey, USA.

Yarowsky, D. (1995). Unsupervised word sense disambiguation rivalling supervised methods. In Proceedings of the 33rd annual meeting of the Association for Computational Linguistics (ACL 1995) (pp.189–196). Cambridge, Massachusetts, USA.

Zens, R., Och, F. J., & Ney, H. (2002). Phrase-based statistical machine translation. Annual Conference on Artificial Intelligence (pp.18–32). Edmonton, Canada.

Zhang, Y., & Kordoni, V. (2006). Automated deep lexical acquisition for robust open texts processing. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006) (pp.275–280). Genoa, Italy.

Zollmann, A., & Venugopal, A. (2006). Syntax augmented machine translation via chart parsing. In Proceedings of the Workshop on Statistical Machine Translation (pp.138–141). New York City, USA.

Cited by (8)

Cited by eight other publications

Corpas Pastor, Gloria & Enrique Gutiérrez Rubio

2023. Computational and corpus phraseology applied to Spanish. Romanica Olomucensia 35:1 ► pp. 1 ff.

Hidalgo-Ternero, Carlos Manuel & Xiaoqing Zhou-Lian

2022. Reassessing gApp: Does MWE Discontinuity Always Pose a Challenge to Neural Machine Translation? In Computational and Corpus-Based Phraseology [Lecture Notes in Computer Science, 13528], ► pp. 116 ff.

Giczela-Pastwa, Justyna

2021. Developing phraseological competence in L2 legal translator trainees: a proposal of a data mining technique applied in translation from an LLD into ELF. The Interpreter and Translator Trainer 15:2 ► pp. 187 ff.

Stasimioti, Maria, Vilelmini Sosoni & Konstantinos Chatzitheodorou

2021. Investigating post-editing effort. Cognitive Linguistic Studies 8:2 ► pp. 378 ff.

Corpas Pastor, Gloria & Jean-Pierre Colson

2020. Introduction. In Computational Phraseology [IVITRA Research in Linguistics and Literature, 24], ► pp. 2 ff.

Hidalgo-Ternero, Carlos Manuel & Gloria Corpas Pastor

2020. Bridging the “gApp”: improving neural machine translation systems for multiword expression detection. Yearbook of Phraseology 11:1 ► pp. 61 ff.

Seretan, Violeta

2018. Bridging Collocational and Syntactic Analysis. In Lexical Collocation Analysis [Quantitative Methods in the Humanities and Social Sciences], ► pp. 23 ff.

Ramisch, Carlos

2017. Putting the Horses Before the Cart: Identifying Multiword Expressions Before Translation. In Computational and Corpus-Based Phraseology [Lecture Notes in Computer Science, 10596], ► pp. 69 ff.

This list is based on CrossRef data as of 27 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.