Article published in:
Target Vol. 34:2 (2022), pp. 278–308
Can a corpus-driven lexical analysis of human and machine translation unveil discourse features that set them apart?
There is still much to learn about the ways in which human and machine translation differ with regard to the contexts that regulate the production and interpretation of discourse. The present study explores whether a corpus-driven lexical analysis of human and machine translation can unveil discourse features that set the two apart. A balanced corpus of source texts aligned with authentic, professional translations and neural machine translations was compiled for the study. Lexical discrepancies in the two translation corpora were then extracted via a corpus-driven keyword analysis, and examined qualitatively through parallel concordances of source texts aligned with human and machine translation. The study shows that keyword analysis not only reiterates known problems of discourse in machine translation such as lexical inconsistency and pronoun resolution, but can also provide valuable insights regarding contextual aspects of translated discourse deserving further research.
Article outline
- 1. Introduction
- 2. Background
- 3. Method
- 3.1 Materials
- 3.2 Procedure
- 4. Results
- 4.1 Grammatical keywords
- 4.1.1 Modals
- 4.1.2 Prepositions
- 4.1.3 Pronouns
- 4.2 Lexical keywords
- 4.2.1 Spelling
- 4.2.2 Proper names
- 4.2.3 Foreign words
- 5. Discussion and conclusion
- Acknowledgements
- Notes
- References
References (41)
Bawden, Rachel. 2016. “Cross-lingual Pronoun Prediction with Linguistically Informed Features.” In Proceedings of the First Conference on Machine Translation, Berlin, Germany, 11–12 August, 564–570. Stroudsburg: Association for Computational Linguistics.
Blum-Kulka, Shoshana. 1986. “Shifts of Cohesion and Coherence in Translation.” In Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition Studies, edited by Juliane House and Shoshana Blum-Kulka, 17–35. Tübingen: Gunter Narr.
Carpuat, Marine, and Michel Simard. 2012. “The Trouble with SMT Consistency.” In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montréal, Canada, 7–8 June, edited by Chris Callison-Burch, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia, 442–449. Stroudsburg: Association for Computational Linguistics.
Catford, John C. 1965. A Linguistic Theory of Translation: An Essay in Applied Linguistics. Oxford: Oxford University Press.
COMPARA. 2010. (Version 13.1.17.) Accessed April 12, 2019. [URL]
De Beaugrande, Robert, and Wolfgang Dressler. 1981. Introduction to Text Linguistics. London: Longman.
Dougal, Duane K., and Deryle Lonsdale. 2020. “Improving NMT Quality Using Terminology Injection.” In Proceedings of the Twelfth International Conference on Language Resources and Evaluation, Marseille, France, 11–16 May, edited by Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, 4820–4827. Paris: European Language Resources Association. [URL]
Frankenberg-Garcia, Ana. 2008. “‘Suggesting Rather Special Facts’: A Corpus-Based Study of Distinctive Lexical Distributions in Translated Texts.” Corpora 3 (2): 195–211.
Frankenberg-Garcia, Ana. 2016. “A Corpus Study of Loans in Translated and Non-Translated Texts.” In Corpus-Based Approaches to Translation and Interpreting: From Theory to Applications, edited by Gloria Corpas Pastor and Miriam Seghiri, 19–42. Frankfurt: Peter Lang.
Frankenberg-Garcia, Ana, and Diana Santos. 2003. “Introducing COMPARA: The Portuguese–English Parallel Corpus.” In Corpora in Translator Education, edited by Federico Zanettin, Silvia Bernardini, and Dominic Stewart, 71–87. Manchester: St. Jerome.
Google Translator Toolkit. 2019. Accessed December 1, 2019. [URL]
Guillou, Liane. 2013. “Analysing Lexical Consistency in Translation.” In Proceedings of the Workshop on Discourse in Machine Translation, Sofia, Bulgaria, 9 August, edited by Bonnie Webber, Andrei Popescu-Belis, Katja Markert, and Jörg Tiedemann, 10–18. Stroudsburg: Association for Computational Linguistics. [URL]
Guillou, Liane. 2016. Incorporating Pronoun Function into Statistical Machine Translation. PhD diss. University of Edinburgh.
Guillou, Liane, Christian Hardmeier, Ekaterina Lapshinova-Koltunski, and Sharid Loáiciga. 2018. “A Pronoun Test Suite Evaluation of the English–German MT Systems at WMT 2018.” In Proceedings of the Third Conference on Machine Translation: Shared Task Papers, Brussels, Belgium, 31 October – 1 November, edited by Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, and Karin Verspoor, 570–577. Stroudsburg: Association for Computational Linguistics.
Halliday, M. A. K. 1978. Language as a Social Semiotic: The Social Interpretation of Language and Meaning. London: Edward Arnold.
Hardmeier, Christian. 2014. Discourse in Statistical Machine Translation. PhD diss. Uppsala University.
House, Juliane. 2006. “Text and Context in Translation.” Journal of Pragmatics 38 (3): 338–358.
Kilgarriff, Adam. 2009. “Simple Maths for Keywords.” In Proceedings of Corpus Linguistics Conference, Liverpool, UK. [URL]
Kilgarriff, Adam, Vit Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vit Suchomel. 2014. “The Sketch Engine: Ten Years On.” Lexicography 1 (1): 7–36.
Klaudy, Kinga. 2009. “The Asymmetry Hypothesis in Translation Research.” In Translators and Their Readers: In Homage to Eugene A. Nida, edited by Rodica Dimitriu and Miriam Shlesinger, 283–303. Brussels: Les Editions du Hazard.
Klaudy, Kinga. 2017. “Linguistic and Cultural Asymmetry in Translation from and into Minor Languages.” Cadernos de Literatura em Tradução, 171, 22–37.
Koehn, Philipp. 2005. “Europarl: A Parallel Corpus for Statistical Machine Translation.” In Proceedings of the Tenth Machine Translation Summit, Phuket, Thailand, 12–16 September, 79–86. Tokyo: Asia-Pacific Association for Machine Translation. [URL]
Koehn, Philipp, and Josh Schroeder. 2007. “Experiments in Domain Adaptation for Statistical Machine Translation.” In Proceedings of the Second Workshop on Statistical Machine Translation, Prague, Czech Republic, 23 June, 224–227. Stroudsburg: Association for Computational Linguistics.
Lapshinova-Koltunski, Ekaterina, and Christian Hardmeier. 2017. “Discovery of Discourse-Related Language Contrasts through Alignment Discrepancies in English–German Translation.” In Proceedings of the Third Workshop on Discourse and Machine Translation, Copenhagen, Denmark, 8 September, edited by Bonnie Webber, Andrei Popescu-Belis, and Jörg Tiedemann, 73–81.
Läubli, Samuel, Rico Sennrich, and Martin Volk. 2018. “Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation.” In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October – 4 November, edited by Ellen Riloff, David Chiang, Julia Hockenmaier, and Jun’ichi Tsujii, 4791–4796. Stroudsburg: Association for Computational Linguistics.
Luong, Ngoc-Quang, and Andrei Popescu-Belis. 2016. “A Contextual Language Model to Improve Machine Translation of Pronouns by Re-ranking Translation Hypotheses.” In Proceedings of the 19th Annual Conference of the European Association for Machine Translation, Riga, Latvia, special issue of Baltic Journal of Modern Computing 4 (2): 292–304.
Luong, Ngoc-Quang, Andrei Popescu-Belis, Annette Rios Gonzales, and Don Tuggener. 2017. “Machine Translation of Spanish Personal and Possessive Pronouns Using Anaphora Probabilities.” In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Vol 2, Short Papers, Valencia, Spain, 3–7 April, edited by Mirella Lapata, Phil Blunsom, and Alexander Koller, 631–636. Stroudsburg: Association for Computational Linguistics.
Morante, Roser, and Caroline Sporleder. 2012. “Modality and Negation: An Introduction to the Special Issue.” Computational Linguistics 38 (2): 223–260.
Nakov, Preslav. 2016. “Negation and Modality in Machine Translation.” In Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, Osaka, Japan, 12 December, edited by Eduardo Blanco, Roser Morante, and Roser Saurí, 411. Stroudsburg: Association for Computational Linguistics. [URL]
Popescu-Belis, Andrei, Sharid Loáiciga, Christian Hardmeier, and Deyi Xiong, eds. 2019. Proceedings of the Fourth Workshop on Discourse in Machine Translation, Hong Kong, China, 3 November. Stroudsburg: Association for Computational Linguistics. [URL]
Pym, Anthony. 2015. “Translating as Risk Management.” Journal of Pragmatics 85: 67–80.
Schleiermacher, Friedrich. (1813) 2004. “On the Different Methods of Translating.” In The Translation Studies Reader, 2nd ed., edited by Lawrence Venuti, 43–63. London: Routledge.
Tiedemann, Jörg. 2012. “Parallel Data, Tools and Interfaces in OPUS.” In Proceedings of the 8th International Conference on Language Resources and Evaluation, Istanbul, Turkey, edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Mehmet Uğur Doğan, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, 2214–2218. Stroudsburg: Association for Computational Linguistics. [URL]
Toral, Antonio, and Andy Way. 2018. “What Level of Quality Can Neural Machine Translation Attain on Literary Text?” In Translation Quality Assessment: From Principles to Practice, vol. 11, edited by Joss Moorkens, Sheila Castilho, Federico Gaspari, and Stephen Doherty, 263–287. Cham: Springer.
Turovsky, Barak. 2016. “Found in Translation: More Accurate, Fluent Sentences in Google Translate.” Google (blog), November 15, 2016. [URL]
Van Dijk, Teun A. 1977. Text and Context: Explorations in the Semantics and Pragmatics of Discourse. Harlow: Longman.
Vinay, Jean-Paul, and Jean Darbelnet. (1958) 2004. “A Methodology for Translation.” In The Translation Studies Reader, 2nd ed., edited by Lawrence Venuti, 128–137. London: Routledge.
Webber, Bonnie, Andrei Popescu-Belis, and Jörg Tiedemann, eds. 2017. Proceedings of the Third Workshop on Discourse in Machine Translation, Copenhagen, Denmark, 8 September. [URL]
Cited by (3)
Dallı, Harun, Olgun Dursun, Tunga Güngör, Sabri Gürses, Ena Hodzik, Mehmet Şahin & Zeynep Yirmibeşoğlu. 2024. Giving a translator’s touch to the machine: Reproducing translator style in literary machine translation. Palimpsestes 38.
Fu, Linling & Lei Liu. 2024. What are the differences? A comparative study of generative artificial intelligence translation and human translation of scientific texts. Humanities and Social Sciences Communications 11:1.
Niu, Jiang & Yue Jiang. 2024. Does simplification hold true for machine translations? A corpus-based analysis of lexical diversity in text varieties across genres. Humanities and Social Sciences Communications 11:1.
This list is based on CrossRef data as of 29 December 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.