A comparison of translation process research findings derived from different word alignment methods: Chapter 10. Impact of word alignment on word translation entropy and other metrics

Gilbert, Devin; Toledo-Báez, Cristina; Carl, Michael; Espino, Haydeé

doi:10.1075/ata.xx.10gil

Part of

Translation in Transition: Human and machine intelligence
Edited by Isabel Lacruz
[American Translators Association Scholarly Monograph Series XX] 2023
pp. 203–235


Devin Gilbert | Utah Valley University

Cristina Toledo-Báez | University of Málaga

Michael Carl | Kent State University

Haydeé Espino | Kent State University

Many of the findings from studies using the Center for Research and Innovation in Translation and Translation Technology (CRITT) Translation Process Research Database (TPR-DB) framework rely on word(s)-to-word(s) alignments of the source text and target text. However, little research has examined how the choice of alignment method affects these findings. This study compares two manual word alignment methods and four automatic word alignment methods on the basis of one extensively used English-Spanish TPR-DB study (the BML12 dataset). We replicate past findings from the BML12 dataset using each of these alignments to determine the impact of alignment, and we present qualitative and quantitative analyses of the different word alignment methods.
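Because word translation entropy (HTra) is computed from alignment links, two alignments of the same set of translations can yield different values. The following is a minimal Python sketch, assuming the common TPR-DB formulation HTra(s) = -Σ_t p(t|s) · log2 p(t|s), where p(t|s) is the proportion of translations that align source word s to target rendering t. The function name `word_translation_entropy` and the Spanish renderings are hypothetical illustrations, not data or code from the BML12 study.

```python
import math
from collections import Counter


def word_translation_entropy(renderings):
    """Entropy (in bits) of the target renderings aligned to one source word.

    `renderings` holds one aligned target string per translation of the
    same source word (hypothetical input format, for illustration only).
    """
    counts = Counter(renderings)
    total = len(renderings)
    entropy = 0.0
    for count in counts.values():
        p = count / total  # p(t|s): share of translations choosing rendering t
        entropy -= p * math.log2(p)
    return entropy


# Hypothetical example: four translations of the same English source word.
# Alignment A includes the Spanish article in some links...
alignment_a = ["el coche", "el coche", "coche", "el auto"]
# ...alignment B links only the noun, so several renderings collapse.
alignment_b = ["coche", "coche", "coche", "auto"]

print(round(word_translation_entropy(alignment_a), 3))  # 1.5
print(round(word_translation_entropy(alignment_b), 3))  # 0.811
```

In this toy case, the only difference between the two lists is whether the article is included in the alignment link, yet the entropy drops from 1.5 to about 0.81 bits. This is the kind of alignment sensitivity the chapter sets out to measure.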

Keywords: word alignment, automatic word alignment, word translation entropy, replication study, CRITT TPR-DB

Article outline

1. Introduction
2. Summary of past research
3. Procedure
- 3.1 Manual alignment
- 3.2 Automatic alignment
- 3.3 Post-processing
4. Research questions 1 and 2: Qualitative analysis and replicating past studies
- 4.1 Carl and Schaeffer (2017)
- 4.2 Toledo-Báez and Carl (2020)
- 4.3 Ogawa et al. (2021)
5. Research question 3: Comparing measures across the alignment methods
6. Conclusions
- 6.1 RQ1: Is new manual alignment more consistent?
- 6.2 RQ2: Will alignment change results?
- 6.3 RQ3: Alignment consistency
- 6.4 Final remarks
Notes
References
Appendix

Published online: 26 July 2023



References

Almazroei, Samar A., Haruka Ogawa, and Devin Gilbert. 2019. “Investigating Correlations Between Human Translation and MT Output.” In Proceedings of the Second MEMENTO Workshop on Modelling Parameters of Cognitive Effort in Translation Production, 11–13. Dublin, Ireland: European Association for Machine Translation.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48.

Carl, Michael. 2012. “Translog-II: A Program for Recording User Activity Data for Empirical Reading and Writing Research.” In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), 4108–4112. Istanbul, Turkey: European Language Resources Association (ELRA).

Carl, Michael. 2021. “Information and Entropy Measures of Rendered Literal Translation.” In Explorations in Empirical Translation Process Research, ed. by Michael Carl, 113–40. Machine Translation: Technologies and Applications. Springer.

Carl, Michael, and Moritz Schaeffer. 2014. “Word Transition Entropy as an Indicator for Expected Machine Translation Quality.” In Proceedings of the Workshop on Automatic and Manual Metrics for Operational Translation Evaluation, ed. by Keith J. Miller, Lucia Specia, Kim Harris, and Stacey Bailey, 45–50. Reykjavik, Iceland.

Carl, Michael, Moritz Schaeffer, and Srinivas Bangalore. 2016. “The CRITT Translation Process Research Database.” In New Directions in Empirical Translation Process Research, ed. by Michael Carl, Moritz Schaeffer, and Srinivas Bangalore, 13–54. Springer.

Carl, Michael, and Moritz Jonas Schaeffer. 2017. “Why Translation Is Difficult: A Corpus-Based Study of Non-Literality in Post-Editing and From-Scratch Translation.” HERMES – Journal of Language and Communication in Business, no. 56 (October): 43–57.

Germann, Ulrich. 2008. “Yawat: Yet Another Word Alignment Tool.” In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Demo Session, 20–23. Association for Computational Linguistics.

Gilbert, Devin, and Michael Carl. 2021. “Introducing a Word Alignment Dissimilarity Indicator: Alignment Links as Conceptualizations of a Focused Bilingual Lexicon.” In Proceedings of the First Workshop on Modelling Translation: Translatology in the Digital Age, ed. by Yuri Bizzoni, Elke Teich, Cristina España i Bonet, and Josef van Genabith, 74–81. Online, Berlin: Association for Computational Linguistics.

Lüdecke, Daniel. 2021. sjPlot: Data Visualization for Statistics in Social Science. R package version 2.8.7.

Mesa-Lao, Bartolomé. 2014. “Gaze Behaviour on Source Texts: An Exploratory Study Comparing Translation and Post-Editing.” In Post-Editing of Machine Translation: Processes and Applications, 219–45. Copenhagen Business School.

Och, Franz Josef, and Hermann Ney. 2003. “A Systematic Comparison of Various Statistical Alignment Models.” Computational Linguistics 29 (1): 19–51.

Ogawa, Haruka, Devin Gilbert, and Samar A. Almazroei. 2021. “redBird: Rendering Entropy Data and ST-Based Information into a Rich Discourse on Translation: Investigating Relationships between MT Output and Human Translation.” In Explorations in Empirical Translation Process Research, ed. by Michael Carl, 141–63. Machine Translation: Technologies and Applications. Springer.

R Core Team. 2017. R: A Language and Environment for Statistical Computing (version 3.6.1). Vienna, Austria: R Foundation for Statistical Computing.

Sabet, Masoud Jalili, Philipp Dufter, François Yvon, and Hinrich Schütze. 2020. “SimAlign: High Quality Word Alignments without Parallel Training Data Using Static and Contextualized Embeddings.” In Findings of the Association for Computational Linguistics: EMNLP 2020. Online: Association for Computational Linguistics. arXiv:2004.08728 [cs].

Schaeffer, Moritz, and Michael Carl. 2014. “Measuring the Cognitive Effort of Literal Translation Processes.” In Proceedings of the Workshop on Humans and Computer-Assisted Translation (HaCaT), ed. by Ulrich Germann, Michael Carl, Philipp Koehn, Germán Sanchis Trilles, Francisco Casacuberta, Robin Hill, and Sharon O’Brien, 29–37. Stroudsburg, Pennsylvania, USA: Association for Computational Linguistics.

Toledo-Báez, M. Cristina, and Michael Carl. 2020. “Assessing Low and High Translation Variation in Post-Editing.” In TT5 Translation in Transition Book of Abstracts, 41–45. Kent State University, Kent, Ohio, USA.
