On identification of bilingual lexical bundles for translation purposes
The case of an English-Polish comparable corpus of patient information leaflets
Grounded in phraseology and corpus linguistics, this paper explores the use of bilingual lexical bundles to
improve the naturalness and textual fit of translated texts. More specifically, the study attempts to
identify lexical bundles, that is, recurrent sequences of 3–7 words, that have similar discursive functions in a
purpose-designed comparable corpus of English and Polish patient information leaflets, with 100 text samples in each
language. Because of cross-linguistic differences, we additionally apply a number of formal criteria to
filter the bundles in each subcorpus. The results show that bilingual lexical bundles that are extracted from
comparable corpora and have overlapping discourse functions hold unexplored potential for machine translation,
computer-assisted translation and bilingual lexicography.
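The extraction step described in the abstract (recurrent sequences of 3–7 words) can be illustrated with a minimal sketch. It is not the authors' procedure: the function names, the naive tokenizer, and the `min_freq` and `min_range` cut-offs are assumptions made for illustration, the sample sentences are invented rather than taken from the corpus, and the additional formal filtering criteria mentioned above are not implemented.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep runs of letters only (naive; handles Polish diacritics)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def extract_bundles(texts, min_n=3, max_n=7, min_freq=2, min_range=2):
    """Return {bundle: (frequency, number_of_texts)} for recurrent word n-grams.

    min_freq and min_range are hypothetical cut-offs, not the paper's values.
    """
    freq = Counter()
    text_ids = defaultdict(set)
    for text_id, text in enumerate(texts):
        tokens = tokenize(text)
        for n in range(min_n, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = " ".join(tokens[i:i + n])
                freq[gram] += 1
                text_ids[gram].add(text_id)
    # Keep only n-grams that are frequent enough and spread over enough texts.
    return {
        gram: (count, len(text_ids[gram]))
        for gram, count in freq.items()
        if count >= min_freq and len(text_ids[gram]) >= min_range
    }


# Invented sample sentences standing in for one subcorpus.
leaflets = [
    "Ask your doctor or pharmacist before taking this medicine.",
    "If you have any further questions, ask your doctor or pharmacist.",
    "Keep this leaflet. Ask your doctor or pharmacist for advice.",
]
bundles = extract_bundles(leaflets, min_freq=3, min_range=3)
for bundle, (count, n_texts) in sorted(bundles.items()):
    print(bundle, count, n_texts)
```

Running the same function separately on each subcorpus would yield two monolingual bundle lists. Pairing English and Polish bundles by discourse function, as the abstract describes, is a separate analytical step that this sketch does not attempt.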
Article outline
- 1. Introduction
- 2. Background and related work
- 3. Research material and methodology
- 4. Results
- 5. Discussion and conclusions
- Notes
- References
- Appendix