Building EPTIC
A many-sided, multi-purpose corpus of EU parliament proceedings
This chapter describes the steps involved in the construction of EPTIC, an intermodal corpus of European Parliament speeches. Despite its limited size, this corpus has features that justify its labour-intensive building process, in particular its multiple alignments. The text-to-text alignments allow users to compare interpretations and translations of source speeches and their written-up reports, while text-to-video alignments allow them to access the multimedia components from concordance lines. To illustrate the potential of EPTIC, a case study is presented of English loan words in original, translated and interpreted Italian and French. Results suggest that borrowing is more likely to occur in translated Italian than in any of the other corpus components.
Article outline
- 1. Introduction: Why another corpus of European Parliament speeches?
- 2. What EPTIC looks like
- 2.1 One corpus, fourteen subcorpora
- 2.2 Practical details: Size and availability
- 3. Building EPTIC
- 3.1 Selecting and obtaining raw corpus materials
- 3.2 Transcribing the oral data
- 3.3 Adding metadata
- 3.4 Performing text-to-text alignment
- 3.5 Performing text-to-video alignment
- 3.6 POS-tagging, lemmatization and indexing
- 4. An example: English loan words in Italian and French
- 5. Conclusion: Teaming up
- Acknowledgement
- Notes
- References