Building EPTIC
A many-sided, multi-purpose corpus of EU parliament proceedings
This chapter describes the steps involved in the construction of EPTIC, an intermodal corpus of European Parliament speeches. Despite its limited size, this corpus has features that justify its labour-intensive building process, in particular its multiple alignments. The text-to-text alignments allow users to compare interpretations and translations of source speeches and their written-up reports, while text-to-video alignments allow them to access the multimedia components from concordance lines. To illustrate the potential of EPTIC, a case study is presented of English loan words in original, translated and interpreted Italian and French. Results suggest that borrowing is more likely to occur in translated Italian than in any of the other corpus components.
Article outline
- 1. Introduction: Why another corpus of European Parliament speeches?
- 2. What EPTIC looks like
- 2.1 One corpus, fourteen subcorpora
- 2.2 Practical details: Size and availability
- 3. Building EPTIC
- 3.1 Selecting and obtaining raw corpus materials
- 3.2 Transcribing the oral data
- 3.3 Adding metadata
- 3.4 Performing text-to-text alignment
- 3.5 Performing text-to-video alignment
- 3.6 POS-tagging, lemmatization and indexing
- 4. An example: English loan words in Italian and French
- 5. Conclusion: Teaming up
- Acknowledgement
- Notes
- References