An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus
Since the 1980s, considerable efforts have been made to create different types of Basque corpora. However, to systematically analyse the Basque translations of German literary texts, a new corpus had to be built from the ground up. Some Basque target texts were not translated directly from the German original but from an intermediate translation into another language (Spanish in most cases). In those cases, the intermediate version was also included in the corpus. The bitexts and tritexts were aligned with a tool called TAligner. The aim of this chapter is, firstly, to give the reader an overview of the main Basque corpora. Secondly, I describe the design and compilation of a parallel, multilingual corpus using TAligner 3.0. Thirdly, I explain how the corpus was lemmatized and annotated for part of speech. Finally, I show how potential Basque multi-word expressions were extracted.
Article outline
- 1. Introduction
- 2. An overview of Basque corpora
- 3. Design, compilation and annotation of the Aleuska corpus
- 4. Extraction of MWEs
- 5. Conclusion
- Notes
- References