An overview of Basque corpora and the extraction of certain multi-word expressions from a translational corpus

Sanz-Villar, Zuriñe

doi:10.1075/scl.90.14san

Part of

Parallel Corpora for Contrastive and Translation Studies: New resources and applications
Edited by Irene Doval and M. Teresa Sánchez Nieto
[Studies in Corpus Linguistics 90] 2019
pp. 233–247

Zuriñe Sanz-Villar | University of the Basque Country (UPV/EHU)

Since the 1980s, considerable effort has gone into creating different types of Basque corpora. However, systematically analysing the Basque translations of German literary texts required building a corpus from the ground up. Whenever the Basque target text was translated not from the German original but from a version in another language (Spanish in most cases), that intermediary version was also included in the corpus. A tool called TAligner was used to align the bitexts and the tritexts. This chapter first provides an overview of the main Basque corpora. Secondly, I describe the design and compilation of a parallel, multilingual corpus using TAligner 3.0. Thirdly, I present how the corpus was lemmatized and annotated with part-of-speech tags. Finally, I show the process of extracting potential Basque multi-word expressions.

Keywords: Basque corpora, Aleuska corpus, TAligner, Basque MWEs

Article outline

1. Introduction
2. An overview of Basque corpora
3. Design, compilation and annotation of the Aleuska corpus
4. Extraction of MWEs
5. Conclusion
Notes
References

Published online: 20 March 2019
