Utilising heterogeneous language resources for term extraction in maritime domains
Developing terminologies for domains that lack them is a time-consuming and costly task. This article takes a methodological
perspective and addresses a general question: how can we, with limited funding, make maximal use of existing language resources
to create a terminology at relatively low cost? Although Norway has been an important player in the maritime industries for many
centuries, it has not prioritised the systematic development of an official maritime terminology. The article therefore focuses on
efforts to develop a national terminology resource for maritime domains. It describes the creation of a corpus of popular-science
texts and a parallel corpus of technical texts, to which six term extraction methods are applied. These comprise corpus-based
statistical analyses of frequency, keyness and collocation; bilingual term extraction from aligned sentences; and retrieval of terms
from domain-specific lexical resources and from a bilingual general dictionary. Finally, the pros and cons of each method are
evaluated in a cost-benefit analysis.
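As an illustration of what a keyness analysis of a domain-specific corpus against a general corpus involves (Method 2 in the outline), the following is a minimal Python sketch. It assumes the log-likelihood statistic described by Rayson and Garside (2000); the abstract does not say which keyness measure or software the article used, and the function names `log_likelihood` and `keywords`, as well as the toy sentences, are hypothetical. The frequency counts computed along the way correspond to a simple frequency analysis (Method 1).

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) keyness score for a single word.

    a: frequency of the word in the domain-specific (study) corpus
    b: frequency of the word in the general (reference) corpus
    c: total number of tokens in the study corpus
    d: total number of tokens in the reference corpus
    """
    # Expected frequencies if the word were equally common in both corpora
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    # By convention 0 * ln(0) = 0, so zero-frequency cells contribute nothing
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top_n=10):
    """Rank words that are relatively more frequent in the study corpus."""
    study = Counter(study_tokens)          # frequency list of the study corpus
    reference = Counter(reference_tokens)  # frequency list of the reference corpus
    c = sum(study.values())
    d = sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        # Keep only positive keywords (relatively overused in the study corpus)
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Toy data only; a real analysis would compare full maritime and general corpora
    study = "the trawl net hauls the catch onto the deck of the trawler".split()
    reference = "the cat sat on the mat and the dog sat on the rug".split()
    for word, score in keywords(study, reference, top_n=5):
        print(f"{word}\t{score:.2f}")
```

In this toy run, domain words such as "trawl" outrank "the", which occurs at similar rates in both corpora. A common convention is to treat scores above 3.84 (the χ² critical value for p < 0.05 with one degree of freedom) as significant; any candidates surfaced this way would still need manual vetting before being accepted as terms.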
Article outline
- 1. Introduction
- 2. Historical and theoretical background
- 3. Methods and criteria for term extraction in maritime domains
- 3.1 Maritime domains
- 3.2 Overview of term extraction methods
- 3.3 Criteria for unithood and termhood
- 4. Methodological specifics and results from the various term extraction methods
- 4.1 Method 1: Frequency analysis of domain-specific corpus
- 4.2 Method 2: Keyness analysis of domain-specific vs. general corpus
- 4.3 Method 3: Collocation analysis of domain-specific corpus
- 4.4 Method 4: Chunking of aligned sentences from a parallel domain-specific corpus
- 4.5 Method 5: Retrieval of terms from domain-specific lexical resources
- 4.6 Method 6: Retrieval of domain-specific entries in bilingual general dictionary
- 5. Results
- 6. Concluding remarks
- Acknowledgements
- Notes
References
Ahmad, Khurshid, and Margaret A. Rogers. 2001. “Corpus linguistics and terminology extraction.” In Handbook of Terminology Management (Volume 2), ed. by Sue-Ellen Wright and Gerhard Budin, 725–760. Amsterdam: John Benjamins.
Andersen, Gisle. 2008. “Quantifying domain-specificity: The occurrence of financial terms in a general corpus.” SYNAPS 211: 37–52.
Andersen, Gisle. 2016. “Using the corpus-driven method to chart discourse-pragmatic change.” In Discourse-pragmatic variation and change in English: New methods and insights, ed. by Heike Pichler, 21–40. Cambridge: Cambridge University Press.
Andersen, Gisle, Peder Gammeltoft, and Kjetil Gundersen. In preparation. “Termportalen – frå forprosjekt til fast finansiering” [The Terminology Portal – from pilot project to permanent funding]. To be published in Nordterm.
Andersen, Gisle, and Marita Kristiansen. 2013. “Towards a national portal for Norwegian terminology in the CLARINO project.” Terminologen 21: 188–189.
Andersen, Gisle, and Marita Kristiansen. 2015. “Termportalen som infrastruktur for terminologi i Norge” [The Terminology Portal as infrastructure for terminology in Norway]. Terminologen 51: 53–60.
Austlid, Einar. 1971. Norsk-engelsk ordliste for fiskarar [Norwegian-English dictionary for fishermen]. Oslo: Reenskaugs forlag.
Bourigault, Didier. 1992. “Surface grammatical analysis for the extraction of terminological noun phrases.” In COLING ’92: Proceedings of the Fourteenth International Conference on Computational Linguistics, 977–981. Nantes: ICC.
Bourigault, Didier. 1994. LEXTER, un Logiciel d’Extraction de Terminologie: Application à l’acquisition de connaissances à partir de textes. PhD thesis, École des Hautes Études en Sciences Sociales, Paris.
Brekke, Magnar, Kai Innselset, Marita Kristiansen, and Kari Øvsthus. 2006. “KB-N: Automatic term extraction from a knowledge-bank of economics.” In Proceedings of LREC 2006, 1912–1915. [URL]
Cabré Castellví, M. Teresa, Rosa Estopà Bagot, and Jordi Vivaldi Palatresi. 2001. “Automatic term detection: A review of current systems.” In Recent advances in computational terminology, ed. by Didier Bourigault, Christian Jacquemin, and Marie-Claude L’Homme, 53–88. Amsterdam: John Benjamins.
Drouin, Patrick, Jean-Benoît Morel, and Marie-Claude L’Homme. 2020. “Automatic Term Extraction from Newspaper Corpora: Making the Most of Specificity and Common Features.” In Proceedings of the 6th International Workshop on Computational Terminology (COMPUTERM 2020), 1–7.
Heid, Ulrich. 2006. “Extracting term candidates from recursively chunked text.” In Terminology, computing and translation, ed. by Pius ten Hacken, 97–115. Tübingen: Gunter Narr.
Hiemstra, Djoerd. 1998. “Multilingual Domain Modeling in Twenty-One: Automatic Creation of a Bi-directional Translation Lexicon from a Parallel Corpus.” In Proceedings of the 8th CLIN meeting, ed. by P. H. Coppen, H. van Halteren, and L. Teunissen, 41–58. Amsterdam: Rodopi.
Hofland, Knut, and Stig Johansson. 1998. “The Translation Corpus Aligner: A program for automatic alignment of parallel texts.” In Corpora and Cross-linguistic Research: Theory, Method, and Case Studies, ed. by Stig Johansson and Signe Oksefjell, 87–100. Amsterdam: Rodopi.
Hofland, Knut, and Øystein Reigem. 2006. Translation Corpus Aligner, version 2: An interactive sentence aligner. Paper presented at ICAME. [URL]
Kageura, Kyo, and Elizabeth Marshman. 2019. “Terminology Extraction and Management.” In The Routledge Handbook of Translation and Technology, ed. by Minako O’Hagan, 61–77. London: Routledge.
Kolstad, Ellinor. 2006. “Skjær i sjøen under oversettelse av romanen Trawler” [Stumbling blocks in the translation of the novel Trawler]. Språknytt 2006 (2): 19–23.
Kristiansen, Marita, and Magnar Brekke. 2004. “Kunnskapsbank for norsk økonomisk-administrative fagdomene” [A knowledge bank for Norwegian economic-administrative domains]. Språk og språkundervisning 11.
McEnery, Tony, and Andrew Hardie. 2012. Corpus linguistics. Cambridge: Cambridge University Press.
Musacchio, M. Teresa. 2017. Translating popular science. Padova: CLEUP.
Myking, Johan. 2005. “Terminologi i Noreg – historisk oversyn” [Terminology in Norway – a historical overview]. In Hvem tar ansvaret for fagterminologien? [Who takes responsibility for specialised terminology?], ed. by Jan Hoel, 2–15. Oslo: Språkrådet.
Myking, Johan. 2006. “Nyare terminologiarbeid i Noreg” [Recent terminology work in Norway]. Språknytt 2006 (2): 13–18.
Nazarenko, Adeline, and Haifa Zargayouna. 2009. “Evaluating term extraction.” In Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP’09), 299–304. Borovets, Bulgaria. [URL]
Pettersen, Jan Martin. 1997. Go fishing! Engelsk for fiskere, havbrukere og fisketilvirkere [Go fishing! English for fishermen, sea farmers and fish product manufacturers]. Oslo: Landbruksforlaget.
Rayson, Paul, and Roger Garside. 2000. “Comparing corpora using frequency profiling.” In Proceedings of the workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000), 1–6.
Rigouts Terryn, Ayla, Veronique Hoste, and Els Lefever. 2019. “In No Uncertain Terms: A Dataset for Monolingual and Multilingual Automatic Term Extraction from Comparable Corpora.” Language Resources and Evaluation 54 (2): 385–418.
Rigouts Terryn, Ayla, Patrick Drouin, Veronique Hoste, and Els Lefever. 2020. “TermEval 2020: Shared Task on Automatic Term Extraction Using the Annotated Corpora for Term Extraction Research (ACTER) Dataset.” In Proceedings of the LREC 2020 6th International Workshop on Computational Terminology (COMPUTERM 2020), 85–94.
Sinclair, John, Susan Jones, Robert Daley, and Ramesh Krishnamurthy. 2004. English collocational studies: The OSTI report. London: Continuum.
Solberg, Marte. 1995. A dictionary and terminological analysis of merchant ship terms. Unpublished Master’s thesis, NHH.
Stubbs, Michael. 2001. Words and phrases: Corpus studies of lexical semantics. Oxford: Blackwell.