Utilising heterogeneous language resources for term extraction in maritime domains

Andersen, Gisle

doi:10.1075/term.20024.and

Article published in: Terminology, Vol. 28:1 (2022), pp. 1–36

Gisle Andersen | Norwegian School of Economics

Developing terminologies for domains where they are lacking is a time-consuming and costly task. This article addresses a general methodological question: how can we, with limited funding, make maximal use of existing language resources to create a terminology at relatively low cost? Although Norway has been an important player in the maritime industries for many centuries, it has not prioritised the systematic development of an official maritime terminology. The article therefore focuses specifically on efforts to develop a national terminological resource for maritime domains. It describes the creation of a corpus of popular science texts and a parallel corpus of technical texts. Six different term extraction methods are applied, including corpus-based statistical analyses of frequency, collocation and keyness, as well as bilingual term extraction. Finally, the pros and cons of each method are evaluated by means of a cost-benefit analysis.
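As an illustration of what a keyness analysis of a domain-specific corpus against a general corpus can look like, the following is a minimal sketch. It assumes the log-likelihood statistic, a common choice in corpus linguistics; the abstract does not state which statistic the article uses. The function names, the `min_freq` threshold and the toy token lists are hypothetical and not taken from the article.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a: frequency in the domain corpus, b: frequency in the reference corpus,
    c: size of the domain corpus, d: size of the reference corpus (in tokens).
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keyness_ranking(domain_tokens, reference_tokens, min_freq=2):
    """Rank words that are relatively more frequent in the domain corpus."""
    dom = Counter(domain_tokens)
    ref = Counter(reference_tokens)
    n_dom = sum(dom.values())
    n_ref = sum(ref.values())
    scored = []
    for word, a in dom.items():
        if a < min_freq:
            continue  # hypothetical threshold to skip very rare words
        b = ref.get(word, 0)
        if a / n_dom <= b / n_ref:
            continue  # not over-represented in the domain corpus
        scored.append((word, log_likelihood(a, b, n_dom, n_ref)))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Toy data for illustration only (not from the article's corpora).
domain = "the trawler hauled the trawl and the trawler returned".split()
reference = "the cat sat on the mat and the dog sat".split()
ranking = keyness_ranking(domain, reference)
```

In this toy example, `trawler` ranks above the function word `the`, because it is absent from the reference text. Candidates ranked this way would still need manual validation against termhood criteria.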

Keywords: Term extraction, Norwegian language, Language for Specific Purposes (LSP), corpus linguistics, natural language processing (NLP), marine and maritime domains

Article outline

1. Introduction
2. Historical and theoretical background
3. Methods and criteria for term extraction in maritime domains
- 3.1 Maritime domains
- 3.2 Overview of term extraction methods
- 3.3 Criteria for unithood and termhood
4. Methodological specifics and results from the various term extraction methods
- 4.1 Method 1: Frequency analysis of domain-specific corpus
- 4.2 Method 2: Keyness analysis of domain-specific vs. general corpus
- 4.3 Method 3: Collocation analysis of domain-specific corpus
- 4.4 Method 4: Chunking of aligned sentences from a parallel domain-specific corpus
- 4.5 Method 5: Retrieval of terms from domain-specific lexical resources
- 4.6 Method 6: Retrieval of domain-specific entries in bilingual general dictionary
5. Results
6. Concluding remarks
Acknowledgements
Notes
References

Published online: 10 September 2021

