Automatic taxonomy extraction for specialized domains using distributional semantics

Nazar, Rogelio; Vivaldi, Jorge; Wanner, Leo

doi:10.1075/term.18.2.03naz

Article published in: Terminology
Vol. 18:2 (2012), pp. 188–225

Published online: 7 September 2012

This article explores a statistical, language-independent methodology for constructing taxonomies of specialized domains from noisy corpora. In contrast to proposals that exploit linguistic information by searching for lexico-syntactic patterns that tend to express the hypernymy relation, our methodology relies entirely on the distributional semantics of terms as captured by their lexical co-occurrence in large-scale corpora. In the first stage, we analyze the syntagmatic relations of the terms that serve as seeds of the taxonomy to be constructed, which yields a first batch of hypernym candidates for those seed terms. In the second stage, we analyze the paradigmatic relations of the terms by inspecting which terms co-occur with prominent frequency with the terms found, in the first stage, to be syntagmatically related to the seed terms. This allows us to refine the first batch of hypernym candidates and to obtain new ones. In the third and final stage, we build a taxonomy from the resulting hypernym candidate lists, exploiting the asymmetric statistical association between terms that is characteristic of the hypernymy relation.
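
The idea of an asymmetric association between a term and its hypernym can be illustrated with a small sketch. This is not the authors' implementation, since the abstract does not specify the statistics used. It assumes, purely for illustration, that association is estimated as a conditional co-occurrence probability, so that a hypernym h of a seed term t tends to satisfy P(h | t) > P(t | h). The function names, the thresholds `min_forward` and `min_ratio`, and the idea of treating each context as a bag of terms are all hypothetical choices.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(contexts):
    """Count in how many contexts each term and each unordered term pair occurs."""
    term_freq = Counter()
    pair_freq = Counter()
    for context in contexts:
        terms = set(context)  # each term counts once per context
        term_freq.update(terms)
        for a, b in combinations(sorted(terms), 2):
            pair_freq[(a, b)] += 1  # key is always in sorted order
    return term_freq, pair_freq


def conditional(a, b, term_freq, pair_freq):
    """Estimate P(b | a): the share of contexts containing a that also contain b."""
    if term_freq[a] == 0:
        return 0.0
    key = (a, b) if a < b else (b, a)
    return pair_freq[key] / term_freq[a]


def hypernym_candidates(seed, term_freq, pair_freq, min_forward=0.5, min_ratio=1.5):
    """Return terms whose association with the seed is strong and asymmetric.

    A term h is kept when P(h | seed) >= min_forward and
    P(h | seed) / P(seed | h) >= min_ratio (both thresholds are assumptions).
    """
    scored = []
    for h in term_freq:
        if h == seed:
            continue
        forward = conditional(seed, h, term_freq, pair_freq)
        backward = conditional(h, seed, term_freq, pair_freq)
        if backward > 0 and forward >= min_forward and forward / backward >= min_ratio:
            scored.append((h, forward, backward))
    # Strongest forward association first
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Invented toy contexts, for illustration only (not data from the article)
toy_contexts = [
    ["aspirin", "drug", "pain"],
    ["ibuprofen", "drug", "pain"],
    ["aspirin", "drug"],
    ["paracetamol", "drug", "fever"],
    ["ibuprofen", "drug", "inflammation"],
]
toy_term_freq, toy_pair_freq = cooccurrence_counts(toy_contexts)
print(hypernym_candidates("aspirin", toy_term_freq, toy_pair_freq))
```

On the toy contexts, "drug" appears in every context that contains "aspirin", while "aspirin" appears in only two of the five contexts that contain "drug", so "drug" is the only candidate returned for the seed "aspirin". A term such as "pain", whose association with "aspirin" is equally strong in both directions, is filtered out.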

Keywords: distributional semantics, quantitative linguistics, taxonomy extraction, terminology extraction
