Denominative variation in the COVID-19 Open Research Dataset corpus
Since 2020, we have witnessed the emergence of new concepts and terms as a result of the pandemic. Some of them became obsolete within a short period of time, whereas others are still misused despite standardization efforts. In this paper, we study explicit denominative variation in the COVID-19 corpus, which consists of scientific articles released as part of the COVID-19 Open Research Dataset and is publicly available in Sketch Engine. First, variants of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and coronavirus disease 2019 (COVID-19) were extracted by means of knowledge patterns (KPs) such as "also known as". The productiveness of these KPs was analyzed, and a set of 1,684 excerpts containing explicit variation was collected and manually annotated. A total of 371 variants were retrieved and organized into two polydenominative clusters (177 for COVID-19 and 193 for SARS-CoV-2), which were then characterized formally and semantically by comparison with the established designations. Finally, possible causes of denominative variation are explored.
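The abstract does not detail how the KP queries were run in Sketch Engine, so the following Python sketch only illustrates the general idea of pattern-based variant extraction under simplifying assumptions. The KPs other than "also known as", the regular expression, the example sentences, and the function names (`extract_variants`, `kp_productiveness`, `cluster_variants`) are hypothetical, and the manual annotation step is not modeled.

```python
import re
from collections import Counter

# Illustrative knowledge patterns (KPs). Only "also known as" comes from the
# abstract; the other two are assumptions added for this example.
KPS = ["also known as", "also called", "formerly known as"]

# Established designations used as anchors for the search.
TERMS = ["SARS-CoV-2", "COVID-19"]


def extract_variants(sentences, terms=TERMS, kps=KPS):
    """Return (term, kp, variant) triples for excerpts of the form
    'TERM, KP VARIANT' or 'TERM (KP VARIANT)'."""
    hits = []
    for sentence in sentences:
        for term in terms:
            for kp in kps:
                # Capture the text after the KP up to the next comma,
                # semicolon or parenthesis as the candidate variant.
                pattern = rf"{re.escape(term)}\s*[,(]\s*{re.escape(kp)}\s+([^,;()]+)"
                for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
                    # Drop surrounding spaces and a sentence-final period.
                    variant = match.group(1).strip(" .")
                    hits.append((term, kp, variant))
    return hits


def kp_productiveness(hits):
    """Count how many excerpts each KP retrieved."""
    return Counter(kp for _, kp, _ in hits)


def cluster_variants(hits):
    """Group distinct variants by the established designation they accompany."""
    clusters = {}
    for term, _, variant in hits:
        clusters.setdefault(term, set()).add(variant)
    return clusters


if __name__ == "__main__":
    # Invented example sentences, not taken from the corpus.
    sentences = [
        "SARS-CoV-2, also known as 2019-nCoV, causes COVID-19.",
        "Patients with COVID-19 (also called novel coronavirus pneumonia) were enrolled.",
        "SARS-CoV-2 (formerly known as 2019 novel coronavirus) spread rapidly.",
    ]
    hits = extract_variants(sentences)
    print(kp_productiveness(hits))
    print(cluster_variants(hits))
```

Matching only the "TERM, KP VARIANT" order keeps the sketch short; a fuller query would also cover the reverse order (e.g. "VARIANT, also known as TERM"), and every captured span would still need manual validation, since it may include material that is not part of the variant.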
Article outline
- 1. Introduction
- 2. Denominative variation
- 2.1 Causes of denominative variation
- 2.2 Consequences of denominative variation
- 3. Materials and methods
- 3.1 The COVID-19 corpus in Sketch Engine
- 3.2 Extraction of denominative variants
- 3.3 Establishing preferred denominations
- 4. Results
- 4.1 Analysis of KPs
- 4.2 Analysis of variants
- 4.2.1 Formal characterization of variants
- Graphical changes
- Morphosyntactic changes
- Reductions
- Expansions
- Lexical changes
- Multiple changes
- 4.2.2 Semantic characterization of variants
- Minimum semantic distance
- Medium semantic distance
- Maximum semantic distance
- 4.3 Possible causes behind the use of COVID-19 and SARS-CoV-2 variants
- 5. Conclusions
- Notes
- References