Corpus driven identification of lexical bundle obsolescence in Late Modern English
This chapter explores a new methodology for extracting, from large corpora (especially the Google ngrams dataset of the Google Books project), multi-word units that were once common but have since become obsolete. It complements a modified frequency-based methodology previously used for detecting lexical obsolescence (Tichý 2018) with a bottom-up approach to calculating association measures in multi-word sequences, inspired by Wahl & Gries (2019). The analytical part examines expressions identified as potentially having become obsolete in the transition from Late Modern to Present-day English. The conditions, circumstances and consequences of the loss of such expressions are considered, with a focus on competing forms with similar functions that may be seen as supplanting the old forms.
Article outline
- 1. Introduction
- 2. Material
- 3. Methodology
- 3.1 Thresholds
- 3.2 Selection
- 4. Technical aspects
- 5. Analysis
- 5.1 Trash
- 5.2 Results
- 5.2.1 Terminology
- 5.2.2 “Quasi” terminology
- 5.2.3 Appellations
- 5.2.4 Legal/administrative phrases
- 5.2.5 Dating
- 5.2.6 Pragmatic markers
- 5.2.7 Replacement in collocations
- 5.2.8 Countability and accommodation
- 5.2.9 Complex verb phrase
- 6. Discussion
- 7. Conclusions
- Acknowledgements
- Notes
- References
- Appendix