Corpus driven identification of lexical bundle obsolescence in Late Modern English

Tichý, Ondřej

doi:10.1075/slcs.218.04tic

Part of

Lost in Change: Causes and processes in the loss of grammatical elements and constructions
Edited by Svenja Kranich and Tine Breban
[Studies in Language Companion Series 218] 2021
pp. 101–130

Ondřej Tichý | Charles University

This chapter explores a new methodology for extracting from large corpora (especially the Google ngrams dataset of the Google Books project) multi-word units that were once common but have since become obsolete. It complements a modified frequency-based methodology previously used for detecting lexical obsolescence (Tichý 2018) with a bottom-up approach to calculating association measures in multi-word sequences, inspired by Wahl & Gries (2019). The analytical part examines expressions identified as having potentially become obsolete between Late Modern and Present-day English. The conditions, circumstances and consequences of the loss of such expressions are considered, with a focus on competing forms that express similar functions and may be recognized as supplanting the old forms.
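As a rough illustration of the frequency-based step only, the following minimal sketch flags sequences that were common in an earlier period and are rare in the most recent ones. It is not the chapter's implementation: the thresholds, the function names, the number of "recent" periods and the toy frequencies are all hypothetical, and the association-measure step is not modelled.

```python
from typing import Dict, List

# Hypothetical thresholds in occurrences per million tokens;
# these values are assumptions, not figures from the chapter.
COMMON_PER_MILLION = 1.0
RARE_PER_MILLION = 0.01


def is_obsolescence_candidate(series: List[float], recent_periods: int = 2) -> bool:
    """Return True if the sequence was once common and is rare in recent periods.

    `series` holds relative frequencies ordered from the earliest to the
    latest period (for example, one value per half-century).
    """
    if len(series) <= recent_periods:
        return False  # not enough history to compare earlier and recent use
    earlier = series[:-recent_periods]
    recent = series[-recent_periods:]
    return max(earlier) >= COMMON_PER_MILLION and max(recent) <= RARE_PER_MILLION


def find_candidates(freqs: Dict[str, List[float]]) -> List[str]:
    """Return the alphabetically sorted sequences that pass the filter."""
    return sorted(ng for ng, s in freqs.items() if is_obsolescence_candidate(s))


# Invented toy data with placeholder labels, for illustration only.
toy = {
    "bundle_a": [2.4, 1.8, 0.6, 0.05, 0.008, 0.004],  # common, then nearly gone
    "bundle_b": [3.0, 3.5, 4.1, 4.0, 4.2, 4.4],       # common throughout
    "bundle_c": [0.02, 0.03, 0.01, 0.0, 0.0, 0.0],    # never common
}
print(find_candidates(toy))
```

A filter of this kind yields candidates only; they would still need manual inspection to separate genuine obsolescence from noise in the data.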

Keywords: lexicology, corpus linguistics, diachronic linguistics, obsolescence, ngrams, lexical bundles, multi-word expressions, Late Modern English, Google Books

Article outline

1. Introduction
2. Material
3. Methodology
- 3.1 Thresholds
- 3.2 Selection
4. Technical aspects
5. Analysis
- 5.1 Trash
- 5.2 Results
  - 5.2.1 Terminology
  - 5.2.2 “Quasi” terminology
  - 5.2.3 Appellations
  - 5.2.4 Legal/administrative phrases
  - 5.2.5 Dating
  - 5.2.6 Pragmatic markers
  - 5.2.7 Replacement in collocations
  - 5.2.8 Countability and accommodation
  - 5.2.9 Complex verb phrase
6. Discussion
7. Conclusions
Acknowledgements
Notes
References
Appendix

Published online: 16 June 2021

https://doi.org/10.1075/slcs.218.04tic

References (20)

Aitchison, Jean. 2012. Words in the Mind: An Introduction to the Mental Lexicon. Oxford: Wiley-Blackwell.

Biber, Douglas, Johansson, Stig, Leech, Geoffrey, Conrad, Susan & Finegan, Edward. 2000. Longman Grammar of Spoken and Written English. London: Longman.

Coleman, Robert. 1990. The assessment of lexical mortality and replacement between Old and Modern English. In Papers from the 5th International Conference on English Historical Linguistics [Current Issues in Linguistic Theory 65], Sylvia M. Adamson, Vivien A. Law, Nigel Vincent & Susan Wright (eds), 69–86. Amsterdam: John Benjamins.

Cvrček, Václav. n.d. Corpus Confidence Calculator. <[URL]> (27 April 2019).

Denison, David. 1998. Syntax. In The Cambridge History of the English Language, Vol. 4: 1776–1997, Suzanne Romaine (ed.). Cambridge: CUP.

Farradane, J., Poulton, R. K. & Datta, M. S. 1965. Problems in analysis and terminology for information retrieval. Journal of Documentation 21(4): 287–90.

Iyeiri, Yoko. 2018. Causative make and its infinitival complements in Early Modern English. In Explorations in English Historical Syntax [Studies in Language Companion Series 198], Hubert Cuyckens, Hendrik De Smet, Liesbet Heyvaert & Charlotte Maekelberghe (eds), 139–58. Amsterdam: John Benjamins.

Kilgarriff, Adam. 2015. How many words are there? In The Oxford Handbook of the Word, John R. Taylor (ed.), 29–37. Oxford: OUP.

Maixner, Vítězslav. 1970. Zánik Slov v Nové Angličtině.

Michel, Jean-Baptiste, Shen, Yuan Kui, Aiden, Aviva Presser, Veres, Adrian, Gray, Matthew K., The Google Books Team, Pickett, Joseph P. et al. 2011. Quantitative analysis of culture using millions of digitized books. Science 331(6014): 176–182.

Milton, James & Donzelli, Giovanna. 2013. The lexicon. In The Cambridge Handbook of Second Language Acquisition, Julia Herschensohn & Martha Young-Scholten (eds), 441–60. Cambridge: CUP.

Němec, Igor. 1968. Strukturní předpoklady zániku slov. Slovo a Slovesnost 29(2): 152–58. <[URL]> (5 November 2020).

Oxford English Dictionary. n.d. Key to frequency. Oxford: OUP. <[URL]> (22 April 2019).

Petersen, Alexander M., Tenenbaum, Joel, Havlin, Shlomo & Stanley, H. Eugene. 2012. Statistical laws governing fluctuations in word use from word birth to word death. Scientific Reports 2 (March): 313.

Rudnicka, Karolina. 2019. The statistics of obsolescence: Purpose subordinators in Late Modern English. Basel: NIHIN.

Rychlý, Pavel. 2008. A lexicographer-friendly association score. RASLAN 2008, 6–9. Brno: Masarykova Univerzita.

The British National Corpus, Version 2 (BNC World). 2001. Praha: Distributed by Oxford University Computing Services on behalf of the BNC Consortium. Ústav Českého národního korpusu FF UK. <[URL]>

Tichý, Ondřej. 2018. Lexical obsolescence and loss in English: 1700–2000. In Applications of Pattern-Driven Methods in Corpus Linguistics [Studies in Corpus Linguistics 82], Joanna Kopaczyk & Jukka Tyrkkö (eds), 81–103. Amsterdam: John Benjamins.

Trench, Richard Chenevix. 1871. English, Past and Present. New York, NY: Charles Scribner and Co.

Wahl, Alexander & Gries, Stefan T. 2019. Computational extraction of formulaic sequences from corpora: Two case studies of a new extraction algorithm. In Computational Phraseology [IVITRA Research in Linguistics and Literature 24], Gloria Corpas Pastor & Jean-Pierre Colson (eds), 84–110. Amsterdam: John Benjamins.