This chapter explores a new methodology for extracting from large corpora forms that were once common but are now obsolete. It proceeds from the relatively under-researched problem of lexical mortality, or obsolescence more generally, to the formulation of two closely related procedures for querying the n-gram data of the Google Books project in order to identify the best candidate words and lexical expressions that may have been lost or become obsolete over the course of the last three centuries, from the Late Modern era to Present-day English (1700–2000). After describing the techniques used to process the big uni- and trigram data, the chapter offers a selective analysis of the results and proposes ways in which the methodology may be of help to corpus linguists as well as historical lexicographers.
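The querying procedure described above can be illustrated with a minimal sketch. The code below is not the chapter's actual procedure; it assumes, for illustration only, that yearly relative frequencies per word are already available (as they can be derived from the Google Books n-gram data), and it flags as obsolescence candidates those items that were reasonably frequent in an early window but have dropped to near zero in a late one. The function name, window boundaries, and thresholds are all hypothetical choices, not values from the chapter.

```python
# Hypothetical sketch of candidate extraction for lexical obsolescence.
# Assumption: `freqs` maps each word to a {year: relative frequency} series,
# e.g. match_count / total tokens for that year in the n-gram data.

def obsolescence_candidates(freqs, early=(1700, 1850), late=(1950, 2000),
                            min_early=1e-6, max_late=1e-8):
    """Return words frequent in the early window but vanishingly rare later.

    The windows and thresholds are illustrative parameters, not the
    chapter's settings.
    """
    def mean_in(series, span):
        # Average relative frequency over the years falling inside `span`.
        vals = [f for year, f in series.items() if span[0] <= year <= span[1]]
        return sum(vals) / len(vals) if vals else 0.0

    return [word for word, series in freqs.items()
            if mean_in(series, early) >= min_early
            and mean_in(series, late) <= max_late]

# Toy demonstration with invented frequency values:
demo = {
    "phaeton":  {1750: 5e-6, 1800: 4e-6, 1980: 1e-9},   # once common, now gone
    "computer": {1750: 0.0,  1800: 0.0,  1980: 9e-5},   # the reverse pattern
}
print(obsolescence_candidates(demo))  # -> ['phaeton']
```

In practice the same comparison would be run over millions of uni- or trigram time series, with the surviving candidates then inspected manually, which is where the selective analysis mentioned above comes in.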