The 400 million word Corpus of Historical American English (1810–2009)

Davies, Mark

doi:10.1075/cilt.325.11dav

In: English Historical Linguistics 2010: Selected Papers from the Sixteenth International Conference on English Historical Linguistics (ICEHL 16), Pécs, 23–27 August 2010
Edited by Irén Hegedűs and Alexandra Fodor
[Current Issues in Linguistic Theory 325] 2012
pp. 231–262

Mark Davies | Brigham Young University

Published online: 13 November 2012

The 400 million word Corpus of Historical American English (1810–2009) provides researchers with an extremely robust set of data for Late Modern English. The corpus is composed of fiction, magazines, newspapers, and nonfiction books, and its genre balance stays roughly the same from decade to decade. Because of its size and its advanced architecture and interface, it allows researchers to look at an extremely wide range of changes, many of which could not be studied with a small 2–4 million word corpus. These include lexical changes (via the frequency of any word or phrase by decade and mass comparison of all words in different periods), morphological shifts (via wildcards and pattern matching), syntactic shifts (thanks to very accurate lemmatization and part-of-speech tagging), and semantic change (by comparing collocates over time, as well as through searches that use data from the integrated thesaurus and customized word lists).

Cited by seven other publications

Ayoun, Dalila

2025. The Second Language Acquisition of English Tense, Aspect and Modality.

Kytö, Merja & Lucia Siebers

2022. Earlier North American Englishes. In Earlier North American Englishes [Varieties of English Around the World, G66], pp. 1 ff.

Seminck, Olga, Philippe Gambette, Dominique Legallois & Thierry Poibeau

2022. The Evolution of the Idiolect over the Lifetime: A Quantitative and Qualitative Study of French 19th Century Literature. Journal of Cultural Analytics 7:3

Flach, Susanne

2021. From movement into action to manner of causation: changes in argument mapping in the into-causative. Linguistics 59:1, pp. 247 ff.

Vartiainen, Turo & Mikko Höglund

2020. How to Make New Use of Existing Resources:. American Speech 95:4, pp. 408 ff.

Lin, Zefeng, Xiaojun Wan & Zongming Guo

2019. Learning Diachronic Word Embeddings with Iterative Stable Information Alignment. In Natural Language Processing and Chinese Computing [Lecture Notes in Computer Science, 11838], pp. 749 ff.

Anderwald, Lieselotte

2014. “Pained the eye and stunned the ear”. In Contact, Variation, and Change in the History of English [Studies in Language Companion Series, 159], pp. 113 ff.

This list is based on CrossRef data as of 15 November 2025. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.