Mark Davies | John Benjamins

Davies, Mark 2021 The Coronavirus Corpus: Design, construction, and use Language and Covid-19, Mahlberg, Michaela and Gavin Brookes (eds.), pp. 583–598 | Article

This paper discusses the creation and use of the Coronavirus Corpus, which is currently (March 2021) 900 million words in size, and which will probably be about one billion words in size by May–June 2021. The Coronavirus Corpus is a subset of the NOW Corpus (News on the Web), which is currently…

Davies, Mark 2021 The TV and Movies corpora: Design, construction, and use Corpus approaches to telecinematic language, Bednarek, Monika, Valentin Werner and Marcia Veirano Pinto (eds.), pp. 10–37 | Article

This paper discusses the creation and use of the TV Corpus (subtitles from 75,000 episodes, 325 million words, 6 English-speaking countries, 1950s–2010s) and the Movies Corpus (subtitles from 25,000 movies, 200 million words, 6 English-speaking countries, 1930s–2010s), which are available at…

Davies, Mark and Jong-Bok Kim 2018 Chapter 6. Semantic and lexical shifts with the “into-causative” construction in American English Explorations in English Historical Syntax, Cuyckens, Hubert, Hendrik De Smet, Liesbet Heyvaert and Charlotte Maekelberghe (eds.), pp. 159–178 | Chapter

In this paper, we consider several lexical and semantic shifts with the “into-causative” construction (e.g. Sue talked them into leaving) in American English since the early 1800s. The study is based on more than 11,000 tokens (including 680 different matrix verbs) in several large corpora,…

Davies, Mark and Robert Fuchs 2015 A reply English World-Wide 36:1, pp. 45–47 | Commentary

A reply to the commentaries by Christian Mair (DOI:10.1075/eww.36.1.02mai), Joybrato Mukherjee (DOI:10.1075/eww.36.1.02muk), Gerald Nelson (DOI:10.1075/eww.36.1.02nel), and Pam Peters (DOI:10.1075/eww.36.1.02pet).

Davies, Mark and Robert Fuchs 2015 Expanding horizons in the study of World Englishes with the 1.9 billion word Global Web-based English Corpus (GloWbE) English World-Wide 36:1, pp. 1–28 | Article

In this paper, we provide an overview of the new GloWbE Corpus — the Corpus of Global Web-based English. GloWbE is based on 1.9 billion words in 1.8 million web pages from 20 different English-speaking countries. Approximately 60 percent of the corpus comes from informal blogs, and the rest from a…

Davies, Mark 2014 Making Google Books n-grams useful for a wide range of research on language change International Journal of Corpus Linguistics 19:3, pp. 401–416 | Article

The “standard” Google Books n-grams were released by Google in 2010, and they include more than 155 billion words of data for the American English data alone. Unfortunately, the standard interface is far too simplistic to allow many types of useful research on this massive dataset. In this paper,…

Davies, Mark 2012 The 400 million word Corpus of Historical American English (1810–2009) English Historical Linguistics 2010: Selected Papers from the Sixteenth International Conference on English Historical Linguistics (ICEHL 16), Pécs, 23–27 August 2010, Hegedűs, Irén and Alexandra Fodor (eds.), pp. 231–262 | Article

The 400 million word Corpus of Historical American English (1810–2009) provides researchers with an extremely robust set of data for Late Modern English. The corpus is composed of fiction, magazines, newspapers, and nonfiction books, and its genre balance stays roughly the same from decade to…

Davies, Mark 2011 Synchronic and diachronic uses of corpora Perspectives on Corpus Linguistics, Viana, Vander, Sonia Zyngier and Geoff Barnbrook (eds.), pp. 63–80 | Article

In this interview, Mark Davies, Professor of (Corpus) Linguistics at Brigham Young University (United States), shows his interest in languages such as English, Spanish and Portuguese. This interest is revealed in his involvement with corpora compilation (Corpus of Historical American English,…

Davies, Mark 2010 More than a peephole: Using large and diverse online corpora The Bootcamp Discourse and Beyond, Worlock Pope, Caty (ed.), pp. 412–418 | Article

Davies, Mark 2009 The 385+ million word Corpus of Contemporary American English (1990–2008+): Design, architecture, and linguistic insights International Journal of Corpus Linguistics 14:2, pp. 159–190 | Article

The Corpus of Contemporary American English (COCA), which was released online in early 2008, is the first large and diverse corpus of American English. In this paper, we first discuss the design of the corpus — which contains more than 385 million words from 1990–2008 (20 million words each year),…

Davies, Mark 2005 The advantage of using relational databases for large corpora: Speed, advanced queries, and unlimited annotation International Journal of Corpus Linguistics 10:3, pp. 307–334 | Article

Relational databases can be used to create large corpora that provide both very good search performance and a wide range of queries. This paper outlines how this approach has been used to create the Corpus del Español, which contains 100 million words of Spanish text from the 1200s–1900s.…

Davies, Mark 2004 Student use of large, annotated corpora to analyze syntactic variation Corpora and Language Learners, Aston, Guy, Silvia Bernardini and Dominic Stewart (eds.), pp. 259–269 | Article

Davies, Mark 2000 Syntactic Diffusion in Spanish and Portuguese Infinitival Complements New Approaches to Old Problems: Issues in Romance historical linguistics, Dworkin, Steven N. and Dieter Wanner (eds.), pp. 109 ff. | Chapter

Davies, Mark 1997 A Computer Corpus-Based Study of Subject Raising in Modern Portuguese Lingvisticæ Investigationes 21:2, pp. 379–400 | Article

This study is the first comprehensive, data-based examination of subject raising in Portuguese, and is based on 4500+ tokens in more than 26,500,000 words of text from both the written and spoken registers of Brazilian and European Portuguese. We have suggested that there are important differences…

Davies, Mark 1995 The evolution of causative constructions in Spanish and Portuguese Contemporary Research in Romance Linguistics: Papers from the XXII Linguistic Symposium on Romance Languages, El Paso/Juárez, February 22–24, 1992, Amastae, Jon, Grant Goodall, M. Montalbetti and M. Phinney (eds.), pp. 105 ff. | Article