Mark Davies

List of John Benjamins publications for which Mark Davies plays a role.


Davies, Mark 2021 The Coronavirus Corpus: Design, construction, and use. Language and Covid-19, Mahlberg, Michaela and Gavin Brookes (eds.), pp. 583–598 | Article
This paper discusses the creation and use of the Coronavirus Corpus, which is currently (March 2021) 900 million words in size, and which will probably be about one billion words in size by May–June 2021. The Coronavirus Corpus is a subset of the NOW Corpus (News on the Web), which is currently…
Davies, Mark 2021 The TV and Movies corpora: Design, construction, and use. Corpus approaches to telecinematic language, Bednarek, Monika, Valentin Werner and Marcia Veirano Pinto (eds.), pp. 10–37 | Article
This paper discusses the creation and use of the TV Corpus (subtitles from 75,000 episodes, 325 million words, 6 English-speaking countries, 1950s–2010s) and the Movies Corpus (subtitles from 25,000 movies, 200 million words, 6 English-speaking countries, 1930s–2010s), which are available at…
Davies, Mark and Jong-Bok Kim 2018 Chapter 6. Semantic and lexical shifts with the “into-causative” construction in American English. Explorations in English Historical Syntax, Cuyckens, Hubert, Hendrik De Smet, Liesbet Heyvaert and Charlotte Maekelberghe (eds.), pp. 159–178 | Chapter
In this paper, we consider several lexical and semantic shifts with the “into-causative” construction (e.g. Sue talked them into leaving) in American English since the early 1800s. The study is based on more than 11,000 tokens (including 680 different matrix verbs) in several large corpora,…
Davies, Mark and Robert Fuchs 2015 A reply. English World-Wide 36:1, pp. 45–47 | Commentary
A reply to the commentaries by Christian Mair (DOI:10.1075/eww.36.1.02mai), Joybrato Mukherjee (DOI:10.1075/eww.36.1.02muk), Gerald Nelson (DOI:10.1075/eww.36.1.02nel), and Pam Peters (DOI:10.1075/eww.36.1.02pet).
In this paper, we provide an overview of the new GloWbE Corpus, the Corpus of Global Web-based English. GloWbE is based on 1.9 billion words in 1.8 million web pages from 20 different English-speaking countries. Approximately 60 percent of the corpus comes from informal blogs, and the rest from a…
The “standard” Google Books n-grams were released by Google in 2010, and they include more than 155 billion words of data for the American English data alone. Unfortunately, the standard interface is far too simplistic to allow many types of useful research on this massive dataset. In this paper,…
The 400 million word Corpus of Historical American English (1810–2009) provides researchers with an extremely robust set of data for Late Modern English. The corpus is composed of fiction, magazines, newspapers, and nonfiction books, and its genre balance stays roughly the same from decade to…
Davies, Mark 2011 Synchronic and diachronic uses of corpora. Perspectives on Corpus Linguistics, Viana, Vander, Sonia Zyngier and Geoff Barnbrook (eds.), pp. 63–80 | Article
In this interview, Mark Davies, Professor of (Corpus) Linguistics at Brigham Young University (United States), shows his interest in languages such as English, Spanish and Portuguese. This interest is revealed in his involvement with corpora compilation (Corpus of Historical American English,…
Davies, Mark 2010 More than a peephole: Using large and diverse online corpora. The Bootcamp Discourse and Beyond, Worlock Pope, Caty (ed.), pp. 412–418 | Article
The Corpus of Contemporary American English (COCA), which was released online in early 2008, is the first large and diverse corpus of American English. In this paper, we first discuss the design of the corpus, which contains more than 385 million words from 1990–2008 (20 million words each year),…
Relational databases can be used to create large corpora that provide both very good search performance and a wide range of queries. This paper outlines how this approach has been used to create the Corpus del Español, which contains 100 million words of Spanish text from the 1200s–1900s.…
Davies, Mark 2004 Student use of large, annotated corpora to analyze syntactic variation. Corpora and Language Learners, Aston, Guy, Silvia Bernardini and Dominic Stewart (eds.), pp. 259–269 | Article
Davies, Mark 2000 Syntactic Diffusion in Spanish and Portuguese Infinitival Complements. New Approaches to Old Problems: Issues in Romance historical linguistics, Dworkin, Steven N. and Dieter Wanner (eds.), pp. 109 ff. | Chapter
This study is the first comprehensive, data-based examination of subject raising in Portuguese, and is based on 4500+ tokens in more than 26,500,000 words of text from both the written and spoken registers of Brazilian and European Portuguese. We have suggested that there are important differences…