We investigate the usefulness of part-of-speech (POS) annotation as a tool in the study of sociolinguistic variation and genre evolution. We analyse how POS ratios change over time in the Parsed Corpus of Early English Correspondence (c.1410–1681), which social groups lead the changes, and whether the changes can be connected to colloquialisation with regard to reduced complexity or an increasingly involved style. While we find gentry-led colloquialisation in terms of noun and verb frequencies as well as evidence for gendered styles, the results on structural complexity are more mixed. We argue that POS annotation can be a useful tool when complemented by a thorough textual analysis, but that more fine-grained categories are needed to reach firmer conclusions.