The use of corpora in semantic research is a rapidly developing method. However, the range of quantitative techniques employed in the field can make it difficult for the non-specialist to keep abreast of methodological developments. This chapter serves as an introduction to the use of corpus methods in Cognitive Semantic research and as an overview of the relevant statistical techniques and the software needed to perform them. The discussion and description are intended for researchers in semantics who are interested in adopting quantitative corpus-driven methods. The chapter argues that there are fundamentally two corpus-driven approaches to meaning: one based on observable formal patterns (collocation analysis) and another based on patterns of annotated usage features (feature analysis). It then introduces and explains each of the statistical techniques currently used in the field. Examples of the use of each technique are listed, and a summary of the software packages available in R for performing the techniques is included.
Adler, J. (2010). R in a nutshell:A desktop quick reference. Sebastopol: O’Reilly Media.
Afifi, A., May S., & Clark, V.A. (2011). Practical multivariate analysis (5th ed.). London: Chapman & Hall.
Agresti, A. (2007). An introduction to categorical data analysis (2nd ed.). Hoboken: John Wiley.
Agresti, A. (2010). Analysis of ordinal categorical data (2nd ed.). Hoboken: John Wiley.
Agresti, A. (2013) [1990, 2002]. Categorical data analysis (3rd ed.). New York: John Wiley.
Arppe, A. (2006). Frequency considerations in morphology: Finnish verbs differ, too. SKY Journal of Linguistics, 19, 175–189.
Arppe, A. (2008). Univariate, bivariate and multivariate methods in corpus-based lexicography – A study of synonymy. Unpublished PhD dissertation, University of Helsinki.
Azen, R., & Walker, C. (2011). Categorical data analysis for the behavioral and social sciences. New York & Hove: Routledge.
Baayen, R.H. (2008). Analyzing linguistic data: A practical introduction to statistics using R. Cambridge: Cambridge University Press.
Baguley, T. (2012). Loglinear models. Online Supplement 5 to Serious stats: A guide to advanced statistics for the behavioral sciences. Basingstoke: Palgrave. Available at: [URL].
Balahur, A., & Montoyo, A. (2012). Semantic approaches to fine and coarse-grained feature-based opinion mining. In H. Horacek, E. Métais, R. Muñoz, & M. Wolska (Eds.), Natural language processing and information systems (pp. 142–153). Berlin: Springer.
Barnabé, A. (2012). Le schème du chemin en grammaire et sémantique anglaises. Unpublished PhD dissertation, Université Bordeaux 3.
Bates, D. (Forthcoming). lme4: Mixed-effects modeling with R. Heidelberg & New York: Springer. Preprints available at: [URL].
Benzécri, J.-P. (1980). Pratique de l’analyse des donnees. Paris: Dunod.
Benzécri, J.-P. (1992). Correspondence analysis handbook. New York: Dekker.
Berthele, R. (2010). Investigations into the folk’s mental models of linguistic varieties. In D. Geeraerts, G. Kristiansen, & Y. Peirsman (Eds.), Advances in cognitive sociolinguistics (pp. 265–290). Berlin & New York: Mouton de Gruyter.
Biber, D., & Jones, J. (2009). Quantitative methods in Corpus Linguistics. In A. Lüdeling, & M. Kytö (Eds.), Corpus Linguistics: An international handbook. Vol. 2. (pp. 1287–1304). Berlin & New York: Mouton de Gruyter.
Borg, I., Groenen, & Mair, P. (2013). Applied multidimensional scaling. Heidleberg & New York: Springer.
Borg, I., & Groenen, P. (2005). Modern multidimensional scaling (2nd ed.). Heidelberg & New York: Springer.
Bresnan, J., Cueni, A., Nikitina, T., & Baayen, H. (2007). Predicting the dative. In G. Bouma, I. Krämer, & J. Zwarts (Eds.), Cognitive foundations of interpretation alternation (pp. 69–94). Amsterdam: Royal Netherlands Academy of Arts and Sciences.
Bybee, J., & Eddington, D. (2006). A usage-based approach to Spanish verbs of ‘becoming’. Language, 82, 323–355.
Cadoret, M., Lê, S., & Pagès, J. (2011). Multidimensional scaling versus multiple correspondence analysis when analyzing categorization data. In B. Fichet, D. Piccolo, R. Verde, & M. Vichi (Eds.), Classification and multivariate analysis for complex data structures (pp. 301–308). Heidleberg & New York: Springer.
Chaffin, R. (1992). The concept of a semantic relation. In A. Lehrer, & E. Kittay (Eds.), Frames, fields, and contrasts: New essays in semantic and lexical organisation (pp. 253–288). London: Lawrence Erlbaum.
Chatterjee, S., & Hadi, A. (2006). Regression analysis by example. London: John Wiley.
Chessel, D., & Dufour, A.-B. (2013). Analysis of ecological data: Exploratory and Euclidean methods in environmental sciences. Available at: [URL].
Chessel, D., Dufour A.-B, & Thioulouse, Y. (2004) The ade4 package – I: One-table methods. R News, 4, 5–10.
Christensen, R. (1997). Log-linear models and logistic regression (2nd ed.). Heidleberg & New York: Springer.
Christensen, R. (2012). A tutorial on fitting cumulative link models with the ordinal package. Available at: [URL].
Clancy, S. (2006). The topology of Slavic case: Semantic maps and multidimensional scaling. Glossos, 7, 1–28.
Colleman, T. (2010). Beyond the dative alternation: The semantics of the Dutch aan-Dative. In D. Glynn, & K. Fischer (Eds.), Quantitative Cognitive Semantics: Corpus-driven approaches (pp. 271–304). Berlin & New York: Mouton de Gruyter.
Crawley, M. (2005). Statistics: An introduction using R. Southern Gate & Hoboken: John Wiley.
Crawley, M. (2007). The R book. Chichester: John Wiley.
Croft, W., & Poole, K. (2008). Inferring universals from grammatical variation: Multidimensional scaling for typological analysis. Theoretical Linguistics, 34, 1–37.
Croissant, Y. (2013). Estimation of multinomial logit models in R: The mlogit packages. Available at: [URL].
Daille, B., Dubreil, E.Monceaux, L., & Vernier, M. (2011). Annotating opinion–evaluation of blogs: The Blogoscopy corpus. Language Resources and Evaluation, 45, 409–437.
Dalgaard, P. (2008). Introductory statistics with R (2nd ed.). Dordrecht: Springer.
De Cock, B. (2014b). The discursive effects of Spanish impersonals uno and se. In D. Glynn, & M. Sjölin (Eds.), Subjectivity and epistemicity: Corpus, discourse, and literary approaches to stance (pp. 103–120). Lund: Lund University Press.
De Leeuw, J., & Mair, P. (2009a). Simple and canonical correspondence analysis using the R package anacor. Journal of Statistical Software, 31, 1–18.
De Leeuw, J., & Mair, P. (2009b). Multidimensional scaling using majorization: The R package smacof. Journal of Statistical Software, 31, 1–30.
De Leeuw, J., & Mair, P. (2013a). anacor: Simple and canonical correspondence analysis. Available at: [URL].
De Leeuw, J., & Mair, M. (2013b). SMACOF for multidimensional scaling. Available at: [URL].
Delorge, M. (2009). A diachronic corpus study of the constructional behaviours of reception verbs in Dutch. In B. Lewandowska-Tomaszczyk, & K. Dziwirek (Eds.), Studies in Cognitive Corpus Linguistics (pp. 249–272). Frankfurt/Main: Peter Lang.
Desagulier, G. (In press). Le statut de la fréquence dans les Grammaires de Constructions: ‘simple comme bonjour’?Langages.
Desagulier, G. (Submitted). Quite new methods for a rather old issue: Exploring and visualizing collocation data from the BNC with correspondence analysis.
Deshors, S. (2011). A multifactorial study of the uses of may and can in French-English interlanguage. Unpublished PhD dissertation, University of Sussex.
Deshors, S. (2014). Identifying different types of non-native co-occurrence patterns: A corpus-based approach. In D. Glynn, & M. Sjölin (Eds.), Subjectivity and epistemicity: Corpus, discourse, and literary approaches to stance (pp. 387–412). Lund: Lund University Press.
Diehl, H. (2014). On modal meaning in the uses of quite, rather, pretty and fairly as degree modifiers in British English. Unpublished PhD dissertation, Lund University.
Divjak, D. (2006). Ways of intending: A corpus-based Cognitive Linguistic approach to near-synonyms in Russian. In St. Th. Gries, & A. Stefanowitsch (Eds.), Corpora in Cognitive Linguistics: Corpus-based approaches to syntax and lexis (pp. 19–56). Berlin & New York: Mouton de Gruyter.
Divjak, D. (2010a). Structuring the lexicon: A clustered model for near-synonymy. Berlin & New York: Mouton de Gruyter.
Divjak, D. (2010b). Corpus-based evidence for an idiosyncratic aspect-modality relation in Russian. In D. Glynn, & K. Fischer (Eds.), Quantitative Cognitive Semantics: Corpus-driven approaches (pp. 305–331). Berlin & New York: Mouton de Gruyter.
Divjak, D., & Gries, St. Th. (2006). Ways of trying in Russian: Clustering behavioral profiles. Corpus Linguistics and Linguistic Theory, 2, 23–60.
Divjak, D., & Gries, St. Th. (2009). Corpus-based Cognitive Semantics: A contrastive study of phrasal verbs in English and Russian. In B. Lewandowska-Tomaszczyk, & K. Dziwirek (Eds.), Studies in Cognitive Corpus Linguistics (pp. 273–296). Frankfurt/Main: Peter Lang.
Divjak, D., & Gries, St. Th. (Eds.). (2012). Frequency effects in language learning and processing. Berlin & New York: Mouton de Gruyter.
Drenan, R. (2009). Statistics for archaeologists: A common sense approach (2nd ed.). Heidelberg & New York: Springer.
Dziwirek, K., & Lewandowska-Tomaszczyk, B. (2011). Complex emotions and grammatical mismatches: A contrastive corpus-based study. Berlin & New York: Mouton de Gruyter.
Edwards, D. (2000). Introduction to graphical modelling (2nd ed.). Heidelberg: Springer.
Everitt, B.S. (2005). An R and S-PLUS companion to multivariate analysis. London: Springer.
Everitt, B.S., & Hothorn, I. (2010). A handbook of statistical analyses using R (2nd ed.). Boca Raton: Taylor & Francis.
Everitt, B.S., & Hothorn, I. (2011). An introduction to applied multivariate analysis with R. Munich: Springer.
Everitt, B.S., Landau, S., Leese, M., & Stahl, D. (2011). Cluster analysis (5th ed.). Chichester: John Wiley.
Evert, S. (2009). Corpora and collocations. In A. Lüdeling, & M. Kytö (Eds.), Corpus Linguistics: An international handbook (pp. 1212–1249). Berlin & New York: Mouton de Gruyter.
Faraway, J. (2002). Practical regression and anova using R. Available at: [URL].
Faraway, J. (2006). Extending the linear model with R: Generalized linear, mixed effects and nonparametric regression models. London: Taylor & Francis.
Field, A., Miles, J., & Field, Z. (2012). Discovering statistics using R. London & Thousand Oaks: Sage.
Fillmore, C., & Atkins, B. (1992). Toward a frame-based lexicon: The semantics of risk and its neighbours. In A. Lehrer, & E. Kittay (Eds.), Frames, fields, and contrasts: New essays in semantic and lexical organisation (pp. 75–102). London: Lawrence Erlbaum.
Firth, J.R. (1957). A synopsis of linguistic theory 1930–1955. In J.R. Firth (Ed.), Studies in linguistic analysis (pp. 1–32). Oxford: Basil Blackwell.
Fischer, K. (2000). From Cognitive Semantics to Lexical Pragmatics: The functional polysemy of discourse particles. Berlin & New York: Mouton de Gruyter.
Fontaine, J., Scherer, K., & Soriano, C. (Eds.). (2013). Components of emotional meaning: A sourcebook. Oxford: Oxford University Press.
Funke, S., Mair, P., & von Eye, A. (2007). cfa: R package for the analysis of configuration frequencies. Available at: [URL].
Geeraerts, D. (2010). The doctor and the semantician. In D. Glynn, & K. Fischer (Eds.), Quantitative Cognitive Semantics: Corpus-driven approaches (pp. 63–78). Berlin & New York: Mouton de Gruyter.
Geeraerts, D. (2011). Entrenchment, conventionalization, and empirical method. Presented at the 44th Meeting of the Societas Linguistica Europaea, Logroño.
Geeraerts, D., Grondelaers, S., & Bakema, P. (1994). The structure of lexical variation: Meaning, naming, and context. Berlin & New York: Mouton de Gruyter.
Geeraerts, D., Grondelaers, S., & Speelman, D. (1999). Convergentie en Divergentie in de Nederlandse Woordenschat. Amsterdam: Meertens Instituut.
Geeraerts, D., Kristiansen, G., & Peirsman, Y. (Eds.). (2010). Advances in cognitive sociolinguistics. Berlin & New York: Mouton de Gruyter.
Gelman, A., & Hill, J. (2007). Data analysis using regression and multilevel/hierarchical models. Cambridge: Cambridge University Press.
Glynn, D. (2010a). Synonymy, lexical fields, and grammatical constructions: A study in usage-based Cognitive Semantics. In H.-J. Schmid, & S. Handl (Eds.), Cognitive foundations of linguistic usage-patterns: Empirical studies (pp. 89–118). Berlin & New York: Mouton de Gruyter.
Glynn, D. (2010b). Testing the hypothesis: Objectivity and verification in usage-based Cognitive Semantics. In D. Glynn, & K. Fischer (Eds.), Quantitative Cognitive Semantics: Corpus-driven approaches (pp. 239–270). Berlin & New York: Mouton de Gruyter.
Glynn, D. (2014a). The conceptual profile of the lexeme home: A multifactorial diachronic analysis. In J. E. Díaz-Vera (Ed.), Metaphor and metonymy across time and cultures (pp. 265–293). Berlin & New York: Mouton de Gruyter.
Glynn, D. (2014b). The social nature of anger: Multivariate corpus evidence for context effects upon conceptual structure. In I. Novakova, P. Blumenthal, & D. Siepmann (Eds.), Emotions in discourse (pp. 69–82). Frankfurt/Main: Peter Lang.
Glynn, D. (Forthcoming). Mapping meaning: Corpus methods for Cognitive Semantics. Cambridge: Cambridge University Press.
Glynn, D., & Sjölin, M. (2011). Cognitive Linguistic methods for literature: A usage-based approach to metanarrative and metalepsis. In A. Kwiatkowska (Ed.), Texts and minds: Papers in cognitive poetics and rhetoric (pp. 85–102). Frankfurt/Main: Peter Lang.
Glynn, D., & Krawczak, K. (Forthcoming). Social cognition, Cognitive Grammar and corpora: A multifactorial approach to epistemic modality. Cognitive Linguistics.
Glynn, D., & Fischer, D. (Eds.). (2010). Quantitative Cognitive Semantics: Corpus-driven approaches. Berlin & New York: Mouton de Gruyter.
Glynn, D., & Sjölin, M. (Eds.). (2014). Subjectivity and epistemicity: Corpus, discourse, and literary approaches to stance. Lund: Lund University Press.
Greenacre, M. (2007) [1993]. Correspondence analysis in practice (2nd ed.). London: Chapman & Hall.
Greenacre, M. (2010). Biplots in practice. Bilbao: Fundación BBVA.
Gries, St. Th. (1999). Particle movement: A cognitive and functional approach. Cognitive Linguistics, 10, 105–145.
Gries, St. Th. (2000). Towards multifactorial analyses of syntactic variation: The case of particle placement. Doctoral dissertation, University of Hamburg.
Gries, St. Th. (2003). Multifactorial analysis in Corpus Linguistics: A study of particle placement. London: Continuum Press.
Gries, St. Th. (2006). Corpus-based methods and Cognitive Semantics: The many senses of to run. In St. Th. Gries, & A. Stefanowitsch (Eds.), Corpora in Cognitive Linguistics: Corpus-based approaches to syntax and lexis (pp. 57–99). Berlin & New York: Mouton de Gruyter.
Gries, St. Th. (2009a). Quantitative Corpus Linguistics with R: A practical introduction. London: Routledge.
Gries, St. Th. (2009b). Statistics for Linguistics with R: A practical introduction (1st ed.). Berlin & New York: Mouton de Gruyter.
Gries, St. Th., & Stefanowitsch, A. (2004b). Co-varying collexemes in the into-causative. In M. Achard, & S. Kemmer (Eds.), Language, culture, and mind (pp. 225–36). Stanford: CSLI.
Gries, St. Th., & Divjak, D. (Eds.). (2012). Frequency effects in language representation. Berlin & New York: Mouton de Gruyter.
Gries, St. Th., & Stefanowitsch, A. (Eds.). (2006). Corpora in Cognitive Linguistics: Corpus-based approaches to syntax and lexis. Berlin & New York: Mouton de Gruyter.
Grondelaers, S. (2000). De distributie van niet-anaforisch er buiten de eerste zinsplaats: Sociolexicologische, functionele en psycholinguïstische aspecten van er’s status als presentatief signaal. Doctoral dissertation, University of Leuven.
Grondelaers S., Geeraerts, D., & Speelman, D. (2007). A case for a cognitive Corpus Linguistics. In M. Gonzalez-Marquez, I. Mittleberg, S. Coulson, & M. Spivey (Eds.), Methods in Cognitive Linguistics (pp. 149–169). Amsterdam & Philadelphia: John Benjamins.
Grondelaers S., Speelman, D., & Geeraerts, D. (2008). National variation in the use of er “there”: Regional and diachronic constraints on cognitive explanations. In G. Kristiansen, & R. Dirven (Eds.), Cognitive Sociolinguistics: Language variation, cultural models, social systems (pp. 153–204). Berlin & New York: Mouton de Gruyter.
Hadfield, J. (2010). MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. Journal of Statistical Software, 33, 1–22.
Härdle, W., & Simar, L. (2007). Applied multivariate statistical analysis. Heidelberg & New York: Springer.
Harrell, F. (2001). Regression modeling strategies: With Applications to linear models, logistic regression, and survival analysis. Heidelberg & New York: Springer.
Harrell, F. (2012). Regression modeling strategies. Unpublished manuscript, available at: [URL].
Hennig, C. (2013). Flexible procedures for clustering. Available at: [URL].
Heylen, K. (2005a). A quantitative corpus study of German word order variation. In St. Kepser, & M. Reis (Eds.), Linguistic evidence: Empirical, theoretical and computational perspectives(pp.241–264). Berlin & New York: Mouton de Gruyter.
Heylen, K. (2005b). Zur Abfolge (pro)nominaler Satzglieder im Deutschen: Eine korpusbasierte Analyse der relativen Abfolge von nominalem Subjekt und pronominalem Objekt im Mittelfeld, 264. Doctoral dissertation, University of Leuven.
Heylen, K., & Ruette, T. (2013). Degrees of semantic control in measuring aggregated lexical distances. In L. Borin, A. Saxena, A., & T. Rama (Eds.), Approaches to measuring linguistic differences (pp. 353–374). Berlin & New York: Mouton de Gruyter.
Heylen, K., Tummers, J., & Geeraerts, D. (2008). Methodological issues in corpus-based Cognitive Linguistics. In G. Kristiansen, & R. Dirven (Eds.), Cognitive Sociolinguistics: Language variation, cultural models, social systems (pp. 91–128). Berlin & New York: Mouton de Gruyter.
Hilbe, J. (2009). Logistic regression models. London: Chapman & Hall.
Hilbe, J. (2011) [2007]. Negative binomial regression (2nd ed.). Cambridge: Cambridge University Press.
Hilpert, M. (2012). Constructional change in English: Developments in allomorphy, word formation, and syntax. Cambridge: Cambridge University.
Hoffmann, Th. (2011). Preposition placement in English: A usage-based approach. Cambridge: Cambridge University Press.
Hosmer, D., & Lemeshow, S. (2013) [1989, 2000]. Applied logistic regression. Hoboken: John Wiley.
Hox, J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). Hove & New York: Routledge.
Husson, F.Josse, J., Lê, S., & Mazet, J. (2013). Multivariate exploratory data analysis and data mining with R. Available at: [URL].
Husson, F., Lê, S., & Pagès, J. (2011). Exploratory multivariate analysis by example using R. London: Chapman & Hall.
Izenman, A. (2008). Modernmultivariate statistical techniques: Regression, classification and manifold learning. Heidelberg & New York: Springer.
Janda, L., & Solovyev, V. (2009). What constructional profiles reveal about synonymy: A case study of the Russian words for sadness and happiness. Cognitive Linguistics, 20, 367–393.
Johnson, K. (2008). Quantitative methods in linguistics. Oxford: Blackwell.
Johnson, V., & Albert, J. (1999). Ordinal data modeling. Heidelberg & New York: Springer.
Kaufman, L., & Rousseeuw, P. (2005) [1990]. Finding groups in data: An introduction to cluster analysis. Hoboken: John Wiley.
Keen, K. (2010). Graphics for statistics and data analysis with R. Boca Raton: CRC Press.
Klavan, J. (2012). Evidence in linguistics: Corpus-linguistic and experimental methods for studying grammatical synonymy. Doctoral Dissertation, University of Tartu.
Krawczak, K. (2014a). Shame and its near-synonyms in English: A multivariate corpus-driven approach to social emotions. In I. Novakova, P. Blumenthal, & D. Siepmann (Eds.), Emotions in discourse (pp. 84–94). Frankfurt/Main: Peter Lang.
Krawczak, K. (2014b). Epistemic stance predicates in English: A quantitative corpus-driven study of subjectivity. In D. Glynn, & M. Sjölin (Eds.), Subjectivity and epistemicity: Corpus, discourse, and literary approaches to stance (pp. 355–386). Lund: Lund University Press.
Krawczak, K. (In press). Corpus evidence for the cross-cultural structure of social emotions: Shame, embarrassment, and guilt in English and Polish. Poznań Studies in Contemporary Linguistics.
Krawczak, K., & Glynn, D. (2011). Context and cognition: A corpus-driven approach to parenthetical uses of mental predicates. In K. Kosecki, & J. Badio (Eds.), Cognitive processes in language (pp. 87–99). Frankfurt/Main: Peter Lang.
Krawczak, K., & Kokorniak, I. (2012). Corpus-driven quantitative approach to the construal of Polish ‘think’. Poznań Studies in Contemporary Linguistics, 48, 439–472.
Krawczak, K., & Glynn, D. (In press). Operationalising construal: Of/about prepositional profiling for cognitive and communicative predicates. In C.M. Bretones Callejas (Ed.), Construals in language and thought: What shapes what? Amsterdam: John Benjamins.
Kroonenberg, P. (2008). Applied multiway data analysis. New York: John Wiley.
Lê, S., Josse, J., & Husson, F. (2008). FactoMineR: An R package for multivariate analysis. Journal of Statistical Software, 25, 1–18.
Le Roux, B., & Rouanet, H. (2004). Geometric data analysis: From correspondence analysis to structured data analysis. Dordrecht: Kluwer.
Le Roux, B., & Rouanet, H. (2010). Multiple correspondence analysis. London & Thousand Oaks: Sage.
Ledolter, J. (2013). Data mining and business analytics with R. Hoboken: John Wiley.
Lesnoff, M., & Lancelot, R. (2013). Analysis of overdispersed data. Available at: [URL].
Levshina, N. (2011). A usage-based study of Dutch causative constructions. Doctoral dissertation, University of Leuven.
Levshina, N., Geeraerts, D., & Speelman, D. (2013a). Towards a 3D-grammar: Interaction of linguistic and extralinguistic factors in the use of Dutch causative constructions. Journal of Pragmatics, 52, 34–48.
Levshina, N., Geeraerts, D., & Speelman, D. (2013b). Mapping constructional spaces: A contrastive analysis of English and Dutch analytic causatives. Linguistics, 51, 825–854.
Lewandowska-Tomaszczyk, B., & Dziwirek, K. (Eds.). (2009). Studies in Cognitive Corpus Linguistics. Frankfurt/Main: Peter Lang.
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2, 18–22.
Long, J.S., & Freese, J. (2006) [2001]. Regression models for categorical dependent variables using Stata. College Station: Stata Press.
Louwerse, M., & Van Peer, W. (2009). How cognitive is cognitive poetics? The interaction between symbolic and embodied cognition. In G. Brône, & J. Vandaele (Eds.), Cognitive poetics goals, gains and gaps (pp. 423–444). Berlin & New York: Mouton de Gruyter.
Maechler, M. (2013). Cluster analysis extended. Available at: [URL].
Maindonald, J. (2008). Using R for data analysis and graphics: Introduction, code and commentary. Available at: [URL].
Maindonald, J., & Braun, J. (2010) [2003]. Data analysis and graphics using R (3rd ed.). Cambridge: Cambridge University Press.
Marden, J. (2011). Multivariate statistical analysis:Old school. Department of Statistics, University of Illinois at Urbana-Champaign. Available at: [URL].
Martin, A.D., Quinn, K.M., & Park, J.H. (2010). Markov chain Monte Carlo (MCMC) package. Available at: [URL].
Menard, S. (2002). Applied logistic regression analysis (2nd ed.). London & Thousand Oaks: Sage.
Menard, S. (2010). Logistic regression: From introductory to advanced concepts and applications. London & Los Angeles: Sage.
Morgenstern, A., Blondel, M., Caët, S., & Boutet, D. (2011). Hearing children’s use of pointing gestures: From pre-linguistic buds to the blossoming of communication skills. Presentation at SALC III, Copenhagen.
Murtagh, F. (2005). Correspondence analysis and data coding with R and Java. London: Chapman & Hall.
Myers, D. (1994). Testing for prototypicality: The Chinese morpheme gong. Cognitive Linguistics, 5, 261–280.
Neandić, O., & Greenacre, M. (2007). Correspondence analysis in R, with two- and three-dimensional graphics: The ca Package. Journal of Statistical Software, 20, 1–13.
Newman, J., & Rice, S. (2004). Patterns of usage for English sit, stand, and lie: A cognitively-inspired exploration in corpus linguistics. Cognitive Linguistics, 15, 351–396.
Newman, J., & Rice, S. (2006). Transitivity schemas of English eat and drink in the BNC. In St. Th. Gries, & A. Stefanowitsch (Eds.), Corpora in Cognitive Linguistics: Corpus-based approaches to syntax and lexis. (pp. 225–260). Berlin & New York: Mouton de Gruyter.
Nordmark, H., & Glynn, D. (2013). anxiety between mind and society: A corpus-driven cross-cultural study of conceptual metaphors. Explorations in English Language and Linguistics, 1, 107–130.
O’Connell, A. (2006). Logistic regression models for ordinal response variables. London & Thousand Oaks: Sage.
Oakes, M. (1998). Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.
Orme, J., & Combs-Orme, T. (2009). Multiple regression with discrete dependent variables. Oxford: Oxford University Press.
Peirsman, Y.Heylen, K., & Geeraerts, D. (2010). Applying word space models to sociolinguistics: Religion names before and after 9/11. In D. Geeraerts, G. Kristiansen, & Y. Peirsman (Eds.), Advances in Cognitive Sociolinguistics (pp. 111–139). Berlin & New York: Mouton de Gruyter.
Pęzik, P. (2009). Extraction of multiword expressions for corpus-based discourse analysis. In B. Lewandowska-Tomaszczyk, & K. Dziwirek (Eds.), Studies in Cognitive Corpus Linguistics (pp. 249–272). Frankfurt/Main: Peter Lang.
Plevoets, K., Speelman, D., & Geeraerts, D. (2008). The distribution of T/V pronouns in Netherlandic and Belgian Dutch. In K. Schneider, & A. Baron (Eds.), Variational pragmatics: Regional varieties in pluricentric languages (pp. 181–209). Amsterdam & Philadelphia: John Benjamins.
Pütz, M, Robinson, J.A., & Reif, M. (Eds.) (2012). Cognitive Sociolinguistics: Social and cultural variation in cognition and language use. (Special edition of Annual Review of Cognitive Linguistics, 10.)
Ravid, D., & Hanauer, D. (1998). A prototype theory of rhyme: Evidence from Hebrew. Cognitive Linguistics, 9, 79–106.
Read, J., & Carroll, J. (2012). Annotating expressions of Appraisal in English. Language Resources and Evaluation, 46, 421–447.
Reif, M., Robinson, J.A., & Pütz, M. (Eds.). (2013). Variation in language and language use: Linguistic, socio-cultural and cognitive perspectives. Frankfurt/Main: Peter Lang.
Rencher, A. (2002). Methods of multivariate analysis (2nd ed.). New York: John Wiley.
Rice, S., Sandra, D., & Vanrespaille, M. (1999). Prepositional semantics and the fragile link between space and yime. In M. Hiraga, C. Sinha, & S. Wilcox (Eds.), Cultural, typology and psycholinguistic issues in Cognitive Linguistics (pp. 107–127). Amsterdam & Philadelphia: John Benjamins.
Ripley, B. (2013). Support functions and datasets for Venables and Ripley’s MASS. Available at: [URL].
Robinson, J.A. (2010a). Awesome insights into semantic variation. In D. Geeraerts, G. Kristiansen, & Y. Piersman (Eds.), Advances in Cognitive Sociolinguistics (pp. 85–109). Berlin & New York: Mouton de Gruyter.
Robinson, J.A. (2010b). Semantic variation and change in present-day English. Doctoral dissertation, University of Sheffield.
Robinson, J.A. (2012). A sociolinguistic perspective on semantic change. In K. Allan, & J.A. Robinson (Eds.), Current methods in Historical Semantics (pp. 191–231). Berlin & New York: Mouton de Gruyter.
Roever, C., Raabe, N., Luebke, K., Ligges, U., Szepannek, G., & Zentgraf, M. (2013). Classification and visualization. Unpublished manuscript available at: [URL].
Rudzka-Ostyn, B. (1989). Prototypes, schemas, and cross-category correspondences: The case of ask. In D. Geeraerts (Ed.), Prospects and problems of prototype theory (pp. 613–661). Berlin & New York: Mouton de Gruyter.
Rudzka-Ostyn, B. (1995). Metaphor, schema, invariance: The case of verbs of answering. In L. Goossens, P. Pauwels, B. Rudzka-Ostyn, A.-M. Simon-Vandenbergen, & J. Vanparys (Eds.), By word of mouth: Metaphor, metonymy, and linguistic action from a cognitive perspective (pp. 205–244). Amsterdam & Philadelphia: John Benjamins.
Ruette, T., Ehret, K., & Szmrecsanyi, B. (In press). Frequency effects in lexical sociolectometry are insubstantial. In H. Behrens, & S. Pfänder (Eds.), Again on frequency effects in language. Berlin & New York: Mouton de Gruyter.
Ruette, T., Geeraerts, D., Peirsman, Y., & Speelman, D. (Forthcoming). Semantic weighting mechanisms in scalable lexical sociolectometry. In B. Szmrecsanyi, & B. Waelchli (Eds.), Aggregating dialectology and typology: Linguistic variation in text and speech, within and across languages. Berlin & New York: Mouton de Gruyter.
Sagi, E., Kaufmann, S., & Clark, B. (2011). Tracing semantic change with latent semantic analysis. In K. Allan, & J. Robinson (Eds.), Current methods in Historical Semantics (pp. 161–183). Berlin & New York: Mouton de Gruyter.
Sandra, D., & Rice, S. (1995). Network analyses of prepositional meaning: Mirroring whose mind – the linguist’s or the language user’s? Cognitive Linguistics, 6, 89–130.
Scherer, K. (2005). What are emotions? And how can they be measured?Social Science Information, 44, 693–727.
Schmid, H.-J. (1993). Cottage and co., idea, start vs. begin: Die kategorisierung als grundprinzip einer differenziertenbedeutungsbeschreibung. Tübingen: Max Niemeyer.
Schmid, H.-J. (2000). English abstract nouns as conceptual shells: From corpus to cognition. Berlin & New York: Mouton de Gruyter.
Schmidtke-Bode, K. (2009). Going-to-V and gonna-V in child language: A quantitative approach to constructional development. Cognitive Linguistics, 20, 509–553.
Schönbrodt, F., Collins, L., & Stemmler, M. (2013). cfa2: Configuration frequency analysis with a design matrix. Available at: [URL].
Schulze, R. (1991). Getting round to (a)round: Towards the description and analysis of a ‘spatial’ predicate. In G. Rauh (Ed.), Approaches to prepositions (pp. 253–74).Tubingen: Günter Narr.
Sheather, S. (2009). A modern approach to regression with R. New York: Springer.
Smith, R. (2011). Multilevel modeling of social problems: A causal perspective. Heidelberg: Springer.
Speelman, D., & Geeraerts, D. (2010). Causes for causatives: The case of Dutch ‘doen’ and ‘laten’. In T. Sanders, & E. Sweetser (Eds.), Causal categories in discourse and cognition (pp. 173–204). Berlin & New York: Mouton de Gruyter.
Stefanowitsch, A. (2010). Empirical Cognitive Semantics: Some thoughts. In D. Glynn, & K. Fischer (Eds.), Quantitative Cognitive Semantics: Corpus-driven approaches (pp. 355–380). Berlin & New York: Mouton de Gruyter.
Stefanowitsch, A., & St. Th. Gries. (2005). Covarying collexemes. Corpus Linguistics and Linguistic Theory, 1, 1–43.
Stefanowitsch, A., & St. Th. Gries. (2008). Register and constructional meaning: A collostructional case study. In G. Kristiansen, & R. Dirven (Eds.), Cognitive Sociolinguistics: Language variation, cultural models, social systems (pp. 129–152). Berlin & New York: Mouton de Gruyter.
Stefanowitsch, A., & Gries, St. Th. (Eds.). (2006). Corpus-based approaches to metaphor and metonymy. Berlin & New York: Mouton de Gruyter.
Stevens, J. 2001. Applied multivariate statistics for the social sciences (4th ed.). Mahwah: Lawrence Erlbaum.
Strobl, C., Hothorn, T., & Zeileis, A. (2009a). Party on! A new, conditional variable importance measure for random forests available in the party package. The R Journal, 1, 14–17.
Strobl, C., Malley, J., & Gerhard T. (2009b). An introduction to recursive partitioning: Rationale, application, and characteristics of classification and regression trees, bagging, and random forests. Psychological Methods, 14, 323–348.
Suzuki, R. (2013). Hierarchical clustering with p-values via multiscale bootstrap resampling. Available at: [URL].
Suzuki, R., & Shimodaira, H. (2006). Pvclust: An R package for assessing the uncertainty in hierarchical clustering. Bioinformatics, 22, 1540–1542.
Szmrecsanyi, B. (2003). Be going to versus will/shall: Does syntax matter? Journal of English Linguistics, 31, 295–323.
Szmrecsanyi, B. (2006). Morphosyntactic persistence in spoken English: A corpus study at the intersection of Variationist Sociolinguistics, Psycholinguistics, and Discourse Analysis. Berlin & New York: Mouton de Gruyter.
Szmrecsanyi, B. (2010). The English genitive alternation in a cognitive sociolinguistic perspective. In D. Geeraerts, G. Kristiansen, & Y. Peirsman (Eds.), Advances in Cognitive Sociolinguistics (pp. 141–166). Berlin & New York: Mouton de Gruyter.
Szmrecsanyi, B. (2013). Grammatical variation in British English dialects. Cambridge: Cambridge University Press.
Tabachnick, B., & Fidell, L. (2007). Using multivariate statistics (5th ed.). London: Pearson.
Taboada, M., & Carretero, M. (2012). Contrastive analyses of evaluation in text: Key issues in the design of an annotation system for attitude applicable to consumer reviews in English and Spanish. Linguistics and the Human Sciences, 6, 275–295.
Tarling, R. (2009). Statistical modelling for social researchers: Principles and practice. London & New York: Routledge.
Therneau, T., Atkinson, E., & Mayo Foundation. (2013). An introduction to recursive partitioning using the RPART routines. Available at: [URL].
Thompson, L. (2009). S-PLUS (and R) manual to accompany Agresti’s categorical data analysis (2002). Available at: [URL].
Tummers, J., Heylen, K., & Geeraerts, D. (2005). Usage-based approaches in Cognitive Linguistics: A technical state of the art. Corpus Linguistics and Linguistic Theory, 1, 225–261.
Valenzuela Manzanares, J., & Rojo López, A.M. (2008). What can language learners tell us about constructions? In S. De Knop, & T. De Rycker (Eds.), Cognitive approaches to pedagogical grammar: A volume in honour of René Dirven (pp. 197–230). Berlin & New York: Mouton de Gruyter.
Van Bogaert, J. (2010). A constructional taxonomy of I think and related expressions: Accounting for the variability of complement-taking mental predicates. English Language and Linguistics, 14, 399–428.
Venables, W., & Ripley, B. (2002). Modern applied statistics with S (4th ed.). Heidelberg: Springer.
Verdonik, D., Rojc, M., & Stabej, M. (2007). Annotating discourse markers in spontaneous speech corpora on an example for the Slovenian language. Language Resources and Evaluation, 41, 147–180.
von Eye, A. (2002). Configural frequency analysis: Methods, models, and applications. Mahwah: Lawrence Erlbaum.
von Eye, A., & Mair, P. (2008). A functional approach to configural frequency analysis. Austrian Journal of Statistics, 37, 161–173.
von Eye, A., Mair, P., & Mun, E.-Y. (2010). Advances in configural frequency analysis. London: Guilford Press.
von Eye, A., & Mun, E.-Y. (2013). Log-linear modeling: Concepts, interpretation, and application. Hoboken: John Wiley.
Wiebe, J., Wilson, T., & Cardie, C. (2005). Annotating expressions of opinions and emotions in language. Language Resources and Evaluation, 39, 165–210.
Wiechmann, D. (2008). On the computation of collostruction strength: Testing measures of association as expressions of lexical bias. Corpus Linguistics and Linguistic Theory, 4, 253–290.
Wulff, S. (2006). Go-V vs. go-and-V in English: A case of constructional synonymy? In St. Th. Gries, & A. Stefanowitsch (Eds.), Corpora in Cognitive Linguistics: Corpus-based approaches to syntax and lexis (pp. 101–126). Berlin & New York: Mouton de Gruyter.
Wulff, S. (2009). Rethinking idiomaticity: A usage-based approach. London: Continuum.
Wulff, S. (2010). Marrying cognitive-linguistic theory and corpus-based methods: On the compositionality of English V NP-idioms. In D. Glynn, & K. Fischer (Eds.), Quantitative Cognitive Semantics: Corpus-driven approaches (pp. 223–238). Berlin & New York: Mouton de Gruyter.
Zeschel, A. (2010). Exemplars and analogy: Semantic extension in constructional networks. In D. Glynn, & K. Fischer (Eds.), Quantitative Cognitive Semantics: Corpus-driven approaches (pp. 201–221). Berlin & New York: Mouton de Gruyter.
Zhao, Y. (2013). R and data mining: Examples and case studies. Unpublished manuscript. Available at: [URL].
Zlatev, J., & Andrén, M. (2009). Stages and transitions in children’s semiotic development. In J. Zlatev, M. Andrén, C. Lundmark, & M. Johansson Falck (Eds.), Studies in language and cognition (pp. 380–401). Newcastle: Cambridge Scholars.
Cited by 15 other publications
Bębeniec, Daria
2024. In search of methodological standards for corpus-based cognitive semantics: The case of Behavioral Profiles. Studia Neophilologica ► pp. 1 ff.
González Granado, Nicolás, Patrick Drouin & Aurélie Picton
2023. De l’analyse statistique à l’apprentissage automatique : le langage R au service de la terminologie. Éla. Études de linguistique appliquée N° 208:4 ► pp. 447 ff.
Kambara, Kazuho & Tsukasa Yamanaka
2023. Philosophy of Data Science for Corpus Linguistics. Annals of the Japan Association for Philosophy of Science 32:0 ► pp. 47 ff.
Schneider, Edgar W.
2023. Lexicosemantic diffusion in World Englishes: variable meaning–form relations in prospective verbs. English Language and Linguistics 27:4 ► pp. 719 ff.
2021. Diachronic Cognitive Linguistics. Yearbook of the German Cognitive Linguistics Association 9:1 ► pp. 1 ff.
Podhorodecka, Joanna
2021. Real-life pseudo-passives: The usage and discourse functions of adjunct-based passive constructions. Poznan Studies in Contemporary Linguistics 57:1 ► pp. 33 ff.
Zehentner, Eva
2021. Alternations emerge and disappear: the network of dispossession constructions in the history of English. Corpus Linguistics and Linguistic Theory 17:3 ► pp. 525 ff.
Calvo, Elisa & Marián Morón
2020. Investigación con corpus cualitativos en los estudios de traducción: el problema de los constructos traductológicos complejos. Meta 65:1 ► pp. 237 ff.
2017. Towards the Spatial Analysis of Vague and Imaginary Place and Space: Evolving the Spatial Humanities through Medieval Romance. Journal of Map & Geography Libraries 13:1 ► pp. 29 ff.
Riou, Marine
2015. A Methodology for the Identification of Topic Transitions in Interaction. Discours :16
Riou, Marine
2017. The Prosody of Topic Transition in Interaction: Pitch Register Variations. Language and Speech 60:4 ► pp. 658 ff.
This list is based on CrossRef data as of 16 October 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers.
Any errors therein should be reported to them.