This paper explores ways in which research into collocation should be improved. After a discussion of the parameters underlying the notion of collocation, the paper has three main parts. First, I argue that corpus linguistics would benefit from taking more seriously the understudied fact that collocations are not necessarily symmetric, contrary to what most association measures imply. I also introduce an association measure from the associative learning literature that can identify asymmetric collocations, and I show that it distinguishes well between collocations with high and low association strengths. Second, I summarize some advantages of this measure and brainstorm about ways in which it can help re-examine previous studies as well as support further applications. Finally, I adopt a broader perspective and discuss a variety of ways in which all association measures – directional or not – in corpus linguistics should be improved in order for us to obtain better and more reliable results.
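The abstract does not name the directional measure it introduces. As an illustration only, the sketch below uses ΔP, a contingency-based measure from the associative learning literature, to show how an association score can differ depending on which word is treated as the cue. The function names and the co-occurrence counts are hypothetical and are not taken from the paper.

```python
# Minimal sketch of a directional association measure (Delta P), assuming a
# 2x2 co-occurrence table for a word pair (w1, w2):
#
#                  w2 present   w2 absent
#   w1 present         a            b
#   w1 absent          c            d
#
# All counts below are invented for illustration.

def delta_p_w2_given_w1(a, b, c, d):
    """P(w2 | w1) - P(w2 | not w1): how strongly w1 cues w2."""
    return a / (a + b) - c / (c + d)


def delta_p_w1_given_w2(a, b, c, d):
    """P(w1 | w2) - P(w1 | not w2): how strongly w2 cues w1."""
    return a / (a + c) - b / (b + d)


# Hypothetical counts: w1 is rare and nearly always followed by w2,
# while w2 is frequent and occurs mostly without w1.
a, b, c, d = 80, 20, 920, 98980

forward = delta_p_w2_given_w1(a, b, c, d)
backward = delta_p_w1_given_w2(a, b, c, d)

print(f"Delta P(w2 | w1) = {forward:.3f}")
print(f"Delta P(w1 | w2) = {backward:.3f}")
```

With these invented counts the first value is much larger than the second: w1 predicts w2 far better than w2 predicts w1. A symmetric measure would assign the pair a single score and hide that difference.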
Bartsch, S. 2004. Structural and Functional Properties of Collocations in English: A Corpus Study of Lexical and Pragmatic Constraints on Lexical Co-occurrence. Tübingen: Gunter Narr.
Bell, A., Brenier, J.M., Gregory, M., Girand, C. & Jurafsky, D. 2009. “Predictability effects on durations of content and function words in conversational English”. Journal of Memory and Language, 60 (1), 92–111.
Evert, S. 2005. The Statistics of Word Co-occurrences: Word Pairs and Collocations. Ph.D. thesis. Stuttgart: University of Stuttgart.
Evert, S. 2009. “Corpora and collocations”. In A. Lüdeling & M. Kytö (Eds.), Corpus Linguistics: An International Handbook, Vol. 2. Berlin/New York: Mouton de Gruyter, 1212–1248.
Ferraresi, A. & Gries, St. Th. 2011. “Type and (?) token frequencies in measures of collocational strength: Lexical gravity vs. a few classics”. Paper presented at Corpus Linguistics 2011, University of Birmingham, UK.
Firth, J.R. 1957. “A synopsis of linguistic theory 1930–1955”. In F. Palmer (Ed.), Selected Papers of J. R. Firth 1952–1959. London: Longman, 168–205.
Gries, St. Th. 2001. “A corpus-linguistic analysis of -ic and -ical adjectives”. ICAME Journal, 25, 65–108.
Gries, St. Th. 2010a. “Dispersions and adjusted frequencies in corpora: Further explorations”. In S. Th. Gries, S. Wulff & M. Davies (Eds.), Corpus Linguistic Applications: Current Studies, New Directions. Amsterdam: Rodopi, 197–212.
Gries, St. Th. 2010b: online. “Bigrams in registers, domains, and varieties: A bigram gravity approach to the homogeneity of corpora”. In M. Mahlberg, V. González-Diaz & C. Smith (Eds.), Proceedings of the Corpus Linguistics Conference (CL 2009), University of Liverpool, UK, 20–23 July 2009. Available at: [URL] (accessed July 2012).
Gries, St. Th. 2012. “Corpus linguistics, theoretical linguistics, and cognitive/psycholinguistics: Towards more and more fruitful exchanges”. In J. Mukherjee & M. Huber (Eds.), Corpus Linguistics and Variation in English: Theory and Description. Amsterdam: Rodopi, 41–63.
Gries, St. Th., Hampe, B. & Schönefeld, D. 2005. “Converging evidence: Bringing together experimental and corpus data on the association of verbs and constructions”. Cognitive Linguistics, 16 (4), 635–676.
Jelinek, F. 1990. “Self-organized language modeling for speech recognition”. In A. Waibel & K.-F. Lee (Eds.), Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann, 450–506.
Kilgarriff, A. 2009. “Simple maths for keywords”. Paper presented at Corpus Linguistics 2009, University of Liverpool.
Kjellmer, G. 1991. “A mint of phrases”. In K. Aijmer & B. Altenberg (Eds.), English Corpus Linguistics: Studies in Honor of Jan Svartvik. London: Longman, 111–127.
McGee, I. 2009. “Adjective-noun collocations in elicited and corpus data: Similarities, differences, and the whys and wherefores”. Corpus Linguistics and Linguistic Theory, 5 (1), 79–103.
Michelbacher, L., Evert, S. & Schütze, H. 2007. “Asymmetric association measures”. Paper presented at the 6th International Conference on Recent Advances in Natural Language Processing, Borovets, Bulgaria.
Michelbacher, L., Evert, S. & Schütze, H. 2011. “Asymmetry in corpus-derived and human word associations”. Corpus Linguistics and Linguistic Theory, 7 (2), 245–276.
Mollin, S. 2009. “Combining corpus linguistic and psychological data on word co-occurrences: Corpus collocates versus word associations”. Corpus Linguistics and Linguistic Theory, 5 (2), 175–200.
Nordquist, D. 2009. “Investigating elicited data from a usage-based perspective”. Corpus Linguistics and Linguistic Theory, 5 (1), 105–130.
Pecina, P. 2009. “Lexical association measures and collocation extraction”. Language Resources and Evaluation, 44 (1–2), 137–158.
Pedersen, T. 1998. “Dependent bigram identification”. In Proceedings of the Fifteenth National Conference on Artificial Intelligence (AAAI-98), July 28–30, 1197.
R Development Core Team. 2012: online. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. Available at: [URL] (accessed July 2012).
Raymond, W.D. & Brown, E.L. 2012. “Are effects of word frequency effects of context of use? An analysis of initial fricative reduction in Spanish”. In St. Th. Gries & D.S. Divjak (Eds.), Frequency Effects in Language Learning and Processing. Berlin/New York: Mouton de Gruyter, 35–52.
Shanks, D.R. 1995. The Psychology of Associative Learning. New York: Cambridge University Press.
Smadja, F. 1993. “Retrieving collocations from text: Xtract”. Computational Linguistics, 19 (1), 143–177.
Stubbs, M. 2001. Words and Phrases: Corpus Studies of Lexical Semantics. Oxford/Malden, MA: Blackwell.
Tversky, A. 1977. “Features of similarity”. Psychological Review, 84 (4), 327–352.
Wahl, A.R. 2011. “Intonation unit boundaries and the entrenchment of collocations: Evidence from bidirectional and directional association measures”. Unpublished ms, Department of Linguistics, University of California, Santa Barbara.
Wiechmann, D. 2008. “On the computation of collostruction strength: Testing measures of association as expressions of lexical bias”. Corpus Linguistics and Linguistic Theory, 4 (2), 253–290.
Zhang, W., Yoshida, T., Tang, X. & Ho, T.-B. 2009. “Improving effectiveness of mutual information for substantival multiword expression extraction”. Expert Systems with Applications, 36 (8), 10919–10930.
Cited by five other publications
Rastelli, Stefano
2022. Intra-language: the study of L2 morpheme productivity as within-item variance. International Review of Applied Linguistics in Language Teaching 60:4, pp. 1143 ff.
Rastelli, Stefano & Akira Murakami
2022. Apparently identical verbs can be represented differently: comparing L1–L2 inflection with contingency-based measure ΔP. Corpora 17:1, pp. 97 ff.
Smith, Chris A.
2018. Diachronic patterns of usage of no doubt in the English Historical Book Collection (EEBO, ECCO and EVANS). ExELL 6:1, pp. 1 ff.
Bendinelli, Marion
2017. Segments phraséologiques et séquences textuelles. Corpus 17.
This list is based on CrossRef data as of 3 December 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers.
Any errors therein should be reported to them.