This paper examines the effectiveness of Juilland’s D as a measure of vocabulary dispersion in large corpora. Through a series of experiments using the BNC, we explore the influence of three variables: the number of corpus-parts used for the computation of D, the frequency of the target word, and the distribution of that word across the corpus. The experiments demonstrate that the effective range of D is greatly reduced when computations are based on a large number of corpus-parts: even words with highly skewed distributions have D values indicating a relatively uniform distribution. We also briefly explore an alternative measure, Gries’ DP (Gries 2008), showing that it is a more reliable and effective measure of dispersion in a large corpus divided into many parts. In conclusion, we discuss the implications of these findings for quantitative methods applied to the creation of vocabulary lists as well as for research questions in other areas of corpus linguistics.
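The effect described in the abstract can be illustrated with a small synthetic sketch. It is not the paper's BNC experiment: the function names, the toy word, and the assumption of equal-sized corpus-parts are all introduced here for illustration. It assumes the common formulation D = 1 − V/√(n − 1), where V is the coefficient of variation of the per-part frequencies (population standard deviation divided by the mean), and DP as half the summed absolute differences between observed and expected proportions. Note the opposite scales: D near 1 suggests an even distribution, whereas DP near 0 does.

```python
import math
import statistics


def juilland_d(freqs):
    """Juilland's D for per-part frequencies, assuming equal-sized parts."""
    n = len(freqs)
    mean = statistics.fmean(freqs)
    if n < 2 or mean == 0:
        raise ValueError("need at least two parts and a non-zero total frequency")
    # Coefficient of variation, using the population standard deviation.
    v = statistics.pstdev(freqs) / mean
    return 1 - v / math.sqrt(n - 1)


def gries_dp(freqs, part_sizes=None):
    """Gries' DP: half the summed |observed - expected| proportions."""
    if part_sizes is None:
        # Assumption for this sketch: all corpus-parts have the same size.
        part_sizes = [1] * len(freqs)
    total_freq = sum(freqs)
    total_size = sum(part_sizes)
    return 0.5 * sum(
        abs(f / total_freq - s / total_size)
        for f, s in zip(freqs, part_sizes)
    )


def skewed_word(n_parts, share=0.1, per_part=10):
    """Hypothetical word occurring only in the first `share` of the parts."""
    k = max(1, round(n_parts * share))
    return [per_part] * k + [0] * (n_parts - k)


if __name__ == "__main__":
    # Same skew (word confined to 10% of the parts), different part counts.
    for n_parts in (10, 100, 1000):
        freqs = skewed_word(n_parts)
        print(n_parts, round(juilland_d(freqs), 3), round(gries_dp(freqs), 3))
```

Under these assumptions, the toy word is confined to 10% of the parts in every run, yet D climbs from 0 with 10 parts to about 0.905 with 1,000 parts, while DP stays at 0.9 throughout. This mirrors, in miniature, the reduced effective range of D that the experiments report for corpora divided into many parts.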
Baker, P., & Egbert, J. (Eds.) (2016). Triangulating Methodological Approaches in Corpus-linguistic Research. New York, NY: Routledge.
Biber, D. (2012). Register as a predictor of linguistic variation. Corpus Linguistics and Linguistic Theory, 8(1), 9–37.
Biber, D., Egbert, J., Gray, B., Oppliger, R., & Szmrecsanyi, B. (Forthcoming). Variationist versus text-linguistic approaches to grammatical change in English: Nominal modifiers of head nouns. In M. Kytö & P. Pahta (Eds.), Cambridge Handbook of English Historical Linguistics. Cambridge: Cambridge University Press.
Brezina, V., & Gablasova, D. (2015). Is there a core general vocabulary? Introducing the New General Service List. Applied Linguistics, 36(1), 1–22.
Coxhead, A. (2000). A new academic word list. TESOL Quarterly, 34(2), 213–238.
Davies, M., & Gardner, D. (2010). A Frequency Dictionary of Contemporary American English: Word Sketches, Collocates, and Thematic Lists. London: Routledge.
Evert, S. (2004). The statistics of word co-occurrences: Word pairs and collocations (Unpublished doctoral dissertation). University of Stuttgart, Germany. Retrieved from [URL] (last accessed September 2016).
Gardner, D., & Davies, M. (2014). A new academic vocabulary list. Applied Linguistics, 34(5), 1–24.
Juilland, A.G., Brodin, D.R., & Davidovitch, C. (1970). Frequency Dictionary of French Words. The Hague: Mouton de Gruyter.
Juilland, A., & Chang-Rodriguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton de Gruyter.
Lyne, A. (1985). The Vocabulary of French Business Correspondence. Geneva: Slatkine.
Leech, G., Rayson, P., & Wilson, A. (2001). Word Frequencies in Written and Spoken English: Based on the British National Corpus. London: Longman.
Martin, J.D., & Gray, L.N. (1971). Measurement of relative variation: Sociological examples. American Sociological Review, 36(3), 496–502.
Oakes, M. (1998). Statistics for Corpus Linguistics. Edinburgh: Edinburgh University Press.