Handle it in-house?
Learner corpora frequency lists and lexical sophistication
Vocabulary lists of high-frequency lexical items are an important resource in language education and a key product of corpus research. However, no single vocabulary list will be useful for every learning context, with the appropriateness of such lists affected by the corpora on which they are based. This paper investigates the impact of corpus selection on one measure of lexical sophistication, Advanced Guiraud, focusing on two frequency lists originating from an in-house learner corpus (PELIC) and a global learner corpus (Cambridge Learner Corpus). This analysis shows that frequency lists derived from both types of learner corpus can effectively serve as the basis for measuring the development of lexical sophistication, regardless of the specific program of the learners. Therefore, publicly available learner corpus frequency lists can be a reliable resource for stakeholders interested in the lexical gains of language learners.
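The abstract does not spell out how Advanced Guiraud (AG) depends on the chosen frequency list. As a rough illustration, AG is commonly defined as the number of advanced word types divided by the square root of the number of tokens, where "advanced" means absent from a list of basic, high-frequency words. The sketch below assumes whitespace tokenization and lowercasing. The function name and the two toy word lists are hypothetical and are not taken from PELIC or any other corpus.

```python
import math


def advanced_guiraud(tokens, basic_words):
    """Return advanced types / sqrt(tokens) for one text.

    `basic_words` stands in for a high-frequency list; any word type
    not on it is counted as advanced. Both arguments are illustrative
    assumptions, not the study's actual data or preprocessing.
    """
    if not tokens:
        return 0.0
    normalized = [t.lower() for t in tokens]
    advanced_types = {t for t in normalized if t not in basic_words}
    return len(advanced_types) / math.sqrt(len(normalized))


# Toy text: 9 tokens, so the denominator is sqrt(9) = 3.
tokens = "The cat sat on the mat with remarkable composure".split()

# Two hypothetical frequency lists that disagree about one word.
list_a = {"the", "cat", "sat", "on", "mat", "with"}
list_b = list_a | {"remarkable"}

ag_a = advanced_guiraud(tokens, list_a)  # 2 advanced types / 3
ag_b = advanced_guiraud(tokens, list_b)  # 1 advanced type / 3
print(round(ag_a, 3), round(ag_b, 3))
```

The same text receives a different score under each list, which is why the choice of source corpus for the frequency list matters for this measure.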
Article outline
- 1. Introduction
- 2. Learner corpora (LC) and lexical sophistication
- 2.1 In-house corpora: The University of Pittsburgh English Language Institute Corpus (PELIC)
- 2.2 Global corpora: ETS Corpus of Non-Native Written English (ETS)
- 2.3 Lexical sophistication
- 2.4 Motivation for the current study
- 3. Methodology
- 3.1 Frequency lists
- 3.2 ETS comparison
- 3.3 Data collection and description
- 3.4 Comparison with lexical diversity
- 4. Results
- 4.1 Lexical sophistication descriptive statistics
- 4.2 Lexical sophistication inferential statistics
- 4.3 AG comparison to vocD
- 4.4 Results summary
- 5. Discussion
- 6. Conclusion
- Notes
- References