Assessing receptive vocabulary using state‑of‑the‑art natural language processing techniques

Crossley, Scott; Holmes, Langdon

doi:10.1075/jsls.22006.cro

Article published in: Journal of Second Language Studies, Vol. 6:1 (2023), pp. 1–28

Scott Crossley | Vanderbilt University, United States

Langdon Holmes | Vanderbilt University, United States

Semantic embedding approaches commonly used in natural language processing, such as transformer models, have rarely been used to examine L2 lexical knowledge. Importantly, their performance has not been contrasted with more traditional annotation approaches to lexical knowledge. This study uses NLP techniques related to lexical annotations and semantic embedding approaches to model the receptive vocabulary of L2 learners based on their lexical production during a writing task. The goal of the study is to examine the strengths and weaknesses of both approaches in understanding L2 lexical knowledge. Findings indicate that transformer approaches based on semantic embeddings outperform linguistic annotations and Word2vec models in predicting L2 learners’ vocabulary scores. The findings support the strength and accuracy of semantic-embedding approaches, as well as their generalizability across tasks, when compared to linguistic feature models. Limitations of semantic-embedding approaches, especially interpretability, are discussed.

Keywords: natural language processing, corpus linguistics, lexical knowledge, Doc2Vec, BERT, word-embeddings, lexical annotations

Article outline

1. Introduction
2. Literature review
- 2.1 Lexical knowledge
- 2.2 Measuring L2 lexical knowledge
3. Current study
4. Method
- 4.1 Corpus
- 4.2 Receptive vocabulary knowledge
- 4.3 Lexical annotations
  - Age of acquisition
  - Concreteness
  - Word familiarity
  - Word meaningfulness
  - Lexical response times
  - Word associations
  - Phonological distance
  - Word frequency
  - Collocation strength
  - Contextual distinctiveness
- 4.4 Semantic embedding
  - Doc2vec
  - Transformers
- 4.5 Statistical analysis
5. Results
- 5.1 Lexical annotations model
- 5.2 Doc2vec model
- 5.3 BERT model
- 5.4 Comparisons between models
6. Discussion
7. Conclusion
Notes
References

Published online: 20 September 2022

Cited by (1)

Huang, Jingxiu, Xiaomin Wu, Jing Wen, Chenhan Huang, Mingrui Luo, Lixiang Liu & Yunxiang Zheng. 2023. Evaluating Familiarity Ratings of Domain Concepts with Interpretable Machine Learning: A Comparative Study. Applied Sciences 13:23, pp. 12818 ff.

This list is based on CrossRef data as of 5 July 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.