Mapping of American English vocabulary by grade levels
We describe a large-scale effort to map English-language vocabulary by U.S. school grade levels. Our motivation is
to rapidly expand graded vocabulary resources for work with native English speakers in the U.S., while taking school-related
influences into account rather than relying on corpus-frequency approaches alone. We report on the initial effort of data
collection, which mapped about 22K word forms. We compare this mapping with other recent vocabulary mapping
efforts, such as age-of-acquisition (AoA) data. We then describe our efforts to automatically expand this resource using linguistically
motivated variables and corpus-based methods. Our current resource maps more than 126K English word forms to U.S. school grade
levels. We also compare a subset of our L1-mapped data to English L2 vocabulary levels, as expressed on the CEFR scale, and find
considerable overlap in the order of vocabulary learning in L1 and L2 English.
Article outline
- Introduction
- Related work
- Method
- Data Collection
- Comparing VXGL and AoA
- Prediction
- Associative Estimate of Grade Level
- Results
- Comparison with CEFR mapping
- Discussion
- Conclusion
- Notes
- References