Subcategorization frame identification for learner English
As large-scale learner corpora become increasingly available, it is vital that natural language processing (NLP)
technology be developed to provide the rich linguistic annotations that second language (L2) research requires. We present a system for
automatically analyzing subcategorization frames (SCFs) in learner English. SCFs link lexis with morphosyntax, shedding light on the
interplay between lexical and structural information in learner language. Moreover, SCFs are crucial to the study of a wide range of
phenomena, including individual verbs, verb classes and varying syntactic structures. To illustrate the usefulness of our system for learner
corpus research and second language acquisition (SLA), we investigate how L2 learners diversify their use of SCFs in text and how this
diversity changes with L2 proficiency.
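As a rough illustration of what "diversifying SCF use" can mean numerically, the sketch below assumes that every verb token in a text has already been tagged with one SCF label (labels such as `NP` or `NP-PP` are hypothetical, as is the function name `scf_diversity`). It computes three generic stand-ins: the number of distinct SCFs, the SCF type-token ratio, and Shannon evenness (Pielou's J). These are not the repetition, evenness, dispersion and disparity metrics that the article designs in Section 4.1; they only show the kind of quantity involved.

```python
from collections import Counter
import math


def scf_diversity(scf_labels):
    """Generic diversity summary for the SCF tokens of one text.

    scf_labels: one SCF label per verb token (hypothetical label scheme).
    Returns (distinct SCF count, type-token ratio, Pielou evenness J).
    """
    counts = Counter(scf_labels)
    n_tokens = sum(counts.values())
    n_types = len(counts)
    if n_tokens == 0:
        # No verb tokens: report zero diversity rather than dividing by zero.
        return 0, 0.0, 0.0

    # Share of distinct frames among all frame tokens.
    ttr = n_types / n_tokens

    # Shannon entropy H (natural log) over the SCF frequency distribution.
    entropy = -sum((c / n_tokens) * math.log(c / n_tokens) for c in counts.values())

    # Pielou's evenness J = H / ln(S); undefined for a single type, so report 0.0.
    evenness = entropy / math.log(n_types) if n_types > 1 else 0.0
    return n_types, ttr, evenness


if __name__ == "__main__":
    # A toy text with four verb tokens tagged with three hypothetical SCFs.
    print(scf_diversity(["NP", "NP", "NP-PP", "INTRANS"]))
```

Raw type-token ratios tend to fall as texts get longer, so comparing texts of different lengths with a measure like this needs some form of length control.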
Article outline
- 1. Introduction
- 2. Subcategorization frames and their automatic identification
- 3. A SCF identification system for learner English
- 3.1 Data
- 3.2 Method
- 3.3 Training and evaluation
- 3.3.1 Accuracy
- 3.3.2 Error analysis
- i. Distinction between arguments and adjuncts
- ii. Prepositional attachment
- 4. Case study: SCF diversity and L2 proficiency
- 4.1 Design of SCF diversity metrics
- 4.1.1 Basic design
- i. Repetition
- ii. Evenness
- iii. Dispersion
- iv. Disparity
- 4.1.2 Control for text length
- 4.2 Data selection and statistical analysis method
- 4.3 Results
- 5. Conclusion
- Notes
- References