Automatic analysis of caregiver input and child production
Insight into corpus-based research on child language development in Korean
The present study explores the applicability of Natural Language Processing (NLP) techniques to the investigation of child
corpora in Korean. We employ caregiver input and child production data from the CHILDES database, currently the largest
open-access source of Korean child corpus data, and apply NLP techniques to the data in two ways: automatic Part-of-Speech tagging by
adapting a machine learning algorithm, and (semi-)automatic extraction of constructional patterns expressing a transitive event
(active transitive and suffixal passive). As the first empirical report on NLP-assisted analysis of Korean child corpora, this
study is expected to reveal the advantages and drawbacks of such analysis, thereby paving the way for further corpus-mediated
research on child language development in Korean. The findings also have implications for research practice in corpus-based
developmental studies of Korean, in particular for ensuring the reproducibility of procedures and results, which has often been
lacking in previous corpus-based research on child language development in Korean.
Article outline
- 1. Introduction
- 2. Research on child corpora in Korean
- 3. Towards automatic processing of child corpora: POS tagging
- 3.1 Issues with POS tagging in Korean
- 3.2 Developing a POS tagger for Korean child corpora
- 3.2.1 Pre-processing
- 3.2.2 Machine learning algorithm for POS tagging: Perceptron
- 3.2.3 Model performance
- 3.3 Results and discussion
- 4. Towards automatic processing of child corpora: Construction identification
- 4.1 Challenges in automatic processing of active transitives and suffixal passives in Korean
- 4.2 Construction identification: Caregiver input
- 4.3 Construction identification: Child production
- 4.4 Model performance
- 4.5 Results and discussion
- 4.5.1 Accuracy of pattern-finder
- 4.5.2 Use of active transitives and suffixal passives: Caregiver input
- use
- use
- of findings: Caregiver input
- 4.5.3 Use of active transitives and suffixal passives: Child production
- 5. Conclusion: Implications on automatic processing of Korean child corpora for developmental research on Korean
- Notes
- Abbreviations
References (57)
