The present study explores the applicability of automatic analysis to L2-Korean learner corpora, with a special focus on learners’ use of a clause-level construction. To this end, we investigate the written production of L1-Mandarin learners of L2 Korean for two types of Korean passive construction (suffixal and periphrastic), devising a pattern-extraction process based on NLP techniques. We focus on reporting how the passive constructions are identified and extracted automatically from learner writing, given the language-specific features of the Korean passive. A total of 72 essays were analysed by adapting an existing pipeline (Shin, forthcoming), with tokenisation and annotation enhanced through manual revision of the data. The automatic pattern-finder identified more instances of the suffixal passive than manual extraction did, and its output for the periphrastic passive matched manual extraction exactly. We discuss the implications of these findings with regard to the strengths and drawbacks of automatic analysis of learner writing, and suggest ways to improve currently available tools for learner corpus research in Korean.
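As a purely illustrative sketch (not the pipeline of Shin, forthcoming, used in this study), the two passive types can be matched over morpheme-tagged sentences and the automatic output compared with manual extraction. It assumes Sejong-style tags (VV verb, EC connective ending, VX auxiliary, EP pre-final ending) as a morphological analyser might produce. The stem list, the functions `find_passives` and `compare`, and the toy sentences are all hypothetical. A suffixal passive is approximated by a lexicon of -i/hi/li/ki- stems, and a periphrastic passive by the sequence verb + -a/eo + auxiliary 지-.

```python
# Hypothetical input format: each sentence is a list of (morpheme, tag) pairs with
# Sejong-style tags (VV verb, EC connective ending, VX auxiliary, EP pre-final ending).
Morph = tuple[str, str]

# Hypothetical, far-from-exhaustive lexicon of suffixal passive stems (-i/hi/li/ki-).
SUFFIXAL_PASSIVE_STEMS = {"보이", "먹히", "잡히", "열리", "팔리", "안기", "쫓기"}

# Connective endings that introduce the auxiliary 지- in the periphrastic passive.
PERIPHRASTIC_ENDINGS = {"어", "아", "여"}


def find_passives(sentence: list[Morph]) -> list[tuple[int, str]]:
    """Return (morpheme index, passive type) for each candidate in one sentence."""
    hits = []
    for i, (form, tag) in enumerate(sentence):
        if tag != "VV":
            continue
        # Suffixal passive: the verb stem itself is listed in the passive lexicon.
        if form in SUFFIXAL_PASSIVE_STEMS:
            hits.append((i, "suffixal"))
        # Periphrastic passive: verb + -a/eo connective ending + auxiliary 지-.
        if (
            i + 2 < len(sentence)
            and sentence[i + 1][1] == "EC"
            and sentence[i + 1][0] in PERIPHRASTIC_ENDINGS
            and sentence[i + 2] == ("지", "VX")
        ):
            hits.append((i, "periphrastic"))
    return hits


def compare(auto: set, manual: set) -> dict[str, int]:
    """Count hits found by both methods, by automatic extraction only, and by manual only."""
    return {
        "both": len(auto & manual),
        "auto_only": len(auto - manual),
        "manual_only": len(manual - auto),
    }


# Invented toy data (not from the study): essay id -> list of tagged sentences.
essays = {
    "e1": [[("문", "NNG"), ("이", "JKS"), ("열리", "VV"), ("었", "EP"), ("다", "EF")]],
    "e2": [[("책", "NNG"), ("이", "JKS"), ("만들", "VV"), ("어", "EC"),
            ("지", "VX"), ("었", "EP"), ("다", "EF")]],
    # 보이- used causatively ('showed the photo'), not passively.
    "e3": [[("사진", "NNG"), ("을", "JKO"), ("보이", "VV"), ("었", "EP"), ("다", "EF")]],
}

# Key each automatic hit by (essay id, sentence index, morpheme index, passive type).
auto = {
    (essay_id, s, i, kind)
    for essay_id, sentences in essays.items()
    for s, sentence in enumerate(sentences)
    for i, kind in find_passives(sentence)
}
manual = {("e1", 0, 2, "suffixal"), ("e2", 0, 2, "periphrastic")}

print(compare(auto, manual))  # {'both': 2, 'auto_only': 1, 'manual_only': 0}
```

Because the stem lexicon is matched without regard to context, it cannot tell passive 보이- 'be seen' from the homophonous causative 보이- 'show'. In the toy data this yields one automatic-only hit, the kind of case that would need a manual check.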
Abbot-Smith, K., Chang, F., Rowland, C., Ferguson, H., & Pine, J. (2017). Do two and three year old children use an incremental first-NP-as-agent bias to process active transitive and passive sentences?: A permutation analysis. PLoS ONE, 12(10), e0186129.
Bang, D.-S. (2014). hankwuke kokup haksupcauy ssukiey nathananun hancae olyu pwunsek [A study of Sino-Korean errors found in advanced Korean learners]. kwukhakyenkwulonchong, 14, 1–21.
Bestgen, Y., & Granger, S. (2014). Quantifying the development of phraseological competence in L2 English writing: An automated approach. Journal of Second Language Writing, 26, 28–41.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at…: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371–405.
Biber, D., Gray, B., & Poonpon, K. (2011). Should we use characteristics of conversation to measure grammatical complexity in L2 writing development? TESOL Quarterly, 45(1), 5–35.
Birjandi, P., Maftoon, P., & Rahemi, J. (2011). VanPatten’s processing instruction: Links to the acquisition of English passive structure by Iranian EFL learners. European Journal of Scientific Research, 64(4), 598–609.
Brooks, P. J., & Tomasello, M. (1999). Young children learn to produce passives with nonce verbs. Developmental Psychology, 35, 29–44.
Cho, S., & Park, Y. (2018). Sheffield tayhakkyo hankwuke haksupcauy cakmwun thukseng pwunsek [Characteristics of Korean language writing by students at the University of Sheffield]. cakmwunyenkwu, 38, 149–172.
Cho, Y. (2019). The effects of writing prompt types on L2 learners’ writing strategy use and performance. Studies in English Language & Literature, 45(3), 295–314.
Choi, B.-S. (2018). oykwukin yuhaksaynguy kwanhyengcel silhyen yangsang yenkwu – cakmwun calyolul cwungsimulo [A study on the aspects of the Korean adnominal clause of overseas students – focused on using writing]. hanmincokemwunhak, 79, 61–95.
Choi, J. D., & Palmer, M. (2011, October). Statistical dependency parsing in Korean: From corpus generation to automatic parsing. In D. Seddah, R. Tsarfaty, & J. Foster (Eds.), Proceedings of the 2nd Workshop on Statistical Parsing of Morphologically-Rich Languages (pp. 1–11). Stroudsburg: Association for Computational Linguistics.
Choo, M., & Kwak, H.-Y. (2008). Using Korean. Cambridge: Cambridge University Press.
Chun, J., Han, N.-R., Hwang, J. D., & Choi, J. D. (2018). Building Universal Dependency Treebanks in Korean. In N. Calzolari, K. Choukri, C. Cieri, T. Declerck, S. Goggi, K. Hasida, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, S. Piperidis, & T. Tokunaga (Eds.), Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC 2018). European Language Resources Association.
Crossley, S. A., Kyle, K., & McNamara, D. S. (2016). The development and use of cohesive devices in L2 writing and their relations to judgments of essay quality. Journal of Second Language Writing, 32, 1–16.
Cui, Y., & Wang, J. (2018). cwungkwukin haksupcatuluy pocoyongen sayongyangsang kochal – cakmwun pwunsekul thonghan sayongpinto cosa mich olyu pwunsek [Exploring the use of Korean auxiliary verbs among Chinese learners]. Teaching Korean as a Foreign Language, 51, 175–202.
Dąbrowska, E., & Street, J. (2006). Individual differences in language attainment: Comprehension of passive sentences by native and non-native English speakers. Language Sciences, 28(6), 604–615.
de Felice, R., & Pulman, S. (2009). Automatic detection of preposition errors in learner writing. CALICO Journal, 26(3), 512–528.
de Haan, P. (2000). Tagging non-native English with the TOSCA–ICLE tagger. In C. Mair & M. Hundt (Eds.), Corpus linguistics and linguistic theory (pp. 69–79). Amsterdam: Rodopi.
de Mönnink, I. (2000). Parsing a learner corpus. In C. Mair & M. Hundt (Eds.), Corpus linguistics and linguistic theory (pp. 81–90). Amsterdam: Rodopi.
Ellis, N. C., & Ferreira-Junior, F. (2009). Construction learning as a function of frequency, frequency distribution, and function. The Modern Language Journal, 93(3), 370–385.
Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus-based language learning research: Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155–179.
Gilquin, G. (2008). Hesitation markers among EFL learners: Pragmatic deficiency or difference. In J. Romero-Trillo (Ed.), Pragmatics and corpus linguistics: A mutualistic entente (pp. 119–149). Berlin: Mouton de Gruyter.
Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago, IL: University of Chicago Press.
Goldberg, A. E. (2006). Constructions at work: The nature of generalization in language. Oxford: Oxford University Press.
Han, C. H., & Palmer, M. (2005). A morphological tagger for Korean: Statistical tagging combined with corpus-based morphological rule application. Machine Translation, 18(4), 275–297.
Hinkel, E. (2004). Tense, aspect and the passive voice in L1 and L2 academic texts. Language Teaching Research, 8(1), 5–29.
Huang, Y. T., Zheng, X., Meng, X., & Snedeker, J. (2013). Children’s assignment of grammatical roles in the online processing of Mandarin passive sentences. Journal of Memory and Language, 69(4), 589–606.
Huh, C.-G. (2018). cwungkwukin haksupcauy kulssukiey nathanan hankwuke eswunuy haksup yangsang [The aspects of learning word order of Korean language of Chinese learners]. tonamemwunhak, 34, 255–290.
Izumi, S., & Lakshmanan, U. (1998). Learnability, negative evidence and the L2 acquisition of the English passive. Second Language Research, 14(1), 62–101.
Jeong, H. (2014). Processing and acquisition of Korean passive voice by Chinese L2 learners. hankwuke kyoyuk [Korean Education], 25(2), 165–186.
Ju, M. K. (2000). Overpassivization errors by second language learners: The effect of conceptualizable agents in discourse. Studies in Second Language Acquisition, 22(1), 85–111.
Kim, H., & Rah, Y. (2016). Effects of verb semantics and proficiency in second language use of constructional knowledge. The Modern Language Journal, 100(3), 716–731.
Kim, H., Shin, G.-H., & Hwang, H. (2020). Integration of verbal and constructional information in the second language processing of English dative constructions. Studies in Second Language Acquisition, 42(4), 825–847.
Kim, H.-G., Kang, B.-M., & Hong, J. (2007). 21seyki seycongkyeyhoyk hyentaykwuke kichomalmwungchi sengkwawa cenmang [21st century Sejong modern Korean corpora: Results and expectations]. In Korean Institute of Information Scientists and Engineers (Ed.), Proceedings of Annual Conference on Human and Language Technology 31 (pp. 311–316). Korean Institute of Information Scientists and Engineers.
Kim, J. Y., Park, Y. H., Kim, M. J., Kim, H. N., Choi, S. K., Suh, J. H., & Kwak, Y. J. (2016). hankwuke haksupcauy cakmwun malmwungchilul hwalyonghan mwunhyeng yonglyey kemsaykki kaypal yenkwu [A study of developing usage searcher of grammar pattern in the Korean learner’s writing corpus]. Teaching Korean as a Foreign Language, 44, 131–155.
Kim, S. J., & Kim, S. H. (2013). yeseng kyelhonimincauy kwueey nathanan tamhwaphyoci sayong yangsang yenkwu [A study on the use aspects of discourse markers appeared in spoken Korean language of marriage woman immigrants]. The Journal of Linguistics Science, 64, 25–46.
Kim, Y.-I. (2019). hankwuke kyocayuy ‘-i/hi/li/ki-’ phitong tanwen pwunsekkwa kyoswu pangan ceysi [Analysis and Teaching Methods of the ‘-i/hi/li/gi-’ Passive unit in Korean textbooks]. Journal of Korean Language Education, 30(1), 27–63.
Kim, W., & Ock, C. Y. (2015). hankwuke kyekthul sacenkwa uymiyek pinto cengpolul sayonghan hankwuke uymiyek kyelceng [Korean semantic role labeling using case frame and frequency]. Journal of Korean Institute of Information Technology, 11(2), 161–167.
Kwak, S. J. (2016). mikwuk nay tayhak haksupcatuluy cakmwun pwunsekul thonghan hankwuke swuktalto swucwunpyel pikyo yenkwu – yuthatay salyeyyenkwu [A comparative study of American university students’ Korean proficiency by level through analysis of composition: A case study at the University of Utah]. Teaching Korean as a Foreign Language, 44, 23–51.
Kyle, K. (2016). Measuring syntactic development in L2 writing: Fine grained indices of syntactic complexity and usage-based indices of syntactic sophistication (Unpublished doctoral dissertation). Georgia State University, Atlanta.
Kyle, K., & Crossley, S. (2017). Assessing syntactic sophistication in L2 writing: A usage-based approach. Language Testing, 34(4), 513–535.
Kyle, K., Crossley, S., & Berger, C. (2018). The tool for the automatic analysis of lexical sophistication (TAALES): version 2.0. Behavior Research Methods, 50(3), 1030–1046.
Lee-Ellis, S. (2009). The development and validation of a Korean C-Test using Rasch Analysis. Language Testing, 26(2), 245–274.
Lee, I. (2011). kwukehakkaysel [Introduction to Korean linguistics]. Seoul: Hakyensa.
Lee, J. (2017). tayhaksayng cakmwuney nathanan ehwi tayangseng yenkwu – oykwukin yuhaksayngkwa hankwukinuy pikyolul cwungsimulo [A study on lexical diversities in writing of university students – Focusing on the comparison of Koreans and foreign students]. tayhakcakmwun, 19, 61–91.
Lee, S.-M. (2017). hankwuke haksupcauy malhakiwa ssukiey nathanan ehwi sayonguy congtancek yenkwu [A longitudinal study of vocabulary usage presented in speaking and writing of Korean learners]. wulimalkul, 74, 183–214.
Lee, S.-H., Dickinson, M., & Israel, R. (2016). Challenges of learner corpus annotation: Focusing on Korean Learner Language Analysis (KoLLA) system. Language Facts and Perspectives, 38, 221–251.
Lee, S. K. (2007). Effects of textual enhancement and topic familiarity on Korean EFL students’ reading comprehension and learning of passive form. Language Learning, 57(1), 87–118.
Lee, S.-A., & Choi, J.-T. (2013). hankwuke Verb_OntoNetuy selkyeywa kwuchwuk [Design and implementation of Korean Verb_OntoNet]. Journal of Korean Institute of Information Technology, 11(2), 161–167.
Li, C., & Thompson, S. (1981). Mandarin Chinese: A functional reference grammar. Berkeley, CA: University of California Press.
Liu, N. (2016). The structures of Chinese long and short bei passives revisited. Language and Linguistics, 17(6), 857–889.
Meurers, D. (2015). Learner corpora and natural language processing. In S. Granger, G. Gilquin, & F. Meunier (Eds.), The Cambridge handbook of learner corpus research (pp. 537–566). Cambridge: Cambridge University Press.
Meurers, D., & Dickinson, M. (2017). Evidence and interpretation in language learning research: Opportunities for collaboration with computational linguistics. Language Learning, 67(S1), 66–95.
Miller, R., Mitchell, T., & Pessoa, S. (2016). Impact of source texts and prompts on students’ genre uptake. Journal of Second Language Writing, 31, 11–24.
Nam, J., Kim, Y., & Kim, Y. (2016). L2 hankwuke mwune sanchwuleyseuy thongsa pokcapseng chukceng [Measuring syntactic complexity in L2 Korean writings]. Korean Semantics, 51, 21–56.
Nam, Y. J., & Hong, U. P. (2014). L2loseuy hankwuke cayenpalhwa khophesuuy kwuchwukkwa hwalyong [Towards a corpus-based approach to Korean as a second language]. The Journal of the Humanities for Unification, 57, 193–220.
Park, E., & Cho, S. (2014). KoNLPy: swipko kankyelhan hankwuke cengpocheli phaissen phaykhici [KoNLPy: Korean natural language processing in Python]. cey26hoy hankul mich hankwuke cengpocheli hakswultayhoy nonmwuncip.
Park, H.-J., & Lee, M.-H. (2017). hankwuke haksupcauy ssuki theyksuthuey nathanan ungkyelsengkwa ungcipsenguy sangkwanpwunsek [Correlation analysis of cohesion and coherence in Korean as a second language student’s writing]. Wulimalkul, 73, 133–157.
Park, J., Hong, J. P., & Cha, J. W. (2016). Korean language resources for everyone. In J. C. Park & J.-W. Chung (Eds.), Proceedings of the 30th Pacific Asia conference on language, information and computation: Oral Papers (pp. 49–58).
Park, Y.-H., & Lee, H.-W. (2014). hankwuke haksupcalul wihan hankwuke mwuncang kwuseng kyoyuk pangan yenkwu – cwungkwukin haksupcauy eswuney ttalun kulssuki olyu pwunsekul thonghaye [A study on effective teaching strategies for Korean language writers through error analysis]. Studies in Linguistics, 33, 159–174.
Park, Y.-K., Kim, J.-M., Lee, S.-D., & Lee, H. A. (2017). oykwukin haksupcalul wihan mwunmayk kipan silsikan kwuke mwuncang kyoceng [Context Based Real-time Korean Writing Correction for Foreigners]. Journal of KIISE, 44(10), 1087–1093.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Petrov, S., Das, D., & McDonald, R. (2012). A universal part-of-speech tagset. In N. Calzolari, K. Choukri, T. Declerck, M. U. Doğan, B. Maegaard, J. Mariani, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the 8th International Conference on Language Resources and Evaluation (pp. 2089–2096). European Language Resources Association.
Qi, P., Dozat, T., Zhang, Y., & Manning, C. D. (2018). Universal dependency parsing from scratch. In D. Zeman & J. Hajič (Eds.), Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (pp. 160–170). Stroudsburg: Association for Computational Linguistics.
Römer, U., Roberson, A., O’Donnell, M. B., & Ellis, N. C. (2014). Linking learner corpus and experimental data in studying second language learners’ knowledge of verb-argument constructions. ICAME Journal, 38(1), 115–135.
Ryu, S. (2017). hankwuke haksupcauy cakmwun calyoey nathanan cepsokpwusa sayong yangsang yenkwu – pinto cengpolul cwungsimulo [A Study on the use of Korean conjunctive adverbs in Korean learners by analyzing their writing – Focusing on frequency information]. mwunpep kyoyuk, 29, 143–168.
Seo, S.-B. (2014). oykwukin yuhaksaynguy hankwuke ssuki olyu pwunsek – hakpwu cayhak yuhaksayng paykilcang cakmwunul taysangulo [A study on analysis of error patterns in Korean writing of international students – Focusing on essay writing contest for university students]. wulimalkul, 62, 127–157.
Shin, G.-H. (2020). Connecting input to comprehension: First language acquisition of active transitives and suffixal passives by Korean-speaking preschool children (Unpublished doctoral dissertation). University of Hawai‘i at Mānoa, Honolulu.
Shin, G.-H. (forthcoming). Automatic analysis of caregiver input and child production: Insight into corpus-based research on child language development in Korean. Korean Linguistics.
Siewierska, A. (2013). Alignment of verbal person marking. In M. Haspelmath, M. Dryer, D. Gil, & B. Comrie (Eds.), The world atlas of language structures online. Leipzig: Max Planck Institute for Evolutionary Anthropology. Retrieved from [URL]
Sohn, H. M. (1999). The Korean language. Cambridge: Cambridge University Press.
Song, J. J. (2015). Causatives. In L. Brown & J. Yeon (Eds.), The handbook of Korean linguistics (pp. 116–136). Oxford: John Wiley & Sons.
Song, S., & Choe, J. W. (2007). Type hierarchies for passive forms in Korean. In S. Müller (Ed.), Proceedings of the 14th International Conference on Head-Driven Phrase Structure Grammar, Stanford Department of Linguistics and CSLI’s LinGO Lab (pp. 250–270). Stanford, CA: CSLI Publications.
Song, W. (2018). cwungkwukin chokup hankwuke haksupcauy kulssukiey nathanan cosa olyu yangsangkwa cito pangan yenkwu [A Study on the auxiliary word error pattern and guidance method in the writing of Chinese elementary Korean learners]. cakmwunyenkwu, 38, 119–147.
Straka, M., & Straková, J. (2017). Tokenizing, POS Tagging, lemmatizing and parsing UD 2.0 with UDPipe. In J. Hajič & D. Zeman (Eds.), Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (pp. 88–99). Stroudsburg: Association for Computational Linguistics.
Sun, C. F., & Givón, T. (1985). On the so-called SOV word order in Mandarin Chinese: A quantified text study and its implications. Language, 61, 329–351.
Sung, M.-C., & Kim, H. (2020). Effects of verb–construction association on second language constructional generalizations in production and comprehension. Second Language Research.
Won, M., Wang, Y., Zhu, Y., & Wang, H. (2017). hankwuke haksupcauy ssukiey nathanan ehwi phwungyoto yenkwu – swuktalto chukceng tokwulosse ehwi phwungyoto chukceng kanungsengul cwungsimulo [A study of lexical richness of Korean learners’ writing: The possibility of using lexical richness to measure language level]. emwunlonchong, 71, 33–55.
Xiao, R. (2007). What can SLA learn from contrastive corpus linguistics? The case of passive constructions in Chinese learner English. Indonesian JELT, 3(1), 1–19.
Yeon, J. (2015). Passives. In L. Brown & J. Yeon (Eds.), The handbook of Korean linguistics (pp. 116–136). Oxford: John Wiley & Sons.
Zeman, D., Hajič, J., Popel, M., Potthast, M., Straka, M., Ginter, F., Nivre, J., & Petrov, S. (2018). CoNLL 2018 shared task: Multilingual parsing from raw text to Universal Dependencies. In D. Zeman & J. Hajič (Eds.), Proceedings of the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies (pp. 1–21). Stroudsburg: Association for Computational Linguistics.