The Kansas Developmental Learner corpus (KANDEL)
A developmental corpus of learner German
This article presents the Kansas Developmental Learner corpus (KANDEL), a corpus of L2 German writing samples produced by several cohorts of North American university students over four semesters of instructed language study. The corpus adds to the number of freely and publicly available learner corpora and extends their depth with a unique set of features. It focuses on an L2 other than English (German), targets beginning to intermediate L2 proficiency levels, and includes dense developmental data as well as annotations for multiple linguistic variables, learner errors, and over twenty learner and task variables. The article also reports the procedure and results of an inter-annotator agreement study, together with an in-depth analysis of annotator disagreement. In this way, it contributes to best practices in learner corpus annotation by making the annotation process transparent and demonstrating its reliability.