Chapter 2
What is missing in learner corpus design?
This chapter discusses what is missing in learner corpus design. Learner corpus researchers are not always fully aware of the basic principles of corpus design and collection that corpus linguists are expected to know. I first discuss theoretical and methodological issues in learner corpus design and collection, focusing on sampling, representativeness, and corpus size. I then review three relevant studies (Biber 1993; Tomasello & Stahl 2004; Mukherjee & Rohrbach 2006) to clarify corpus design issues such as the parameters of corpus sampling, the effects of sample size, and variation in learner corpus design. The chapter concludes with a critical assessment of design and data collection issues in learner corpus research and a discussion of future directions.
Article outline
- 1. Introduction
- 2. What learner corpus researchers should know before using or creating corpora
- 2.1 Basic concepts of corpus design and collection
- 2.1.1 Machine-readability
- 2.1.2 Authenticity
- 2.1.3 Sampling
- 2.1.4 Representativeness
- 2.2 Pitfalls in designing learner corpora
- 2.2.1 Target population
- 2.2.2 Data collection methods
- 2.2.3 Subcorpus design
- 3. Meeting criteria in learner corpus design
- 3.1 Sampling which reflects a full range of variability
- 3.2 Effects of sample size
- 3.3 Possible variations in learner corpus design
- 4. Critical assessment and future directions
- 4.1 Issues of balance and representativeness
- 4.2 Data collection issues
- References
References
Atkins, B.T.S., Clear, J. & Ostler, N. 1992. Corpus design criteria. Literary and Linguistic Computing 7 (1): 1–16.
Biber, D. 1993. Representativeness in corpus design. Literary and Linguistic Computing 8 (4): 243–257.
Biber, D., Conrad, S. & Reppen, R. 1998. Corpus Linguistics. Cambridge: CUP.
Bley-Vroman, R. 1983. The comparative fallacy in interlanguage studies: The case of systematicity. Language Learning 33 (1): 1–17.
Chaudron, C. 2008. Data collection in SLA research. In The Handbook of Second Language Acquisition, C.J. Doughty & M.H. Long (eds), 762–828. Oxford: Blackwell.
Ellis, R. 1994. The Study of Second Language Acquisition. Oxford: OUP.
Granger, S. 1994. The learner corpus: A revolution in applied linguistics. English Today 39 (3): 25–29.
Granger, S. 1996. Learner English around the world. In Comparing English World-Wide, S. Greenbaum (ed.), 13–24. Oxford: Clarendon Press.
Granger, S. 1998. The computerized learner corpus: A versatile new source of data for SLA research. In Learner English on Computer, S. Granger (ed.), 13–18. London: Addison Wesley Longman.
Granger, S. 2003. The International Corpus of Learner English: A new resource for foreign language learning and teaching and second language acquisition research. TESOL Quarterly 37 (3): 538–546.
Johansson, S., Leech, G. & Goodluck, H. 1978. The Lancaster-Oslo/Bergen Corpus of British English, for Use with Digital Computers. Oslo: Department of English, University of Oslo. (Abbreviated as LOB.)
McEnery, T. & Wilson, A. 2001. Corpus Linguistics: An Introduction. Edinburgh: EUP.
McEnery, T., Xiao, R. & Tono, Y. 2006. Corpus-based Language Studies: An Advanced Resource Book. London: Routledge.
Mukherjee, J. & Rohrbach, J.-M. 2006. Rethinking applied corpus linguistics from a language-pedagogical perspective: New departures in learner corpus research. In Planning, Gluing and Painting Corpora: Inside the Applied Corpus Linguist’s Workshop, B. Kettemann & G. Marko (eds), 205–232. Frankfurt: Peter Lang.
Sinclair, J. 2005. Corpus and text – Basic principles. In Developing Linguistic Corpora: A Guide to Good Practice, M. Wynne (ed.), 1–16. Oxford: Oxbow Books. <[URL]> (25 May 2013).
Sinclair, J. 2008. Borrowed ideas. In Language, People, Numbers: Corpus Linguistics and Society, A. Gerbig & O. Mason (eds), 21–41. Amsterdam: Rodopi.
Tomasello, M. & Stahl, D. 2004. Sampling children’s spontaneous speech: How much is enough? Journal of Child Language 31: 101–121.
Widdowson, H. 1998. Context, community and authentic language. TESOL Quarterly 32 (4): 705–716.
Widdowson, H. 2000. On the limitations of linguistics applied. Applied Linguistics 21: 3–25.
Cited by two other publications
Lozano, Cristóbal. 2022. CEDEL2: Design, compilation and web interface of an online corpus for L2 Spanish acquisition research. Second Language Research 38 (4): 965 ff.
Lozano, Cristóbal & Paloma Fernández-Mira. 2022. Designing, compiling and interrogating corpora in L2 Spanish acquisition research. Journal of Spanish Language Teaching 9 (2): 190 ff.
This list is based on CrossRef data as of 23 September 2024. Please note that it may not be complete. Sources presented here have been supplied by the respective publishers. Any errors therein should be reported to them.