Computer corpora in English language research: A critical survey

Collins, Peter

doi:10.1075/aral.10.1.01col

Article published in:

Australian Review of Applied Linguistics
Vol. 10:1 (1987), pp. 1–19

Peter Collins | University of New South Wales

This paper provides an overview of English language corpora. It examines the relationships between the extant corpora and describes some features of a corpus of written English being developed in Australia. The article also considers some of the linguistic and theoretical constraints on corpus-based research.

Published online: 1 January 1987

https://doi.org/10.1075/aral.10.1.01col

References

Aarts, J. and W. Meijs

(1984) Corpus linguistics: recent developments in the use of computer corpora in English language research. Amsterdam, Rodopi.

(eds.) (1986) Corpus linguistics II: new studies in the analysis and exploitation of computer corpora. Amsterdam, Rodopi.

Aijmer, K.

(1987) Oh and ah in English conversation. In Meijs (ed.) (1987): 61–86.

Altenberg, B.

(1987) Prosodic patterns in spoken English: studies in the correlation between prosody and grammar for text-to-speech conversion. Lund Studies in English 76. Lund, Lund University Press.

Atwell, E.

(1983) Constituent likelihood grammar. ICAME News 7:34–67. Norwegian Computing Centre for the Humanities.

Atwell, E., G. Leech and R. Garside

(1984) Analysis of the LOB Corpus: progress and prospects. In Aarts and Meijs (1984): 41–52.

Biber, D.

(1985) Investigating macroscopic textual variation through multi-feature/multi-dimensional analyses. Linguistics 23,2:337–60.

(forthcoming) Spoken and written textual dimensions in English: Resolving the contradictory findings. Language 62:384–414.

Briscoe, T., I. Craig and C. Grover

(1987) The use of the LOB Corpus in the development of a phrase structure grammar of English. In Meijs (1987): 207–218.

Coates, J.

(1983) The semantics of modal auxiliaries. London and Canberra, Croom Helm.

Collins, P.C.

(1985) Th-clefts and all-clefts. Beiträge zur Phonetik und Linguistik 41:45–53.

(1987) Cleft and pseudo-cleft constructions in English spoken and written discourse. ICAME Journal 11:5–17.

Collins, P.C. and P. Peters

(forthcoming) The Australian Corpus Project. In Ihalainen, O., M. Kytö and M. Rissanen (eds.) Proceedings of the Eighth International Conference on English Language Research on Computerized Corpora. Amsterdam, Rodopi (to appear).

Eeg-Olofsson, M. and J. Svartvik

(1984) Four-level tagging of spoken English. In Aarts and Meijs (1984): 53–64.

Ellegård, A.

(1978) The syntactic structure of English texts: a computer-based study of four kinds of text in the Brown University Corpus. (Gothenburg Studies in English, 43), Gothenburg University.

Fjelkestam-Nilsson, B.

(1983) ALSO and TOO: a corpus-based study of their frequency and use in Modern English. Stockholm, Stockholm Studies in English, LVIII.

Francis, W.N.

(1980) A tagged corpus – problems and prospects. In S. Greenbaum, G. Leech and J. Svartvik (eds.) Studies in English linguistics: for Randolph Quirk. London, Longman: 192–209.

(1982) Problems of assembling and computerizing large corpora. In Johansson (1982): 7–24.

Francis, W.N. and H. Kučera

(1964) Manual of information to accompany a standard corpus of present-day edited American English, for use with digital computers. Providence, R.I., Department of Linguistics, Brown University.

(1982) Frequency analysis of English usage: lexicon and grammar. Boston, Houghton Mifflin.

Garside, R. and G.N. Leech

(1982) Grammatical tagging of the LOB Corpus: general survey. In Johansson (1982): 110–117.

Geens, D.

(1975/6) Analysis of present-day English theatrical language 1966-72. Leuven, K.U.

Greenbaum, S. and R. Quirk

(1970) Elicitation experiments in English: linguistic studies in use and attitude. London, Longman.

Greene, B.B. and G.M. Rubin

(1971) Automatic grammatical tagging of English. Providence, R.I., Department of Linguistics, Brown University.

Hofland, K. and S. Johansson

(1982) Word frequencies in British and American English. Bergen, Norwegian Computing Centre for the Humanities.

Ihalainen, O., M. Kytö and M. Rissanen

(1987) The Helsinki Corpus of English Texts: diachronic and dialectal. Report on work in progress. In Meijs (1987): 21–32.

Johansson, S.

(ed.) (1982) Computer corpora in English language research. Bergen, Norwegian Computing Centre for the Humanities.

Johansson, S., G. Leech and H. Goodluck

(1978) Manual of information to accompany the Lancaster-Oslo/Bergen Corpus of British English, for use with digital computers. Oslo, Department of English, University of Oslo.

Johansson, S. and M.C. Jahr

(1982) Grammatical tagging of the LOB: predicting word class from word endings. In Johansson (1982): 118–146.

Johansson, S. and E.H. Norheim

(1988) The subjunctive in British and American English. ICAME Journal 12:56–57.

Johansson, S. and K. Hofland

(forthcoming) Frequency analysis of English vocabulary and grammar.

Kaye, G.

(1988) The design of the database for the Survey of English Usage. ICAME Journal 12:56–57.

Kjellmer, G.

(1986) ‘The lesser man’: Observations on the role of women in modern English writings. In Aarts and Meijs (1986): 163–176.

Leech, G., R. Garside and E. Atwell

(1983a) The automatic grammatical tagging of the LOB Corpus. ICAME News 7:13–33.

Leech, G., R. Garside and E. Atwell

(1983b) Recent developments in the use of computer corpora in English language research. Transactions of the Philological Society: 23–40.

Leech, G. and A. Beale

(1985) Computers in English language research. Language Teaching 17,3:216–29.

Marshall, I.

(1983) Choice of grammatical word-class without global syntactic analysis: tagging words in the LOB Corpus. Computers and the Humanities 17,3:139–50.

Martin, J.R.

(1984) Language, register and genre. In F. Christie (ed.) Language studies: children writing. Geelong, Victoria, Deakin University Press: 21–30.

Meijs, W.

(ed.) (1987) Corpus linguistics and beyond. Amsterdam, Rodopi.

Oddy, R.N., S.E. Robertson, C.J. van Rijsbergen and P.W. Williams

(eds.) (1981) Information retrieval research. London, Butterworths.

Oostdijk, N.

(1988) A corpus for studying linguistic variation. ICAME Journal 12:3–14.

Peters, P.

(1987) Towards a corpus of Australian English. ICAME Journal 11:27–38.

Quirk, R. and J. Svartvik

(1966) Investigating linguistic acceptability. The Hague, Mouton.

Sampson, G.

(1987) Evidence against the ‘grammatical/ungrammatical’ distinction. In Meijs (1987): 219–226.

Shastri, S.V.

(1980) A computer corpus of present-day Indian English. ICAME News 4:9–12.

(1985) Word frequencies in Indian English: a preliminary report. ICAME News 9:38–44.

(1988) The Kolhapur Corpus of Indian English and work done on its basis so far. ICAME Journal 12:15–26.

Sinclair, J.McH.

(1982) Reflections on computer corpora in English language research. In Johansson (1982): 1–6.

Svartvik, J.

(1984) Text Segmentation for Speech (TESS): presentation of a project. Survey of Spoken English, Lund University.

Svartvik, J., M. Eeg-Olofsson, O. Forsheden, B. Oreström and C. Thavenius

(eds.) (1982) A Survey of Spoken English: report on research 1975-81. Lund, Gleerup.

Svartvik, J. and M. Eeg-Olofsson

(1982) Tagging the London-Lund Corpus of Spoken English. In Johansson (1982): 85–109.

Svartvik, J. and R. Quirk

(eds.) (1980) A corpus of English conversation. Lund, Gleerup/Liber.

Thavenius, C.

(1982) Exophora in English conversation. In N.E. Enkvist (ed.) (1982) Impromptu speech: a symposium. Åbo, Åbo Akademi: 291–305.

Tottie, G., B. Altenberg and L. Hermerén

(1983) English in speech and writing. ETOS Report 1. Lund and Uppsala: the Departments of English at the Universities of Lund and Uppsala.

Cited by

Cited by 1 other publication

Altenberg, Bengt

1991. A bibliography of publications relating to English computer corpora. In English Computer Corpora, pp. 355 ff.
