This article reviews corpus-based Chinese studies, both applied and theoretical, from the 1920s to the present. It shows that, while corpus-based Chinese studies have gained momentum only in the last couple of decades, the roots of Chinese corpus linguistics go back to the beginning of the 20th century. Today the bulk of corpus-based Chinese studies is oriented toward applied linguistics, with the compilation of character and word frequency lists and studies of interlanguage Chinese being the most popular types of research. In addition to applied linguistic studies, this overview highlights some innovative corpus studies on lexical and grammatical aspects of both classical and modern Chinese, as well as studies of sociolinguistic variation and discourse pragmatics. Overall, important groundwork in Chinese corpus linguistics is acknowledged and future directions are discussed.