Corpus-based researchers and traditional qualitative researchers, such as those interested in critical discourse analysis, often need to select prototypical texts for close reading: texts that exhibit the language features of interest found across a much larger corpus. Traditional approaches to this selection have been largely ad hoc. In this paper, we offer a more principled way of selecting texts for close reading, based on ranking texts by the number of keywords they contain. To facilitate this analysis, we have developed a multiplatform, freeware software tool called ProtAnt that analyses the texts, generates a ranked list of keywords based on statistical significance and effect size, and then orders the texts by the number of keywords in them. We describe various experiments demonstrating that the ProtAnt analysis is effective not only at identifying prototypical texts but also at identifying outlier texts that may need to be removed from a target corpus.
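The ranking idea described above can be illustrated with a minimal sketch. This is not ProtAnt's implementation: the keyness statistic (log-likelihood against a reference corpus), the 3.84 cut-off, the tokenizer, the choice to count distinct keyword types without length normalization, and the function names `tokenize`, `log_likelihood`, `keywords`, and `rank_texts` are all assumptions made for illustration. The effect-size component mentioned in the abstract is omitted here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokenizer (an illustrative assumption, not ProtAnt's)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, n1, n2):
    """Log-likelihood keyness for a word occurring a times in a target
    corpus of n1 tokens and b times in a reference corpus of n2 tokens."""
    expected_target = n1 * (a + b) / (n1 + n2)
    expected_reference = n2 * (a + b) / (n1 + n2)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_target)
    if b > 0:
        ll += b * math.log(b / expected_reference)
    return 2 * ll


def keywords(target_texts, reference_texts, threshold=3.84):
    """Return (word, score) pairs for words overused in the target corpus,
    sorted by descending log-likelihood. The threshold is an assumed cut-off."""
    target = Counter(t for text in target_texts for t in tokenize(text))
    reference = Counter(t for text in reference_texts for t in tokenize(text))
    n1, n2 = sum(target.values()), sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words that are relatively more frequent in the target.
        if a / n1 > b / n2:
            score = log_likelihood(a, b, n1, n2)
            if score >= threshold:
                scored.append((word, score))
    return sorted(scored, key=lambda pair: -pair[1])


def rank_texts(target_texts, reference_texts, threshold=3.84):
    """Order target texts by how many distinct keywords each contains.
    target_texts maps a text name to its content."""
    keys = {w for w, _ in keywords(target_texts.values(), reference_texts, threshold)}
    counts = [(name, len(set(tokenize(text)) & keys))
              for name, text in target_texts.items()]
    return sorted(counts, key=lambda pair: -pair[1])
```

Under these assumptions, texts at the top of the list returned by `rank_texts` would be candidates for prototypical texts, and texts at the bottom would be candidates for outliers.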