Do registers have different functions for text length?
A case study of Reddit
Like lexical and grammatical choices, the length of a text is guided by situational constraints and functional needs. Consequently, texts of different lengths are associated with different communicative functions. This study explores how register shapes the functions associated with comment length on the social media platform Reddit. Since registers differ in their functional and situational makeup, the same text length may have different functions in different registers. By analyzing how the frequencies of register features vary across comment lengths in a number of popular subreddits, using a large-scale dataset of Reddit comments, I show that the functional associations of text length can differ greatly between subreddits, and that comments of the same length can even have virtually opposite functions in different subreddits. Furthermore, some subregisters are clearly differentiated not only by their feature makeup but also by the length of their comments.
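The lengthwise comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's implementation: the length bins, the whitespace tokenizer, and the single toy feature (first-person pronouns) are all assumptions made here for illustration, and the names `BINS`, `length_bin`, and `feature_rates` are hypothetical. The idea is simply to group comments by subreddit and length bin and then compare a normalized feature rate across the groups.

```python
from collections import defaultdict

# Hypothetical length bins in tokens; the study's actual bins are not stated here.
BINS = [(1, 10), (11, 50), (51, 200), (201, None)]

# Toy register feature (an illustrative stand-in): first-person pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}


def length_bin(n_tokens):
    """Return the label of the bin containing n_tokens, or None if empty."""
    for low, high in BINS:
        if n_tokens >= low and (high is None or n_tokens <= high):
            return f"{low}+" if high is None else f"{low}-{high}"
    return None


def feature_rates(comments):
    """Map (subreddit, length bin) to first-person pronouns per 100 tokens."""
    hits = defaultdict(int)
    tokens = defaultdict(int)
    for subreddit, text in comments:
        # Naive tokenization: split on whitespace, strip edge punctuation.
        words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
        words = [w for w in words if w]
        label = length_bin(len(words))
        if label is None:
            continue  # skip comments with no tokens
        key = (subreddit, label)
        tokens[key] += len(words)
        hits[key] += sum(w in FIRST_PERSON for w in words)
    return {key: 100 * hits[key] / tokens[key] for key in tokens}


if __name__ == "__main__":
    sample = [
        ("r/example_a", "I like my cat."),
        ("r/example_b", "The data are noisy."),
    ]
    for key, rate in sorted(feature_rates(sample).items()):
        print(key, rate)
```

Normalizing per 100 tokens keeps rates comparable across bins of very different sizes. A real analysis would track many features with a proper tagger rather than a word list.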
Article outline
- 1. Introduction
- 2. Data and Methods
- 2.1 Data
- 2.2 Methods
- 2.2.1 Text-linguistic register framework
- 2.2.2 Lengthwise analysis
- 3. Analysis
- 4. Discussion
- 5. Conclusion
- Notes
- References
Cited by (3)
Erten-Johansson, Selcen, Valtteri Skantsi, Sampo Pyysalo & Veronika Laippala
This list is based on CrossRef data as of 26 December 2024 and may not be complete.