Part of
Corpora and Rhetorically Informed Text Analysis: The diverse applications of DocuScope
Edited by David West Brown and Danielle Zawodny Wetzel
[Studies in Corpus Linguistics 109] 2023
pp. 167–189
Allison, S., Heuser, R., Jockers, M., Moretti, F., & Witmore, M.
(2011) Quantitative formalism: An experiment. Stanford Literary Lab.
Angelov, D.
(2020) Top2vec: Distributed representations of topics. arXiv:2008.09470.
Arora, S., Ge, R., Halpern, Y., Mimno, D., Moitra, A., Sontag, D., Wu, Y., & Zhu, M.
(2013, May). A practical algorithm for topic modeling with provable guarantees. In Proceedings of the 30th International Conference on Machine Learning (pp. 280–288). PMLR.
Arora, S., Ge, R., Kannan, R., & Moitra, A.
(2016) Computing a nonnegative matrix factorization – Provably. SIAM Journal on Computing, 45(4), 1582–1611.
Arroyo-Fernández, I., Méndez-Cruz, C. F., Sierra, G., Torres-Moreno, J. M., & Sidorov, G.
(2019) Unsupervised sentence representations as word information series: Revisiting TF–IDF. Computer Speech & Language, 56, 107–129.
Basu, A., Hope, J., & Witmore, M.
(2017) The professional and linguistic communities of early modern dramatists. In A. W. Johnson, R. D. Sell, & H. Wilcox (Eds.), Community-making in early Stuart theatres: Stage and audience. Routledge.
Bernstein, S. D., & Derose, C.
(2012) Reading numbers by numbers: Digital studies and the Victorian serial novel. Victorian Review, 38(2), 43–68.
Blei, D. M., Ng, A. Y., & Jordan, M. I.
(2003) Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993–1022.
Cai, D., He, X., Wu, X., & Han, J.
(2008, December). Non-negative matrix factorization on manifold. In 2008 Eighth IEEE International Conference on Data Mining (pp. 63–72). IEEE.
Chen, C. H.
(2017) Improved TFIDF in big news retrieval: An empirical study. Pattern Recognition Letters, 93, 113–122.
Cichocki, A., & Phan, A. H.
(2009) Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences, 92(3), 708–721.
Correll, M., Witmore, M., & Gleicher, M.
(2011) Exploring collections of tagged text for literary scholarship. Computer Graphics Forum, 30(3), 731–740.
Danescu-Niculescu-Mizil, C., & Lee, L.
(2011) Chameleons in imagined conversations: A new approach to understanding coordination of linguistic style in dialogs. arXiv:1106.3077.
(2022) spaCy. Retrieved on 25 January 2023 from [URL]
Févotte, C., & Idier, J.
(2011) Algorithms for nonnegative matrix factorization with the β-divergence. Neural Computation, 23(9), 2421–2456.
Forsyth, E., Lin, J., & Martell, C.
(n.d.). The NPS Chat Corpus [Dataset]. Retrieved on 25 January 2023 from [URL]
Forsyth, E., & Martell, C. H.
(2007) Lexical and discourse analysis of online chat dialog. In Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007) (pp. 19–26).
Geisler, C., & Swarts, J.
(2019) Coding streams of language: Techniques for the systematic coding of text, talk, and other verbal data. WAC Clearinghouse.
Goldstone, A., & Underwood, T.
(2014) The quiet transformations of literary studies: What thirteen thousand scholars could tell us. New Literary History, 45(3), 359–384.
Grabill, J. T., & Pigg, S.
(2012) Messy rhetoric: Identity performance as rhetorical agency in online public forums. Rhetoric Society Quarterly, 42(2), 99–119.
Grisel, O., Buitink, L., & Yau, C. K.
(n.d.). Topic extraction with non-negative matrix factorization and latent Dirichlet allocation [Computer code]. Retrieved on 25 January 2023 from [URL]
Grootendorst, M.
(2020) BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Havrlant, L., & Kreinovich, V.
(2017) A simple probabilistic explanation of term frequency-inverse document frequency (tf-idf) heuristic (and variations motivated by this explanation). International Journal of General Systems, 46(1), 27–36.
He, R., & McAuley, J.
(2016) Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering. arXiv:1602.01585.
Hoffman, M., Bach, F. R., & Blei, D. M.
(2010) Online learning for latent Dirichlet allocation. In M. I. Jordan, Y. LeCun, & S. A. Solla (Eds.), Advances in neural information processing systems (pp. 856–864). The MIT Press.
Hope, J., & Witmore, M.
(2004) The very large textual object: A prosthetic reading of Shakespeare. Early Modern Literary Studies, 9(3), 1–36.
(2010) The hundredth psalm to the tune of “Green Sleeves”: Digital approaches to Shakespeare’s language of genre. Shakespeare Quarterly, 61(3), 357–390.
Hoyer, P. O.
(2004) Non-negative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5, 1457–1469.
Jockers, M.
(2013) Macroanalysis: Digital methods and literary history. University of Illinois Press.
Jockers, M.
(n.d.). 500 Themes from a corpus of 19th-Century fiction. Retrieved 17 April 2019 from [URL]
Johnson, C., & Marcellino, W.
(2022) Bag-of-words algorithms can supplement transformer sequence classification & improve model interpretability. RAND Corporation. Retrieved on 25 January 2023 from [URL]
Kane, M. S.
(2020, October). Communicating the “write” values: Developing methods of computer-aided text analysis for instructor training. In Proceedings of the 38th ACM International Conference on Design of Communication (pp. 1–8). ACM.
Kaufer, D. S., & Butler, B. S.
(2010) Rhetoric and the arts of design. Routledge.
Kaufer, D. S., & Ishizaki, S.
(1998) DocuScope: Computer-aided rhetorical analysis [Software].
Kaufer, D., & Ishizaki, S.
(2006) A corpus study of canned letters: Mining the latent rhetorical proficiencies marketed to writers-in-a-hurry and non-writers. IEEE Transactions on Professional Communication, 49(3), 254–266.
Kaufer, D. S., Ishizaki, S., Butler, B. S., & Collins, J.
(2004) The power of words: Unveiling the speaker and writer’s hidden craft. Routledge.
Beigman Klebanov, B. B., Kaufer, D., Yeoh, P., Ishizaki, S., & Holtzman, S.
(2016) Argumentative writing in assessment and instruction: A comparative perspective. Genre in Language, Discourse and Cognition, 33, 167.
Kuang, D., Choo, J., & Park, H.
(2015) Nonnegative matrix factorization for interactive topic modeling and document clustering. In Partitional clustering algorithms (pp. 215–243). Springer.
Lauer, C., Brumberger, E., & Beveridge, A.
(2018) Hand collecting and coding versus data-driven methods in technical and professional communication research. IEEE Transactions on Professional Communication, 61(4), 389–408.
Le, Q., & Mikolov, T.
(2014, June). Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning (pp. 1188–1196). PMLR.
Lee, D. D., & Seung, H. S.
(1999) Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), 788–791.
Marcellino, W.
(2019) Seniority in writing studies: A corpus analysis. Journal of Writing Analytics, 3(1), 183–205.
McAuley, J., Targett, C., Shi, J., & van den Hengel, A.
(2015) Image-based recommendations on styles and substitutes. SIGIR.
Ni, J., Li, J., & McAuley, J.
(2019, November). Justifying recommendations using distantly-labeled reviews and fine-grained aspects. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) (pp. 188–197).
Omizo, R. M.
(2020) Machining Topoi: Tracking premising in online discussion forums with automated rhetorical move analysis. Computers and Composition, 57, 102578.
Paatero, P.
(1997) Least squares formulation of robust non-negative factor analysis. Chemometrics and Intelligent Laboratory Systems, 37(1), 23–35.
Paatero, P., & Tapper, U.
(1994) Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2), 111–126.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, E.
(2011) Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Piper, A.
(2018) Enumerations: Data and literary study. The University of Chicago Press.
Řehůřek, R., & Sojka, P.
(2011) Gensim – Statistical semantics in python. Retrieved on 25 January 2023 from [URL]
Reimers, N., & Gurevych, I.
(2019) Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv:1908.10084.
Seung, D., & Lee, L.
(2001) Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, 13, 556–562.
Steyvers, M., & Griffiths, T.
(2007) Probabilistic topic models. In T. K. Landauer, D. S. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (427(7), pp. 424–440). Lawrence Erlbaum Associates.
Vanderplas, J. T.
(2016) Python data science handbook: Essential tools for working with data. O’Reilly.
Wang, Y. X., & Zhang, Y. J.
(2012) Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering, 25(6), 1336–1353.
Wetzel, D., Brown, D., Werner, N., Ishizaki, S., & Kaufer, D.
(2021) Computer-assisted rhetorical analysis: Instructional design and formative assessment using DocuScope. The Journal of Writing Analytics, 5, 292–323.
Xu, W., Liu, X., & Gong, Y.
(2003, July). Document clustering based on non-negative matrix factorization. In Proceedings of the 26th annual international ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 267–273).
Zhu, J., Wickes, E., & Gallagher, J. R.
(2021) A machine learning algorithm for sorting online comments via topic modeling. Communication Design Quarterly, 9(2), 4–14.