Public policy research applications of DocuScope’s linguistic taxonomy
Mining style and stance for sociocultural insight
Computer scientists in natural language processing (NLP) have
focused on the lexical level of language (word counts, ratios,
distance, and context), an emphasis well suited to semantic tasks
and syntactic analyses. Corpus linguists, on the other hand, have
had a broader focus that also accounts for the lexicogrammatical
level of language, and their approach is thus well suited to
pragmatic tasks.
DocuScope, with its linguistic taxonomy at the lexicogrammatical
level, is thus a unique and complementary tool for the data-driven
analysis of large collections of text, addressing the stance and
style choices pervasive in linguistic behavior. This chapter looks
at how DocuScope’s taxonomy has informed a range of problems in
public policy at the RAND Corporation. The first section of the
chapter examines how the DocuScope taxonomy has been used as a
statistical tool to find patterns in text corpora, scaling up human
qualitative analysis into a mixed-methods text analysis approach,
for example in analyzing open-text responses in a large survey of
U.S. special forces operators. The second section shows how the
DocuScope taxonomy has improved machine learning efforts, in terms
of both accuracy and interpretability, for example in detecting and
understanding conspiracy theory discourse on social media. This
chapter ultimately calls for humanistic knowledge as a valuable and
necessary complement to technical advances in data-centric
disciplines like NLP.
Article outline
- 1. Introduction
- 2. Overview of DocuScope’s usage at RAND
- 2.1 The RAND-Lex instantiation of the DocuScope dictionaries: Quantifying stance
- 2.1.1 Machine + human reading: Scaling up qualitative analysis
- 2.1.2 Quantitative representations of stance for machine learning
- 3. Example applications of the DocuScope dictionaries in public policy research
- 3.1 Scaling up human reading: Analyzing attitudes in survey responses and measuring changes in news presentation
- 3.1.1 Analyzing attitudes in survey responses from special operations members
- 3.1.2 Measuring style at scale: Has U.S. news reporting become more subjective over time?
- 3.2 Improving machine reading through linguistic stance
- 3.2.1 Election interference: Understanding Russian trolls and U.S. partisanship
- 3.2.2 Stance across language: Understanding the Arabic Bin Laden archive
- 3.2.3 Hybrid modeling: Improving machine learning performance and insight with the DocuScope dictionaries
- 3.2.4 Stance’s value is document-length dependent
- 3.2.5 Modeling with stance: Improved interpretability
- 4. Filling in NLP gaps through humanistic theory
- Notes
- References