This week will have a somewhat shorter lecture, followed by a lab as usual. You will be introduced to term frequency-inverse document frequency (TF-IDF), Receiver Operating Characteristic (ROC) curves, and keywords in context.


Presentations

Lennart is presenting:
Fake News on Twitter during the 2016 U.S. Presidential Election.
Nir Grinberg, Kenneth Joseph, Lisa Friedland, Briony Swire-Thompson, and David Lazer

Victor is presenting:
Who Leads? Who Follows? Measuring Issue Attention and Agenda Setting by Legislators and the Mass Public Using Social Media Data.
Pablo Barberá, Andreu Casas, Jonathan Nagler, Patrick J. Egan, Richard Bonneau, John T. Jost, and Joshua A. Tucker


Readings

No required readings this week.

Nevertheless, below are a few references that you might use to understand the material that we cover in class.

  1. An Introduction to ROC Analysis
    Pattern Recognition Letters, 2006, 27 (8): 861-874
    Tom Fawcett

  2. An Introduction to Statistical Learning (with Applications in R)
    New York, NY: Springer, 2013.
    Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
    On ROC Curves, pp. 147-149

  3. Introduction to Information Retrieval
    New York, NY: Cambridge University Press, 2009.
    Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
    On TF-IDF, pp. 117-120

Finally, as an extra, the article below provides a nice example of how to combine supervised and unsupervised learning techniques to answer important questions in the study of social media and politics. The authors manually annotate tweets as civil or uncivil (polite/impolite), apply a supervised learning model (lasso) to classify all of the tweets that they collected, and finally apply an (unsupervised) LDA topic model to see which topics have the most uncivil posts. They do so to examine the level of incivility directed at politicians and how incivility differs depending on the political topic of discussion. A similar paper would work well as a thesis topic for master's students in the class, if you are still searching for an idea. From what I can tell, their analysis is also conducted wholly in R and uses ggplot for graphing (as do many papers in political science).

  1. The Dynamics of Political Incivility on Twitter
    SAGE Open, 2020: 1-15
    Yannis Theocharis, Pablo Barberá, Zoltán Fazekas, and Sebastian Adrian Popa

Lectures

I will mention the articles below:

  1. From Isolation to Radicalization: Anti-Muslim Hostility and Support for ISIS in the West
    American Political Science Review, 2019, 113 (1): 173-194
    Tamar Mitts

  2. Gendered Language on the Economics Job Market Rumors Forum
    American Economic Association: Papers & Proceedings, 2018, 108 (May): 175-179
    Alice H. Wu

  3. Classification Accuracy as a Substantive Quantity of Interest: Measuring Polarization in Westminster Systems
    Political Analysis, 2018, 26 (1): 120-128
    Andrew Peterson and Arthur Spirling

  4. Elusive Consensus: Polarization in Elite Communication on the COVID-19 Pandemic
    Science Advances, 2020, 6 (28): 1-5
    Jon Green, Jared Edgerton, Daniel Naftel, Kelsey Shoub, and Skyler J. Cranmer

  5. How State and Protester Violence Affect Protest Dynamics
    Journal of Politics, Forthcoming: 1-39
    Zachary C. Steinert-Threlkeld, Alexander Chan, and Jungseock Joo

  6. Viral Visualizations: How Coronavirus Skeptics Use Orthodox Data Practices to Promote Unorthodox Science Online
    CHI Conference on Human Factors in Computing Systems (CHI '21), May 8-13, Yokohama, Japan, 2021: 1-18
    Crystal Lee, Tanya Yang, Gabrielle Inchoco, Graham M. Jones, and Arvind Satyanarayan

Lab

The .R file and model objects below differ slightly from (and improve on) what is described in the video, so you may notice relatively minor changes between the downloadable R file and the one discussed in the video.

Lab code: TF-IDF.R

Tweets from Members of Congress: MOC_Tweets.rds

Tokenized data: TFIDF_Tokens.rds

Fitted models: elastic_model_counts.rds
Fitted models: elastic_model_counts_auc.rds
Fitted models: elastic_model_tfidf.rds
Fitted models: elastic_model_auc_tfidf.rds