In this exercise, you will create an exercise. Your task is simply to take any topic from one of the weeks of the course and use the data provided below to develop an exercise for other students in the class. The goals are twofold:

To help you learn in depth one of the topics from the course that is of interest to you
To provide students in the class with a learning resource in preparation for the exam

You might, for example, ask those taking your exercise to apply a topic model to tweets from members of the current US Congress and graph the results over time; or apply the Barberá ideology model to follower data from Danish politicians to see whether the model works in Denmark; or test whether the text of tweets predicts the voting ideology of members of Congress. You are also free to do something simpler, such as asking someone to apply a series of regular expression searches to tweets to identify the ones that contain specific phrases, as in the first text analysis lab.

Keep in mind that the goal of your exercise should be to teach others how and why the steps in a type of social media data analysis are done, and how to understand the output. If you create an exercise around regular expressions, for example, you would want to ask a series of questions in a way that encourages the exercise-taker to learn different types of search patterns (e.g. ^, $, \\s, \\w, \\b, .*, +, etc.). To teach the material through an exercise, you will of course need to know it well yourself.
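As a rough illustration of the kind of pattern-by-pattern questions a regular expression exercise might build up to, here is a minimal sketch in base R. The tweets are invented for the example and are not taken from the course data, and the variable names are placeholders. The lab itself may have used a different package; base grepl() is used here only to keep the sketch self-contained.

```r
# Toy tweets invented for illustration; they are not from the course data.
tweets <- c(
  "COVID cases are rising",
  "Vote today!",
  "We passed the bill today",
  "covid19 relief now"
)

# ^ anchors the pattern to the start of the string.
starts_with_covid <- grepl("^covid", tweets, ignore.case = TRUE)

# $ anchors the pattern to the end of the string
# (so "today!" does not match, because "!" comes last).
ends_with_today <- grepl("today$", tweets)

# \\b marks a word boundary, so "covid19" does not count as the word "covid".
covid_as_word <- grepl("\\bcovid\\b", tweets, ignore.case = TRUE)

# \\s matches one whitespace character; \\w+ matches one or more word characters.
the_then_word <- grepl("the\\s\\w+", tweets)

# .* matches any run of characters, including an empty one.
we_then_today <- grepl("^We.*today$", tweets)

# Show which tweets each pattern picks out.
print(data.frame(tweets, starts_with_covid, ends_with_today, covid_as_word))
```

An exercise could present each pattern with the code blanked out and ask the exercise-taker to predict which tweets will match before running it.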

The entirety of your exercise should be in a single .R file that other students can download and complete. In addition to the exercise .R file, also provide the answer key, which would simply be the same .R file as the exercise, but with all of the correct code filled in.

As an example, I have coded up the exercise version of the topic models lab (regarding Democrats’ and Republicans’ mentions of COVID). You don’t have to follow the format that I use, but it should give you some sense of how an exercise might look.

If your exercise contains code that takes a long time to run, you can also include the saved R objects as part of the exercise, as I have done for the example below.
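One way to package a slow step is to save its result with saveRDS() and have the exercise load it with readRDS(). The sketch below assumes a hypothetical slow_step() function and a hypothetical file name, and writes to a temporary directory so that it runs anywhere; in a real exercise the file would sit next to the .R file.

```r
# Hypothetical stand-in for a slow step such as tokenizing or fitting a model.
slow_step <- function() {
  Sys.sleep(0.1)  # pretend this takes a long time
  data.frame(topic = 1:3, weight = c(0.5, 0.3, 0.2))
}

# Hypothetical file name; tempdir() keeps the sketch self-contained.
cache_file <- file.path(tempdir(), "slow_result.rds")

if (file.exists(cache_file)) {
  # Exercise-takers with the saved object skip the slow step entirely.
  result <- readRDS(cache_file)
} else {
  # The exercise author runs the slow step once and saves the output.
  result <- slow_step()
  saveRDS(result, cache_file)
}

# Reading the file back returns the same object that was saved.
reloaded <- readRDS(cache_file)
print(reloaded)
```

It can help to leave the slow code in the exercise file as a comment, so that students can still see how the saved object was produced.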

Please do not make the exercise overly complex. The goal is just to provide a straightforward exercise for other students so that they have a tool for their exam preparation to better understand one of the topics covered in this class.

Example

Goal: In this exercise, you will fit a topic model to tweets from members of Congress in 2019-2020 to see if you can discover tweets about the COVID-19 pandemic. You will then graph the results for Republicans and Democrats to see which party's members began talking about the pandemic first.

Topic_Models_Exercise_Example.R
Topic_Model_Exercise_Example_Answer_Key.R

Tweets from US Members of Congress can be found here: MOC_Tweets.rds

Tokenized data object (the output of one of the steps so you can avoid waiting for 15+ minutes): Tokens.rds

100-topic LDA model object (the output of one of the steps so you can avoid waiting for over an hour): model_lda_100.rds

Data for your exercise

Below are four datasets that you can use for your own exercise. If you want to use different data, please let me know. Note that the tweet datasets contain the most recent 3,200+ tweets from politicians at the time of collection.

2019-2020 tweets from US Members of Congress: MOC_Tweets.rds

These data have the tweets from members of Congress from roughly 2019 to early 2020, which we have been using throughout the course so far.
A variable for the party (Republican, Democrat, Independent) of each member of Congress is merged in (called “affiliation”).

2021 tweets from US Members of Congress: MOC2021_Tweets.rds

These data have the tweets from members of Congress from 2020-2021.
There is a party variable called “affiliation”.
There is also a variable, “nominate_score”, which is the voting ideology of each member of Congress.
- This might be useful for a supervised learning exercise, or to look at other analyses and how they relate to the ideology of the politician
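To give a sense of how a text feature might be related to “nominate_score”, here is a deliberately tiny sketch on invented data. The data frame, the “member” column, the example texts, and the scores are all made up for illustration; only the “nominate_score” name comes from the dataset description above. A real supervised learning exercise would use many more features and a held-out test set.

```r
# Toy data invented for illustration; not taken from MOC2021_Tweets.rds.
tweets <- data.frame(
  member = c("A", "A", "B", "B", "C", "C"),
  nominate_score = c(-0.5, -0.5, 0.1, 0.1, 0.6, 0.6),
  text = c(
    "health care for all", "climate action now",
    "tax relief today", "health care costs",
    "tax cuts work", "cut the tax burden"
  ),
  stringsAsFactors = FALSE
)

# Text feature: does the tweet contain the word "tax"?
tweets$mentions_tax <- grepl("\\btax\\b", tweets$text)

# Collapse to one row per member: the share of tweets mentioning "tax"
# and the member's (constant) ideology score.
by_member <- aggregate(
  cbind(mentions_tax, nominate_score) ~ member,
  data = tweets,
  FUN = mean
)

# Simple linear regression of ideology on the text feature.
fit <- lm(nominate_score ~ mentions_tax, data = by_member)
print(coef(fit))
```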

2021 followers of Danish politicians: Follower files and List of politicians

2021 tweets from Danish politicians: DK2021_Tweets.rds and List of politicians

“user_id”: the user ID of the politician (can be matched to the data in DK_Politicians.csv)
“date”: date the tweet was sent
“tweet_id”: a unique ID for the tweet itself
“tweet_type”: whether the tweet was authored by the politician himself or herself, is a retweet that the politician sent, or is a quote tweet
“text”: the text of the tweet
“text_user_id”: the user_id of the user who is being retweeted or quote-tweeted
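Matching tweets to the list of politicians on “user_id” might look like the sketch below. It uses small invented data frames in place of DK2021_Tweets.rds and DK_Politicians.csv; the “name” and “party” columns are assumptions, since the actual columns of the politician list are not described here. IDs are stored as character strings in the sketch, which avoids any loss of precision with very long numeric IDs.

```r
# Toy stand-in for DK2021_Tweets.rds, using three of the documented columns.
tweets <- data.frame(
  user_id = c("101", "101", "202"),
  tweet_id = c("t1", "t2", "t3"),
  text = c("first tweet", "second tweet", "third tweet"),
  stringsAsFactors = FALSE
)

# Toy stand-in for DK_Politicians.csv; "name" and "party" are assumed columns.
politicians <- data.frame(
  user_id = c("101", "202"),
  name = c("Politician A", "Politician B"),
  party = c("Party X", "Party Y"),
  stringsAsFactors = FALSE
)

# Left join: keep every tweet and attach the politician's details by user_id.
tweets_with_party <- merge(tweets, politicians, by = "user_id", all.x = TRUE)

# Count tweets per party as a quick check that the match worked.
tweets_per_party <- table(tweets_with_party$party)
print(tweets_per_party)
```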