- Extracted billions of tweets and replies from the Twitter API using SQL, Python, and web scraping techniques
- Transformed, cleansed, and normalized the data with Python, Pandas, and regular expressions
- Optimized data collection and transformation tasks using parallel processing, indexing, and caching
- Automated data collection to streamline data management for multiple collaborators
- Ensured data reliability and pipeline stability by developing logging and alerting mechanisms to handle errors
- Collaborated with a multidisciplinary team, providing insights and recommendations based on key findings
- Conducted exploratory data analysis using data visualization tools such as ggplot2 and plotly
- Identified an experiment-breaking survey distribution error that required the survey to be reissued
- Implemented data transformation techniques including variable recoding, data aggregation, and normalization
- Presented research findings in a poster (displayed below) with visualizations, a summary, and Q&A at the Annual Conference for Political Methodology
- Formulated a compelling hypothesis on motivated reasoning and logical argument evaluation in political science
- Designed 2 large-n survey experiments generating a robust and insightful data set
- Secured ethics approval, upholding the highest research standards
- Conducted advanced data analysis with R packages, revealing key insights on argument evaluation and objectivity interventions
- Can individuals distinguish between strong (logically consistent) and weak (logically flawed) arguments?
- Are evaluations of argument quality biased by individuals’ pre-existing beliefs?
- Can this bias be reduced by priming a competing goal of objectivity?
- Designed and implemented a beginner-friendly curriculum, tailored for students with no prior programming experience.
- Fostered an engaging and collaborative learning atmosphere by utilizing GitHub Classroom, Jupyter Lab, and the university’s LMS.
- Facilitated student comprehension by providing real-world examples with publicly available data.
- Conducted in-class lectures, live coding sessions, and hands-on programming exercises to facilitate student learning
- Provided personalized feedback and support to students to enhance their comprehension and performance
- Developed and administered quizzes and assignments to evaluate student progress and adjust teaching strategies
Quantitative political methodology II
Taught by Jacob Montgomery (advisor). Primary materials: Linear Models with R (Julian Faraway). Required.
This second course in political methodology covered advanced methods of statistical analysis for political and other social scientists. Topics included maximum likelihood estimation for cross-sectional and time-series data, and non-parametric bootstrapping.
Computational social science
Taught by Christopher Lucas.
Primary materials: Pattern Recognition and Machine Learning (Christopher Bishop); A Course in Machine Learning (Hal Daumé); The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Jerome Friedman, Trevor Hastie, Robert Tibshirani).
This course focused on exposing us to different types of data, including networks, text, audio, images, and videos. We began with mechanistic approaches to supervised and unsupervised learning, then moved to statistical inference with probabilistic interpretations.
Maximum likelihood estimation
Taught by Julia Park.
This course focused on generalized linear model estimation. We had practical exposure to link functions for a number of models, including multinomial and unordered models, ordered outcome models, duration models, and count models, among others.
Taught by Julia Park.
This course aimed to introduce theoretical frameworks for causality and the empirical tools used in the estimation of causal effects. We learned and applied skills related to outcomes, causal graphs, experiments, matching, regression, difference-in-differences, instrumental variables, sensitivity analysis, and regression discontinuity, among others.
Applied statistical programming
Taught by Jacob Montgomery (advisor). Primary materials: R for Dummies (de Vries and Meys); Advanced R (Hadley Wickham).
This course aimed to build our programming skills in R and expose us to the underpinnings of object-oriented programming. It focused on teaching foundational meta-skills from computer science and statistics, such as data structures, control flow, functions, version control and documentation, classes and methods, apply and parallel operations, debugging, and creating packages.
Theories of Individual and Collective Choice
Taught by Keith Schnakenberg. Primary materials: Game Theory: An Introduction (Steven Tadelis). Required.
This class was an introduction to rational choice theory and exposed us to the spatial theory of electoral competition, cooperative game theory, and general equilibrium theory.
Quantitative political methodology I
Taught by Guillermo Rosas. Primary materials: Linear Models with R (Julian Faraway). Required.
This course explored the fundamentals of linear regression models in both scalar and matrix form. Problem sets focused on estimation, inference, specification, diagnostic tools, data management, and statistical computation.
Taught by Randy Calvert. Primary materials: Mathematics for Economists (Pemberton and Rau). Required.
This course covered single-variable calculus and portions of multivariate calculus, linear algebra, and probability theory, exposing us to topics including sets and relations, probability, differential calculus and optimization, difference equations, and linear algebra.
Taught by Matt Gabel. Primary materials: Political Science and the Logic of Representations (Kevin A. Clarke, David M. Primo); The Logic of Real Arguments (Alec Fisher). Required.
This course focused on the philosophy of science and its implications for and applications in political science research. The course had three parts: examining the nature of scientific knowledge and scientific progress; considering how scientific principles of defining, evaluating, and developing knowledge can be applied to understanding political phenomena; identifying standards for evaluation and making good social scientific arguments and explanations.
The purpose of this project was to explore how discourse varies depending on the news outlet reporting. To do so, we collected tweets shared by news organizations, along with their replies, dating back to 2017. The data yielded (soon-to-be-published) insights into variations in sentiment and similarity. It also invited the use of techniques such as topic modeling and interrupted time series analyses surrounding the events of January 6, 2021. We sought to understand patterns related to the emotional charge and sentiment of text and how these varied across different news outlets and topics.
My major contribution to this project was the development of a reliable ETL pipeline that enabled non-methods researchers to easily load and explore the data. To do so, I first created a Python script to parse and normalize the data. This involved pagination to fetch large volumes of data and error handling for API rate limits and other potential issues. The data retrieved included tweet-level information, media-level information, and context annotations. Logging was implemented to track the progress of the script and help diagnose issues that might arise during the data collection process. The main function was to allow multiple users to divide the work and run the script in parallel without repetition.
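The collection loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: `fetch_page`, `RateLimitError`, and `assign_shard` are hypothetical names, and the real Twitter API client, its response format, and its rate-limit signalling are not shown in the original and are abstracted behind the `fetch_page` callable.

```python
import logging
import time
from typing import Callable, Iterator, List, Optional, Tuple

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("collector")

# A page is (records, next_token); next_token is None on the last page.
Page = Tuple[List[dict], Optional[str]]


class RateLimitError(Exception):
    """Hypothetical exception raised by fetch_page when the API throttles requests."""


def fetch_all_pages(
    fetch_page: Callable[[Optional[str]], Page],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Follow pagination tokens, backing off exponentially on rate limits."""
    token: Optional[str] = None
    page_number = 0
    while True:
        for attempt in range(max_retries):
            try:
                records, next_token = fetch_page(token)
                break
            except RateLimitError:
                delay = base_delay * 2 ** attempt
                logger.warning("Rate limited on page %d; sleeping %.1fs", page_number, delay)
                sleep(delay)
        else:
            # Every retry was rate limited: stop rather than loop forever.
            raise RuntimeError(f"Gave up on page {page_number} after {max_retries} retries")
        page_number += 1
        logger.info("Fetched page %d with %d records", page_number, len(records))
        yield from records
        if next_token is None:
            return
        token = next_token


def assign_shard(accounts: List[str], worker: int, n_workers: int) -> List[str]:
    """Deterministically split accounts so parallel workers never overlap."""
    return sorted(accounts)[worker::n_workers]
```

With this layout, each collaborator would run the script with a different `worker` index and collect only the accounts in their own shard, which is one simple way to avoid repeated work.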
I then developed a program that allowed users to automatically extract useful metrics from the data, store the cleansed data in a SQL database, and prevent duplicate data processing. This program preprocessed the text data with methods such as removing unnecessary characters, renaming columns, extracting links, tokenization, and lemmatization, and calculated various text-based metrics like average word length and total word count. Other features included an unsupervised learning algorithm for obtaining vector representations of words and the computation of subjectivity, polarity, and sentiment scores for each tweet.
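A simplified sketch of the metric extraction and duplicate prevention might look like the following. It uses only the Python standard library with an SQLite database as a stand-in for the SQL store; the table layout, column names, and the `text_metrics` and `store_tweets` helpers are assumptions made for illustration. Lemmatization, word embeddings, and sentiment scoring are omitted because they depend on external libraries that the original does not name.

```python
import re
import sqlite3
from typing import Dict, Iterable

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[a-z0-9']+")


def text_metrics(text: str) -> Dict[str, object]:
    """Extract links, tokenize the remaining text, and compute simple metrics."""
    links = URL_RE.findall(text)
    cleaned = URL_RE.sub(" ", text).lower()
    tokens = TOKEN_RE.findall(cleaned)
    word_count = len(tokens)
    avg_word_length = sum(len(t) for t in tokens) / word_count if word_count else 0.0
    return {
        "links": links,
        "tokens": tokens,
        "word_count": word_count,
        "avg_word_length": avg_word_length,
    }


def store_tweets(conn: sqlite3.Connection, tweets: Iterable[dict]) -> int:
    """Insert processed tweets, skipping any tweet_id that is already stored."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id TEXT PRIMARY KEY,
               text TEXT,
               word_count INTEGER,
               avg_word_length REAL,
               n_links INTEGER
           )"""
    )
    inserted = 0
    for tweet in tweets:
        metrics = text_metrics(tweet["text"])
        cursor = conn.execute(
            # The primary key plus OR IGNORE makes reprocessing a no-op.
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?, ?)",
            (
                tweet["id"],
                tweet["text"],
                metrics["word_count"],
                metrics["avg_word_length"],
                len(metrics["links"]),
            ),
        )
        inserted += cursor.rowcount
    conn.commit()
    return inserted
```

Keying the table on the tweet ID means that rerunning the program on overlapping batches leaves existing rows untouched, which is one straightforward way to prevent duplicate processing.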
In this project, we investigated social media users’ perceptions of digital political ads. We measured users’ opinions on how platforms should design political ad UX and policies, with the goal of establishing a baseline understanding of user opinions, including the permissibility of political ads and microtargeting and transparency in funding.
The primary objective of this research was to understand what factors of ads (and users themselves) may contribute to their perceptions of how ‘political’ given digital ads are. To do this, we conducted a conjoint experiment asking respondents to compare artificial Facebook ads where we altered their source, content, and political orientation. This conjoint design allowed us to isolate the independent effects of each component on perceptions of the political.
We also conducted a within-between experiment asking respondents to evaluate real ads drawn from the Facebook Ad Library (collected by a co-author). In this portion of the project, we randomly assigned respondents to view either a political or non-political advertisement and asked them to rate how political they perceived it to be. Respondents rated multiple ads (within-subject variation), but the exact composition of the ads was randomized for each respondent (between-subject variation).
Overall, our conjoint analysis strongly supported our original research hypotheses, showing that the source, strength, and orientation of the message all matter. In the experiment with real ads, we found that candidate ads seem to be viewed as inherently political, in contrast to ads from sources such as politically active companies and advocacy organizations, where message strength appears to matter far more for an ad to be considered political. This differs from our finding in the conjoint analysis, where ads from companies and advocacy organizations were viewed as equally political.
I was brought into this project after the research design and implementation stages of the surveys had taken place and was tasked with maintaining and overseeing the project’s data. I quickly acquired a working understanding of the mathematical principles and methodologies behind conjoint experiments, a less-common analytical approach in my field. Upon examining the data and methods, I identified discrepancies in the expected number of profiles and informed my collaborators of the error, which had compromised the random assignment. Consequently, the survey distributor corrected the parameters and redistributed the survey, ensuring the project’s successful progression.
I implemented analyses in this project in R, using libraries such as dplyr, magrittr, and tidyverse to analyze political ad data and examine the impact of ad orientation on political preferences. I developed an R script to clean and process the data in order to create relevant variables and handle missingness. I implemented advanced data manipulation techniques and reshaped the datasets to make them more manageable for further analysis.
I then conducted a comprehensive analysis on political advertisement data, encompassing four novel datasets. Utilizing weighted confidence intervals and an array of statistical techniques, I visualized the findings through point-range plots, effectively conveying the political nature of the ad content. Additionally, I carried out a follow-up study to further investigate the perceived political content of various advertisements, expanding the project’s scope and providing a more in-depth understanding of the relationship between ad content and political affiliation.
Across the two experiments, we found no evidence supporting our hypotheses. While Study 1 showed that corrective comments in the comments section effectively reduced misperceptions, culturally relevant corrections were not particularly effective among Latinos. In Study 2, we did not find evidence to support the hypothesis that culturally relevant comments are more effective in reducing misperceptions. Interestingly, the in-group corrections from those who were the target of misinformation were most effective among all participants in both experiments, which may suggest that members of out-groups defer to other ethnic groups when the misinformation does not relate to them.
This project aimed to investigate the effectiveness of corrective comments on social media in reducing misinformation, specifically examining the role of in-group and out-group members in promoting accurate beliefs. We hypothesized that comments from in-group members would be more effective in promoting accurate beliefs than comments from out-group members, particularly when the misinformation relates to the target’s ethnic group. We also hypothesized that corrective comments from culturally relevant organizations would be more effective in reducing misperceptions.
To test these hypotheses, we conducted two experiments. Participants were presented with fake Facebook posts containing misinformation targeting specific ethnic groups and were sorted into different conditions, including no misinformation post and no comments, misinformation post and no replies, misinformation post with in-group correction, and misinformation post with out-group correction. Before and after the treatment (misinformation post), participants were asked about their belief in specific misinformation. The experiments relied on two survey waves that oversampled Black (both surveys) and Latino (first survey only) respondents. Both experiments were sponsored by the Weidenbaum Center at Washington University in Saint Louis and distributed by NORC.
In this project, I used R and several packages (readr, dplyr, magrittr, knitr, ggplot2, labelled, haven, tidyverse, OneR, texreg, and weights) to perform the data analysis, applying techniques such as weighted means, confidence intervals, and linear regression. After manipulating the data to create new variables, I used regression models to examine the impact of treatments (or lack thereof) on different racial and ethnic groups’ susceptibility to misinformation. I also created tables that displayed estimates, standard errors, and confidence intervals for different treatments and racial/ethnic groups.
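As a rough illustration of the weighted means and confidence intervals mentioned above, the sketch below computes a survey-weighted mean with a normal-approximation 95% interval. It is written in Python rather than R for illustration only, and the variance formula (a simple linearization estimator without a finite-sample correction) is an assumption; it is not necessarily the estimator used by the R packages in the analysis. The function name `weighted_mean_ci` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def weighted_mean_ci(
    values: Sequence[float],
    weights: Sequence[float],
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Return (weighted mean, lower bound, upper bound) using a normal approximation."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    mean = sum(w * x for x, w in zip(values, weights)) / total_weight

    # Linearization estimate of the variance of a weighted mean (assumed here):
    # sum of w_i^2 * (x_i - mean)^2, divided by (sum of w_i)^2.
    variance = sum((w ** 2) * (x - mean) ** 2 for x, w in zip(values, weights)) / total_weight ** 2
    standard_error = math.sqrt(variance)
    return mean, mean - z * standard_error, mean + z * standard_error


# Example: belief ratings from three respondents with unequal survey weights.
mean, lower, upper = weighted_mean_ci([1.0, 4.0, 5.0], [0.5, 1.5, 1.0])
print(f"weighted mean = {mean:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

In practice, intervals like these would be computed separately for each treatment condition and racial or ethnic group before being tabulated or plotted.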
This data science project investigated the influence of motivated reasoning on individuals’ evaluation of logical arguments, addressing three key questions:
Using R, I designed and conducted two large-n survey experiments, finding that individuals can distinguish between strong and weak arguments but exhibit a bias favoring statements aligned with their preferences. This bias persisted across strong and weak arguments, political and non-political topics, and multiple issue areas.
The project also evaluated the effectiveness of priming objectivity goals in reducing biases in argument evaluation. The first study suggested potential improvements in weak argument evaluation accuracy, while the second study showed no measurable effect.
This research revealed the pervasiveness of argument congruency bias and demonstrated that individuals’ biases influence, but do not entirely overwhelm, their ability to accurately rate argument quality. By exploring the potential of priming objectivity as an intervention, this project contributed valuable insights into argument evaluation and strategies for reducing bias.