Recent Projects
DSI Persuasion Chatbot
As part of the Data Science Institute’s Data and Democracy initiative, we built a retrieval-based chatbot on Facebook Messenger for automated persuasion on key societal issues. As an example intervention, we are canvassing on the topic of anti-transgender prejudice. I built the NLP pipeline for the project: collecting data from humans (on Amazon MTurk) and generative models (text-davinci-003), cleaning and organizing the data, and training a RoBERTa-based intent and profanity classifier. If you’re interested in engaging with our chatbot, you can find it at https://m.me/ResearchUofC.
Estimating Political Ideology using (massive) Social Network Data
I estimate the ideology of all members of the 116th Congress and their Twitter followers with a Bayesian latent space model, using information about the legislators’ Twitter follower network. This method correctly identifies legislators at the extreme right and left of the political spectrum, whom conventional estimates like DW-NOMINATE fail to identify. I also provide a detailed analysis of the standard errors of the ideology estimates for legislators and ordinary users.
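As a rough illustration of this kind of model, the sketch below computes the follow probability in a spatial latent space model, where a user is more likely to follow a legislator whose ideal point lies close to their own. The functional form (logistic link, squared distance, popularity and activity intercepts) is an assumption borrowed from common Twitter-based ideal point models. The project's exact specification and priors are not stated here, and all names and parameter values are hypothetical.

```python
import math


def follow_probability(theta_user, phi_legislator, alpha=0.0, beta=0.0, gamma=1.0):
    """P(user follows legislator) under an assumed spatial following model.

    theta_user, phi_legislator: ideal points on a one-dimensional ideology scale
    alpha: legislator popularity intercept (hypothetical)
    beta:  user activity intercept (hypothetical)
    gamma: weight on squared ideological distance (hypothetical)
    """
    distance_sq = (theta_user - phi_legislator) ** 2
    logit = alpha + beta - gamma * distance_sq
    return 1.0 / (1.0 + math.exp(-logit))


def log_likelihood(follows, thetas, phis, alpha=0.0, beta=0.0, gamma=1.0):
    """Bernoulli log-likelihood of a user-by-legislator 0/1 follow matrix."""
    total = 0.0
    for i, row in enumerate(follows):
        for j, y in enumerate(row):
            p = follow_probability(thetas[i], phis[j], alpha, beta, gamma)
            total += math.log(p) if y else math.log(1.0 - p)
    return total
```

In a Bayesian treatment, this likelihood would be combined with priors on the ideal points and intercepts and the posterior sampled or approximated; the spread of that posterior is what yields standard errors for each estimate.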
Quantifying U.S. presidents' uniqueness using multimodal analysis
Aimed to answer how presidents present themselves differently in different contexts by implementing a multimodal deep learning pipeline that extracts information from the text, audio, and image data of presidential speeches. Built deep learning models for encoding speech information (fastText and BERT), extracting features from audio (CNN audio classifier, CNN emotion recognition), extracting unique features from images (EfficientNet, CNN emotion recognition), and for multimodal prediction (custom RankNet).
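The RankNet variant used in this project is not described in detail here. For reference, the standard RankNet pairwise objective can be sketched as follows: the model assigns each item a score, and the loss is the cross-entropy between the target ordering and the sigmoid of the score difference. This is the textbook formulation, not the project's own model, and the function name and inputs are illustrative.

```python
import math


def ranknet_pair_loss(score_i, score_j, target=1.0):
    """Standard RankNet cross-entropy loss for one pair of scored items.

    target: 1.0 if item i should rank above item j, 0.0 if below, 0.5 if tied.
    """
    diff = score_i - score_j
    # With P = sigmoid(diff), the loss -t*log(P) - (1 - t)*log(1 - P)
    # simplifies to (1 - t)*diff + log(1 + exp(-diff)).
    # The log term is computed in a numerically stable way.
    softplus_neg = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return (1.0 - target) * diff + softplus_neg
```

The loss shrinks as the preferred item's score pulls ahead of the other's, so minimizing it over many pairs pushes the scoring model toward the target ordering.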
Studying the effect of content moderation on language patterns
Pursued an independent project studying how mass content moderation on Reddit affects users’ language patterns. Analysed manually scraped data collected for seven subreddits over 18 months and found discernible changes in syntax, semantics, and discussion topics in different subreddits after a mass content moderation event.
Internet censorship in Turkmenistan
Studied the extent of online censorship in Turkmenistan and the techniques used to enforce it. Our team gathered over 600 million website URLs to test censorship and identified all relevant Autonomous Systems and DNS resolvers in the country. I conducted a thematic analysis of blocked websites using regular expression matching to manipulate DNS requests.