I'm a data scientist and PhD Candidate at the University of Pennsylvania. In my research, I use machine learning, econometrics, and experimentation to study human behavior and the media. In addition to pursuing my PhD, I recently received a Master's in Data Science and Statistics from The Wharton School.
I am currently a PhD Data Science Intern at Theta Equity Partners. Last summer, I was a Machine Learning Intern at DataCamp, where I used several large language models (LLMs) to build an internal linking recommendation tool. I hope to keep applying these skills and to gain additional experience in other data-driven roles.
Contact | Download My Resume
Generative large language models (LLMs) can be powerful tools for augmenting text annotation procedures. Using GPT-4, we replicated 27 annotation tasks from articles in high-impact journals and found that LLM performance is promising but contingent on the task and dataset. As a result, we argue that any automated annotation process using an LLM must validate the LLM's performance against labels generated by humans. To ensure effective use of LLMs for annotation, we're releasing easy-to-use Python code designed to streamline LLM deployment and validation procedures.
Link to Full Project and Open-Source Python Package
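To illustrate what validating LLM output against human labels can look like, here is a minimal sketch for a binary annotation task. It is not the code from the released package; the function name `validate_labels` and the toy labels are hypothetical and exist only for illustration.

```python
from collections import Counter


def validate_labels(human, llm, positive=1):
    """Score LLM labels against human labels for a binary annotation task."""
    if len(human) != len(llm) or not human:
        raise ValueError("label lists must be non-empty and the same length")

    # Tally the four confusion-matrix cells, treating human labels as ground truth.
    counts = Counter()
    for h, m in zip(human, llm):
        if m == positive and h == positive:
            counts["tp"] += 1
        elif m == positive:
            counts["fp"] += 1
        elif h == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    accuracy = (tp + tn) / len(human)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, for illustration only (not data from the project).
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
llm_labels = [1, 0, 0, 1, 0, 1, 1, 0]

print(validate_labels(human_labels, llm_labels))
```

Reporting precision and recall alongside accuracy matters here because an LLM can look accurate on an imbalanced task while still missing most of the positive cases.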
Using governmental administrative data and socio-demographic data, I show that LASSO logistic regression and random forest are effective at predicting individual-level donation behavior. LASSO logistic regression correctly classifies 82.7% of test cases (61.9% of positive cases) and random forest correctly classifies 92.8% of test cases (99.9% of positive cases). Although both of these accuracy scores are notably higher than the 74.1% no-information rate, random forest proves to be the superior model by far.
Link to Full Project
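The comparison against the no-information rate can be made concrete with a short sketch. The no-information rate is the accuracy obtained by always predicting the most common class, so a model is only informative if it beats that baseline. The helper names and toy labels below are hypothetical and are not taken from the project.

```python
from collections import Counter


def no_information_rate(y_true):
    """Accuracy of a classifier that always predicts the majority class."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    majority_count = Counter(y_true).most_common(1)[0][1]
    return majority_count / len(y_true)


def accuracy(y_true, y_pred):
    """Share of all cases classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def positive_recall(y_true, y_pred, positive=1):
    """Share of truly positive cases that the model classifies as positive."""
    hits = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    return sum(hits) / len(hits) if hits else 0.0


# Hypothetical test labels (1 = donated), for illustration only.
y_true = [0, 0, 0, 1, 0, 0, 1, 0, 0, 1]
y_pred = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0]

baseline = no_information_rate(y_true)
model_accuracy = accuracy(y_true, y_pred)
print(f"no-information rate: {baseline:.3f}")
print(f"model accuracy:      {model_accuracy:.3f}")
print(f"positive recall:     {positive_recall(y_true, y_pred):.3f}")
print("beats baseline:", model_accuracy > baseline)
```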
We scraped a novel dataset of approximately 18,000 closed captioning transcripts from local television news programs across three cities from 2014 to 2018 and created a series of RoBERTa classifiers to identify news topics in these transcripts over time. Across the seven selected topics, the RoBERTa models achieved an average precision score of 0.85, an average recall score of 0.876, and an average F1 score of 0.859.
View findings through interactive dashboard
Protests in the United States have become a common method for citizens to express their concerns about various social and political issues. This study examines the causal impact of protests on individuals' willingness to donate money to American political campaigns. We find a substantial causal relationship between protests and political donations using a staggered difference-in-differences (DiD) design with county and temporal fixed effects. The results indicate that a one-percent increase in protest activities within a county leads to a 0.76% increase in the number of donations and a 1.13% increase in the total donation amount in the immediate days following the protest.
Link to Full Project
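As a rough illustration of how county and temporal fixed effects work, the sketch below estimates a single slope with the two-way within transformation on a balanced panel. It assumes a log-log specification, so the slope reads as an elasticity. It is a plain two-way fixed effects regression, not the staggered DiD estimator used in the study, and the function name and toy panel are hypothetical.

```python
def two_way_fe_slope(y, x):
    """Slope of y on x after removing unit and period fixed effects.

    y and x are dicts keyed by (unit, period) and must form a balanced panel.
    """
    units = sorted({u for u, _ in x})
    periods = sorted({t for _, t in x})
    full_grid = {(u, t) for u in units for t in periods}
    if set(x) != full_grid or set(y) != full_grid:
        raise ValueError("panel must be balanced with matching keys")

    def demean(v):
        # Within transformation: v_it - mean_i - mean_t + grand mean.
        unit_mean = {u: sum(v[u, t] for t in periods) / len(periods)
                     for u in units}
        period_mean = {t: sum(v[u, t] for u in units) / len(units)
                       for t in periods}
        grand_mean = sum(v.values()) / len(v)
        return {(u, t): v[u, t] - unit_mean[u] - period_mean[t] + grand_mean
                for (u, t) in v}

    y_dm, x_dm = demean(y), demean(x)
    return (sum(x_dm[k] * y_dm[k] for k in x_dm)
            / sum(x_dm[k] ** 2 for k in x_dm))


# Hypothetical panel: log protest activity by (county, day), illustration only.
log_protests = {
    ("A", 0): 0.0, ("A", 1): 1.0, ("A", 2): 0.0, ("A", 3): 2.0,
    ("B", 0): 1.0, ("B", 1): 0.0, ("B", 2): 3.0, ("B", 3): 1.0,
    ("C", 0): 0.0, ("C", 1): 2.0, ("C", 2): 1.0, ("C", 3): 0.0,
}
county_effect = {"A": 1.0, "B": 2.5, "C": -0.5}
day_effect = {0: 0.0, 1: 0.3, 2: -0.2, 3: 0.1}

# Simulated log donations with a known elasticity and no noise.
true_elasticity = 0.76
log_donations = {
    (c, d): county_effect[c] + day_effect[d] + true_elasticity * x_val
    for (c, d), x_val in log_protests.items()
}

estimate = two_way_fe_slope(log_donations, log_protests)
print(f"estimated elasticity: {estimate:.4f}")
```

Because the simulated outcome contains no noise, the estimate recovers the elasticity that was built in. With real data, staggered treatment timing can bias this simple estimator, which is one reason dedicated staggered DiD methods exist.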
Using two nationally representative survey experiments, I test the hypothesis that, when the threats and risks migrants face in their home country are equalized, natives will not penalize people who immigrate due to economic threat relative to people who immigrate due to the threat of violence. After accounting for threat in the migrant's home country, I find that natives' special penalty associated with economic types of migration is erased. My results indicate that policymakers should reconsider existing immigration and refugee policies related to economic migration and poverty.
Link to Full Project
3+ years of experience as a quantitative researcher and data scientist.
LinkedIn | Download My Resume