7th Annual Likert Symposium - Generating and Classifying Text: Challenges and Benefits of Using Language Models in Social Research
March 14, 2025
Dr. Trent D. Buskirk
LLMs are Large, but should they be in Charge? Exploring the Possibility and Implausibility of Large Language Models in Survey Science
Take a glance through recent news or social media and you would be hard-pressed not to see mentions of chatbots and artificial intelligence (AI) methods aimed at generating text, images, and other content. Recent work by Eloundou and colleagues (2023) explored the potential impact of these types of generative AI on labor markets, and survey research was among the top two most impacted industries. In light of this finding, we naturally wonder: how will this technology change the way survey researchers work? How might chatbot technologies or related software be leveraged to support, enhance, or expand our work by assisting with the development and testing of survey research products such as questions, questionnaires, and study/sampling designs? In this talk we will describe how LLMs work and showcase some of the current ways these models are being used within the survey research process, chronicling their applications in the design, collection, and analysis of survey data. We will also discuss some limitations of this technology as it relates to applications within the survey research process. And we note that no part of this abstract was generated using a chatbot.
Trent D. Buskirk, Ph.D., has recently joined the new School of Data Science at Old Dominion University as one of several founding faculty members. Prior to this appointment, Trent was the Novak Family Distinguished Professor of Data Science and outgoing Chair of the Applied Statistics and Operations Research Department at Bowling Green State University. Dr. Buskirk is a Fellow of the American Statistical Association, and his research interests include big data quality, recruitment methods through social media, the use of big data and data science methods for health, social, and survey science design and analysis, and data-centric AI, including fairness and explainability in AI models. Trent has also been involved in various professional organizations, serving as President of the Midwest Association for Public Opinion Research in 2016, Conference Chair for AAPOR in 2018, and a member of the scientific committee for the BigSurv series of conferences since 2018. Trent has also served as an Associate Editor (Methods) for the Journal of Survey Statistics and Methodology. When Trent is not geeking out over data science or survey research, he’s likely out playing a competitive game of Pickleball!
Haomiao Jin
Exploring an AI-Powered Survey Interviewing Agent for Individuals Who Are Blind or Severely Visually Impaired
Individuals who are blind or severely visually impaired often encounter significant barriers when completing online surveys. While assistive technologies exist, they may not provide the support needed for a smooth survey experience. AI-powered interviewing agents offer a promising solution, but their design must be informed by the real-world needs and preferences of target users. This ongoing study adopts a co-design approach to develop an AI-powered interviewing agent tailored for individuals who are blind or severely visually impaired. A diverse group of participants (N=20–30) is being recruited through online advertisements, support groups, charitable organizations, and snowball sampling. Participants share their experiences with digital devices and survey-taking, identify key accessibility challenges, and outline essential features that would enhance the usability and engagement of an AI-powered agent. Preliminary findings indicate that participants prioritize interoperability with familiar assistive software and prefer conversational agents that offer human-like, flexible interactions rather than rigid constraints such as time or word limits. Emerging insights suggest that designing an effective survey interviewing agent for this population requires balancing accessibility and data reliability. These findings will inform the development of AI-powered tools that can improve survey accessibility while maintaining the quality of collected data.
Haomiao Jin is an Assistant Professor in Health Data Sciences at the University of Surrey, UK. Before moving to the UK, he was a Research Scientist at the Center for Economic and Social Research at the University of Southern California. His research sits at the intersection of data science and survey methodology, with a particular focus on collecting and modeling self-reported health and well-being data. His work has contributed to a range of survey-based studies, from ecological momentary assessment research to large-scale population studies, advancing methods for capturing and analyzing self-reported data in health research contexts.
Joelle Abramowitz
Using Artificial Intelligence to Improve the Panel Study of Income Dynamics and Health and Retirement Study
AI methods have the potential to improve practices throughout the survey process. While applying these approaches could have great benefit, AI will not be the solution for every problem, and accordingly, assessing the feasibility of different approaches and identifying worthwhile applications and best practices is valuable. In this talk, I will present some efforts using the Panel Study of Income Dynamics and Health and Retirement Study to begin to implement these methods to improve the data production process. These include an approach we have implemented using existing open-ended narrative data to produce new variables of interest. I will also present several efforts in progress related to automating existing manual processes for classifying open-ended narrative industry and occupation data, better understanding how the data are being used, and improving documentation.
Dr. Joelle Abramowitz is an Associate Research Scientist at the University of Michigan's Institute for Social Research and Co-Director of the Michigan Federal Statistical Research Data Center. She completed her Ph.D. in Economics in 2013 at the University of Washington. Her research examines the effects of different policies and environmental factors on individuals’ major life decisions and wellbeing, including on self-employment, health insurance, and medical out-of-pocket expenditures as well as bigger picture effects on outcomes such as marriage, fertility, work, and mortality.
James Bisbee
What To Do When Your Language Model is Not State of the Art: When (not) to Worry About Misclassification and How to Correct for It in Social Science Applications
The rapid progress in the field of large language models (LLMs) has raised concerns about the reproducibility of research, especially as each subsequent generation of these models often dramatically outperforms the previous one. In this paper, we put structure on this problem by demonstrating that (1) failing to use "state of the art" models does not actually matter much for typical categorization tasks when LLM-generated predictors and treatments are used in downstream regression analyses, and (2) existing de-biasing solutions can further reduce these concerns. We support these claims via a combination of simulations and replication materials, and we provide an R package to apply our proposed method. Our summary conclusion is that, for researchers, refusing to use "open" LLMs because they are not "state of the art" is likely wrong once their advantages are properly considered.
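The core concern the abstract describes, that an imperfect LLM classifier attenuates downstream regression estimates, and the flavor of de-biasing it mentions can be illustrated with a toy simulation. This is a generic regression-calibration sketch, not the authors' method or their R package; the error rates, sample sizes, and the slope of 2.0 are all assumptions chosen for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# "True" binary label (e.g., whether a text expresses a given frame)
x = rng.binomial(1, 0.4, n)
# Outcome generated from the true label with a known slope of 2.0
y = 2.0 * x + rng.normal(0.0, 1.0, n)

# Imperfect LLM classifier: assumed 90% sensitivity, 85% specificity
x_hat = np.where(x == 1, rng.binomial(1, 0.90, n), rng.binomial(1, 0.15, n))

def ols_slope(pred, out):
    """Simple-regression slope of `out` on `pred`."""
    pred = pred.astype(float)
    return np.cov(pred, out, ddof=1)[0, 1] / np.var(pred, ddof=1)

# Naive analysis: plug the LLM labels straight into the regression.
# Nondifferential misclassification attenuates the slope toward zero.
beta_naive = ols_slope(x_hat, y)

# Regression calibration: on a small hand-coded validation subset,
# estimate E[X | X_hat] and use that as the regressor instead.
val = rng.choice(n, 2_000, replace=False)
p1 = x[val][x_hat[val] == 1].mean()  # estimate of P(X=1 | X_hat=1)
p0 = x[val][x_hat[val] == 0].mean()  # estimate of P(X=1 | X_hat=0)
x_cal = np.where(x_hat == 1, p1, p0)

# De-biased estimate: close to the true slope of 2.0
beta_cal = ols_slope(x_cal, y)
```

With these error rates the naive slope lands near 1.45 rather than 2.0, while the calibrated estimate recovers the true slope up to sampling noise in the validation subset, which is the sense in which a small amount of hand-coded data can rescue analyses built on a less-than-state-of-the-art classifier.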
Dr. Bisbee is a political scientist who investigates the influences of various forms of information on public opinions and behaviors, utilizing a wide range of empirical evidence. This includes studying the effects of local unemployment rates and elite cues on societal views and actions. Currently, he serves as an Assistant Professor of Political Science at Vanderbilt University and holds an affiliation with the Data Science Institute. Prior to this, Dr. Bisbee was engaged as a postdoctoral researcher at New York University's Center for Social Media and Politics, and as a postdoctoral fellow at Princeton University's Niehaus Center for Globalization and Governance. His scholarly contributions have been featured in several esteemed peer-reviewed journals, such as the American Political Science Review, the American Journal of Political Science, the Journal of Politics, the Journal of Labor Economics, Political Analysis, and International Organization, among others.