LLMs for Data Collection Annotation

About the workshop

Large language models (LLMs) have become an essential tool for researchers in the social sciences and humanities (SSH) who work with textual data. One particularly valuable use case is automating text annotation, traditionally a time-consuming step in preparing data for empirical analysis. Yet many SSH researchers face two challenges: getting started with LLMs, and understanding how to evaluate and correct for their limitations. The rapid pace of model development can make LLMs appear inaccessible or intimidating, while even experienced users may overlook how annotation errors can bias the results of downstream analyses (e.g., regression estimates, p-values), even when accuracy appears high.

This workshop provides a step-by-step, hands-on guide to using LLMs for text annotation in SSH research for both Python and R users. We cover (1) how to make use of LLMs from SSH perspectives, (2) how to choose and access LLM APIs, (3) how to design prompts and run annotation tasks programmatically, (4) how to evaluate annotation quality and iterate on prompts, (5) how to integrate annotations into statistical workflows while accounting for uncertainty, and (6) how to manage cost, efficiency, and reproducibility. Throughout, we provide concrete examples, code snippets, and best-practice checklists to help researchers confidently and transparently incorporate LLM-based annotation into their workflows.
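As a small taste of step (4), evaluating annotation quality often starts by comparing LLM labels against a hand-coded gold standard. The sketch below (in Python; not workshop material, and the labels and function name are invented for illustration) computes raw agreement and Cohen's kappa, which corrects agreement for chance:

```python
from collections import Counter

def cohens_kappa(gold, pred):
    """Chance-corrected agreement between two label sequences."""
    assert len(gold) == len(pred) and len(gold) > 0
    n = len(gold)
    observed = sum(g == p for g, p in zip(gold, pred)) / n
    gold_freq, pred_freq = Counter(gold), Counter(pred)
    # Expected agreement if both annotators labelled at random
    # with their observed label frequencies.
    expected = sum(gold_freq[l] / n * pred_freq[l] / n
                   for l in set(gold) | set(pred))
    return (observed - expected) / (1 - expected)

# Toy example: hand-coded labels vs. hypothetical LLM annotations.
gold = ["pos", "neg", "pos", "neg", "pos", "neg"]
pred = ["pos", "neg", "pos", "pos", "pos", "neg"]
print(round(cohens_kappa(gold, pred), 3))  # → 0.667
```

Raw agreement here is 5/6 (about 0.83), but kappa is lower (0.667) because some agreement is expected by chance alone; this gap is exactly why accuracy alone can be misleading. In practice one would use an established implementation such as `sklearn.metrics.cohen_kappa_score` rather than rolling one's own.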

Lunch and refreshments will be provided.


Additional information
When
Friday, 23rd January 2026, from 09.30h to 15.00h
Where
Utrecht University, Administration building, room “Van Lier & Eggink”.
Registration
Register here. Registration is free of charge for members of ODISSEI organizations.
Instructors
Qixiang Fang, Postdoctoral Researcher at ODISSEI SoDa. Qixiang’s GitHub