In September 2025, Qixiang Fang gave a keynote speech at the 5th Workshop on Computational Linguistics for the Political and Social Sciences (CPSS) in Germany.
About the workshop
In his keynote, Fang emphasized the critical importance of validity (ensuring that measurements truly capture the concepts they claim to capture) in interdisciplinary NLP research. He compared how validity is defined across the social sciences, educational testing, and physics, and highlighted frequent pitfalls that arise when NLP studies neglect these traditions, for example in personality prediction and LLM benchmarking.
Fang demonstrated how under-specified benchmarks and selective human comparisons can lead to misleading conclusions and showcased item-response-theory methods for more rigorous evaluation of LLM capabilities. He further warned that high model accuracy does not guarantee valid downstream inferences when using LLM-generated annotations. He closed by urging the field to standardize language around validity, rethink cognitive assessments of NLP systems, and explicitly account for uncertainty in LLM-based measurements.
| Additional information | |
|---|---|
| When | September 2025 |
| Where | The 5th Workshop on Computational Linguistics for the Political and Social Sciences (CPSS), Germany |
| Registration | Registration is no longer possible. |
| Instructors | Qixiang Fang, Postdoctoral Researcher at Utrecht University |
| Materials | Course materials, including slides and code, are open and can be accessed here. |