How to create synthetic data

In September 2022, we hosted a workshop about synthetic data.

About the workshop

Open data is one of the pillars of open science. However, there are often barriers in the way of making research data openly available, relating to consent, privacy, or organisational boundaries. In such cases, synthetic data is an excellent solution: the real data is kept secret, but a “fake” version of the data is available. The promise of the synthetic dataset is that others can then investigate the data structure, rerun scripts, use the data in educational materials, or even run a completely different analysis on their own.

But how do you generate synthetic data? In this session, we will introduce the field of synthetic data generation and apply several tools to generate synthetic versions of datasets, with varying levels of utility and privacy. We will pay extra attention to practical issues such as missing values, data types, and disclosure control. Participants can either use a provided example dataset or bring their own data!
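To give a flavour of what generating a synthetic dataset can look like, here is a minimal sketch in plain Python. It is not one of the tools used in the workshop: it is a deliberately naive approach that samples every column independently from the observed values. The function name `synthesize_marginals` and the toy records are made up for illustration.

```python
import random


def synthesize_marginals(rows, n, seed=0):
    """Draw n synthetic rows by sampling each column independently.

    rows: list of dicts sharing the same keys; None marks a missing value.
    This is an illustrative sketch, not a production synthesizer.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    # Keep None in each pool so missing values appear at roughly the
    # same rate in the synthetic data as in the real data.
    pools = {col: [row[col] for row in rows] for col in columns}
    # Sampling columns separately keeps data types and marginal
    # distributions, but discards relationships between columns.
    return [{col: rng.choice(pools[col]) for col in columns} for _ in range(n)]


# Hypothetical toy data, invented for this example only.
real = [
    {"age": 34, "income": 41000.0, "city": "Utrecht"},
    {"age": 51, "income": None, "city": "Leiden"},
    {"age": 27, "income": 38500.0, "city": "Utrecht"},
    {"age": 45, "income": 52000.0, "city": None},
]

synthetic = synthesize_marginals(real, n=6, seed=42)
for row in synthetic:
    print(row)
```

The sketch shows the trade-off mentioned above. Utility is low because correlations between columns are lost, and privacy is not guaranteed either: values are copied verbatim from the real data, so a rare value (such as a unique income) would be disclosed. Dedicated tools typically fit a model to the data and sample from that model instead.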

Additional information
When
September 2022.
Where
Open Science Festival.
Registration
Registration is no longer possible.
Instructors
Erik-Jan van Kesteren, Assistant Professor at Utrecht University (Erik-Jan’s website), Raoul Schram, Research Engineer at Utrecht University (Raoul’s website) & Thom Volker, PhD Candidate at Utrecht University (Thom’s website).
Materials
Course materials, including slides and code, are open and can be accessed here.