Here you can find descriptions of and links to projects we worked on in the past.


Synthetic register data for open science

We are creating a script using existing software packages to generate synthetic datasets for the Statistics Netherlands microdata architecture. These synthetic datasets can then be used as example datasets when sharing analyses (but not original data!) with researchers.
Contributors: Jan Kabatek, Erik-Jan van Kesteren, Kyuri Park

Benchmarking for social science

We helped to design and set up a benchmark for a social data science challenge at the end of the SICSS-ODISSEI summer school. The benchmark was based on microdata from Statistics Netherlands.
Contributors: Paulina Pankowska, Javier Garcia-Bernardo, Adrienne Mendrik

Metadata for synthetization

We are developing a metadata format that includes variable-level statistical information. A Python package can then use this format to generate fake, synthetic datasets for testing purposes.
Contributors: Ricarda Braukmann, Erik-Jan van Kesteren, Raoul Schram

Deviance in art automatic data collection

We created an automated data collection program (a "scraper") to make a database of artworks with varying levels of deviance. This database will be used in research on how to measure deviance in art.
Contributors: Eftychia Stamkou, Javier Garcia-Bernardo, Raoul Schram

Anonymity-preserving collection of WhatsApp data

We are creating a script to extract information from WhatsApp data packages, allowing data from different people to be linked while preserving their privacy. This project is part of the ODISSEI LISS grant “Assessing Mobile Instant Messenger Networks with Donated Data”.
Contributors: Laura Boeschoten, Javier Garcia-Bernardo, Parisa Zahedi, Shiva Nadi, Rense Corten

Supercomputing for social scientists

We co-created and co-taught a full-day workshop on high-performance computing for social scientists.
Contributors: Carlos Teijeiro Barjas, Erik-Jan van Kesteren, Benjamin Czaja

Population network data processing

We have helped with the implementation of the Statistics Netherlands population network data files in order to make them available to network researchers. These network data files can be used to develop network analysis models.
Contributors: Tom Emery, Javier Garcia-Bernardo

Empathy diagnostics dashboard

We created a pilot for an interactive questionnaire app which immediately generates a diagnostic report based on the inputs. This app is now used to study empathy in anti-social adolescents.
Contributors: Minet de Wied, Javier Garcia-Bernardo, Shiva Nadi, Parisa Zahedi

Inference from volunteer data

We created an analysis pipeline as part of a paper which outlines how to perform precise statistical inference (correcting for geospatial selection bias) using volunteer-generated data.
Contributors: Peter Lugtig, Erik-Jan van Kesteren, Annemarie Timmers, Javier Garcia-Bernardo

Housing market data engineering

We performed data engineering work to transform 10 TB of online marketing (click) data from a large online housing platform into an analyzable format. These datasets are used in research on search behaviour on the housing market in the Netherlands. We made the processed data available as an open dataset.
Contributors: Joep Steegmans, Jonathan de Bruin

Citizen science website

To give an overview of the citizen science projects available in the Netherlands, we have created a website listing such projects. The community can contribute their own projects via the GitHub page!
Contributors: Peter Lugtig, Jonathan de Bruin, Leonardo Vida, Annemarie Timmers


We have created an R package to perform geo-enrichment of datasets using OpenStreetMap. Enriching geo-coded (latitude/longitude) datasets with features from the physical surroundings enables researchers to take spatial context into account in statistical models.
Contributors: Peter Lugtig, Erik-Jan van Kesteren, Leonardo Vida

Geoenrichment docker images

Geo-enrichment requires transferring large amounts of data from a geospatial database to a computer program. Public APIs served over the internet are usually too slow for this purpose. Hence, we have created a Docker image so that the API for our osmenrich R package can be run locally.
Contributors: Peter Lugtig, Erik-Jan van Kesteren, Leonardo Vida

Kansenkaart analysis pipeline

Using large datasets from Statistics Netherlands, we developed a data pre-processing and analysis pipeline on the ODISSEI Secure Supercomputer (OSSC) for estimating expectations concerning inequality of opportunity in the Netherlands. These estimates will be available on the project website.
Contributors: Bastian Ravesteijn, Erik-Jan van Kesteren, Helen Lam


Initializing transport ABMs with a synthetic population

We considered which open and closed data sources and which methods would be best suited to creating a synthetic population for initializing an agent-based model of transport behaviour. We also discussed how to validate the quality of the synthetic population.
Contributors: Marco Pellegrino, Erik-Jan van Kesteren

Network analysis of symptoms

We discussed different methods of preprocessing a medical dataset for subsequent network analysis to create symptom networks.
Contributors: Willemijn van Waarden, Erik-Jan van Kesteren, Javier Garcia-Bernardo

Online Housing Market search strategy

Which search behaviour leads to finding a house quickly on the housing market? We brainstormed about how to perform the analysis for a study on this topic using a large database of online housing search behaviour.
Contributors: Joep Steegmans, Erik-Jan van Kesteren


We regularly consult on the technical implementation of FIRMBACKBONE, an initiative to build an organically growing longitudinal data infrastructure with information on Dutch companies for academic research. These data will become available through ODISSEI to researchers affiliated with universities in the Netherlands.
Contributors: Peter Gerbrands, Javier Garcia-Bernardo, Erik-Jan van Kesteren, Jonathan de Bruin

Computational efficiency

We brainstormed about how the analysis for a research project with big data could be set up and whether it could run on a personal computer.
Contributors: Thijs Lindner, Erik-Jan van Kesteren

Synthetic data for agent-based models

We brainstormed with researchers about working with Statistics Netherlands data and generating synthetic data that can serve as input for an agent-based model.
Contributors: Sanne Hettinga, Erik-Jan van Kesteren, Corentin Kuster

eScience consultations

We regularly join consultations held by the eScience Center for projects that fall within the social sciences, for example in preparation for the ODISSEI-eScience grants.
Contributors: Various researchers, Jonathan de Bruin, Erik-Jan van Kesteren, Javier Garcia-Bernardo