Projects
Here you can find descriptions and links to projects we worked on in the past.
Collaborations
Policy intervention assessment in primary schools
We are helping to compute individual causal effects of a policy intervention in primary schools in Rotterdam. For this, we are exploring the use of advanced quasi-experimental methods such as synthetic controls.
Contributors: Gijs Custers, Erik-Jan van Kesteren, Oisín Ryan
Machine learning pipeline for early life opportunity
We are helping with the creation of a data analysis and machine learning pipeline for the project "Kansrijke start", which aims to investigate which markers in the first 1000 days since conception are predictive of adverse events later in life.
Contributors: Wessel Kraaij, Erik-Jan van Kesteren, Anton Schreuder, Richard van Dijk
Computational models for word and non-word associations
We are collaborating on a project for computationally modelling people's intuitions about various associations of both real words and non-words (e.g., novel words or company names) for the Dutch language. The project will result in an easy-to-use, openly available application that analyzes the associations (non-)words may evoke and lists semantically similar words.
Contributors: Giovanni Cassani, Erik-Jan van Kesteren, Aron Joosse
Trust in public institutions
We are creating a pipeline and dashboard to evaluate how trust in European public institutions evolves during the pandemic.
Contributors: Patrick Brown, Javier Garcia-Bernardo, Matthijs Vollenbroek, Stijn Peeters, Marc Tuters
COVID-19 spread in social networks
Together with RIVM and the Ministry of Health, the SoDa team is co-authoring a scientific paper on the spread of COVID-19 in schools in the Netherlands.
Contributors: RIVM, Javier Garcia-Bernardo
Linking datasets based on company names
Linking databases on company names is a challenging task. Company names are usually not unique and can have many spelling variations. We helped conduct a sensitivity analysis for different methods of linking these databases, which can be used to answer many different social science research questions about companies in the Netherlands.
Contributors: Peter Gerbrands, Jonathan de Bruin, Wim Coreynen
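To illustrate why linking on company names is sensitive to the chosen method, here is a minimal sketch in Python. It is not the method used in the project: the list of legal-form tokens, the function names, and the similarity threshold are all assumptions made for this example. It normalizes names, scores candidate pairs with the standard library's `difflib.SequenceMatcher`, and shows how the threshold changes which links are accepted.

```python
import re
from difflib import SequenceMatcher

# Hypothetical set of legal-form tokens to ignore when comparing names.
LEGAL_FORMS = {"bv", "nv", "vof", "holding"}


def normalize(name: str) -> str:
    """Lowercase a name, drop punctuation, and remove legal-form tokens."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized company names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.85):
    """Return the most similar candidate, or None if none reaches the threshold."""
    if not candidates:
        return None
    score, match = max((similarity(name, c), c) for c in candidates)
    return match if score >= threshold else None


if __name__ == "__main__":
    register = ["Jansen en Zonen BV", "De Vries Holding NV"]
    # The same query links or fails to link depending on the threshold.
    for threshold in (0.80, 0.85, 0.95):
        print(threshold, best_match("Jansen & Zonen B.V.", register, threshold))
```

Running the same query at several thresholds, as in the loop above, is a small-scale version of a sensitivity analysis: a stricter threshold rejects spelling variants, while a looser one risks linking distinct companies.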
Synthetic register data for open science
We created a workflow using existing software packages to generate synthetic datasets for the Statistics Netherlands microdata architecture. These synthetic datasets can then be used as example datasets when sharing analyses (but not original data!) with researchers.
Contributors: Jan Kabatek, Erik-Jan van Kesteren, Kyuri Park
Deviance in art automatic data collection
We created an automated data collection program (a "scraper") to make a database of artworks with varying levels of deviance. This database will be used in research on how to measure deviance in art.
Contributors: Eftychia Stamkou, Javier Garcia-Bernardo, Raoul Schram
Anonymity-preserving collection of WhatsApp data
We are creating a script to extract information from WhatsApp data packages, making it possible to link data from different people while preserving the privacy of those people. This project is part of the ODISSEI LISS grant “Assessing Mobile Instant Messenger Networks with Donated Data”.
Contributors: Laura Boeschoten, Javier Garcia-Bernardo, Parisa Zahedi, Shiva Nadi, Rense Corten
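One common way to link records across people without storing their identities is keyed hashing. The sketch below is an assumption about how such a step could look, not a description of the project's actual script: the function names, the message structure, and the secret key are hypothetical. Each identifier is replaced by an HMAC-SHA256 pseudonym, so the same person gets the same pseudonym in every data package while the original name is not stored.

```python
import hashlib
import hmac


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Map an identifier (name or phone number) to a stable pseudonym.

    The key must stay secret: without it, pseudonyms cannot be recomputed
    from a list of known names.
    """
    normalized = identifier.strip().lower()
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def anonymize_messages(messages, secret_key):
    """Replace the sender in each message; keep only non-identifying fields."""
    return [
        {
            "sender": pseudonymize(m["sender"], secret_key),
            "timestamp": m["timestamp"],
            "n_words": len(m["text"].split()),  # message content is dropped
        }
        for m in messages
    ]


if __name__ == "__main__":
    key = b"hypothetical-project-secret"
    # Two hypothetical packages in which the same person appears.
    package_a = [{"sender": "Alice", "timestamp": "2021-03-01T10:00", "text": "Hi there"}]
    package_b = [{"sender": "alice ", "timestamp": "2021-03-02T09:30", "text": "See you soon"}]
    a = anonymize_messages(package_a, key)
    b = anonymize_messages(package_b, key)
    print(a[0]["sender"] == b[0]["sender"])  # same person, same pseudonym
```

Dropping the message text and keeping only derived fields such as word counts is a design choice of this sketch; which fields can safely be retained depends on the research question and the consent given.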
Population network data processing
We have helped with the implementation of the Statistics Netherlands population network data files in order to make them available to network researchers. These network data files can be used to develop network analysis models.
Contributors: Tom Emery, Javier Garcia-Bernardo
Inference from volunteer data
We created an analysis pipeline as part of a paper which outlines how to perform precise statistical inference (correcting for geospatial selection bias) using volunteer-generated data.
Contributors: Peter Lugtig, Erik-Jan van Kesteren, Annemarie Timmers, Javier Garcia-Bernardo
Housing market data engineering
We performed data engineering work to transform 10 TB of online marketing (clicks) data from a large online housing platform into an analyzable format. These datasets are used in research surrounding search behaviour on the housing market in the Netherlands. We made the processed data available as an open dataset.
Contributors: Joep Steegmans, Jonathan de Bruin
Citizen science website
To get an overview of the citizen science projects available in the Netherlands, we have created a website that lists them. The community can contribute their own projects via the GitHub page!
Contributors: Peter Lugtig, Jonathan de Bruin, Leonardo Vida, Annemarie Timmers
Geoenrichment
We have created an R package to perform geo-enrichment of datasets using OpenStreetMap. Enriching geo-coded (latitude/longitude) data sets with features from the physical surroundings enables researchers to take into account spatial surroundings in statistical models.
Contributors: Peter Lugtig, Erik-Jan van Kesteren, Leonardo Vida
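As an illustration of what geo-enrichment means, the following Python sketch counts how many map features lie within a given radius of each observation. It does not use or reproduce the R package's API: the function names and the example coordinates are assumptions, and the features are supplied as a plain list rather than retrieved from OpenStreetMap.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_within(points, features, radius_m):
    """For each (lat, lon) point, count the features within radius_m metres."""
    return [
        sum(haversine_m(lat, lon, flat, flon) <= radius_m for flat, flon in features)
        for lat, lon in points
    ]


if __name__ == "__main__":
    # Hypothetical observation in Utrecht and two hypothetical features
    # (one nearby, one in Amsterdam).
    observations = [(52.0907, 5.1214)]
    features = [(52.0910, 5.1220), (52.3676, 4.9041)]
    print(count_within(observations, features, radius_m=500))
```

The resulting counts can be added as a new column to the original dataset and used as a covariate in a statistical model.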
Geoenrichment Docker images
Geo-enrichment requires transferring large amounts of data from a geospatial database to a computer program. Public APIs served over the internet are usually too slow for this purpose. Hence, we have created a Docker image so that the API for our osmenrich R package can be run locally.
Contributors: Peter Lugtig, Erik-Jan van Kesteren, Leonardo Vida
Kansenkaart analysis pipeline
Using large data sets from Statistics Netherlands, we developed a data pre-processing and analysis pipeline for estimating expectations concerning the inequality of opportunity in The Netherlands using the ODISSEI Secure Supercomputer (OSSC). These estimates will be available on the project website.
Contributors: Bastian Ravesteijn, Erik-Jan van Kesteren, Helen Lam
Consultations
Implementing spatial data analysis for sociological research
We brainstormed about options and packages in R to incorporate spatial data in a sociological research project.
Contributors: Kevin Wittenberg, Erik-Jan van Kesteren, Javier Garcia-Bernardo
Initializing transport ABMs with a synthetic population
We thought about which open and closed data sources and which methods would be best to create a synthetic population for initializing an agent-based model of transport behaviour. We also discussed how to validate the quality of the synthetic population.
Contributors: Marco Pellegrino, Erik-Jan van Kesteren
Network analysis of symptoms
We discussed different methods of preprocessing a medical dataset for subsequent network analysis to create symptom networks.
Contributors: Willemijn van Waarden, Erik-Jan van Kesteren, Javier Garcia-Bernardo
Software sustainability for RSiena
We are helping to improve the sustainability of the RSiena network analysis software package, by helping to write a grant proposal and through a brainstorm session on efficient collaboration on GitHub.
Contributors: Tom Snijders, Erik-Jan van Kesteren, Christian Steglich, Javier Garcia-Bernardo, Jonathan de Bruin
Online housing market search strategy
Which search behaviour leads to finding a house quickly on the housing market? We brainstormed about how to perform analysis for a study on this topic using a large database of online housing search behaviour.
Contributors: Joep Steegmans, Erik-Jan van Kesteren
Firmbackbone
We regularly consult on FIRMBACKBONE, an initiative to build an organically growing longitudinal data infrastructure with information on Dutch companies for academic research. This data will become available for researchers affiliated with universities in The Netherlands through ODISSEI. We are consulting on the technical implementation of the FIRMBACKBONE project.
Contributors: Peter Gerbrands, Javier Garcia-Bernardo, Erik-Jan van Kesteren, Jonathan de Bruin
Synthetic data for agent-based models
We brainstormed with researchers about working with Statistics Netherlands data and generating synthetic data that can serve as input in an agent-based model.
Contributors: Sanne Hettinga, Erik-Jan van Kesteren, Corentin Kuster