Tutorials

Read all the latest tutorial posts

WebSweep: Collecting Website Text for Research

March 24, 2026

WebSweep helps researchers capture what was publicly visible on a given date, preserve the raw HTML as a reproducible archive, and turn those pages into analysis-ready text.

Use WebSweep when you:

have a list of public websites or domains
want a repeatable workflow for many domains
mainly need HTML text and metadata from public pages

In this tutorial, we use the example of FIRMBACKBONE, the Dutch research infrastructure that provides secure, FAIR access to comprehensive data on all registered organizations in the Netherlands, including web-based data. We want to collect information from corporate websites, for example to track how broadly and how deeply companies cover the energy transition. The same workflow can be used for universities, NGOs, government organisations, local news sites, project websites, or any other public set of domains.

How to manage your IP address in Python

February 27, 2024

If you have ever tried web scraping, you may have found your IP address blocked by the website you were scraping.

January 22, 2024

Training a fastText model from scratch using Python

In this tutorial, we explain how to train a natural language processing model from scratch using fastText.

Visualizing international flows with Geoflow visualizer

December 11, 2023

A new free and open-source tool for visualizing international flows interactively.

Wrangling interval data using lubridate

September 29, 2023

Using time interval objects to robustly extract data from transactions.

Collecting online platforms data for science: an example using WhatsApp

September 8, 2023

In this tutorial, we use data donation and the Port software to access WhatsApp group-chat data in a way that fully preserves the privacy of research participants.

ArtScraper: A Python library to scrape online artworks

October 4, 2022

ArtScraper is a Python library to download images and metadata for artworks available on WikiArt and Google Arts & Culture.