Tutorials
Read all the latest tutorial posts
WebSweep: Collecting Website Text for Research
WebSweep helps researchers capture what was publicly visible on a given date, preserve the raw HTML as a reproducible archive, and turn those pages into analysis-ready text.
Use WebSweep when you:
- have a list of public websites or domains
- want a repeatable workflow for many domains
- mainly need HTML text and metadata from public pages
In this tutorial, we use the example of FIRMBACKBONE, the Dutch research infrastructure that provides secure, FAIR access to comprehensive data on all registered organisations in the Netherlands, including web-based data. We want to collect information from corporate websites, for example to track the scope and depth of their coverage of the energy transition. The same workflow can be used for universities, NGOs, government organisations, local news sites, project websites, or any other public set of domains.
How to manage your IP address in Python
If you have ever tried web scraping, you may have found your IP address blocked by the website you were scraping.
Training a fastText model from scratch using Python
In this tutorial, we explain how to train a natural language processing model using fastText.
Visualizing international flows with Geoflow visualizer
A new free and open-source tool designed to visualize international flows in an interactive way.
Wrangling interval data using lubridate
Using time interval objects to robustly extract data from transactions.
Collecting online platforms data for science: an example using WhatsApp
In this tutorial, we use data donation and the Port software to access WhatsApp group-chat data in a way that completely preserves the privacy of research participants.
ArtScraper: A Python library to scrape online artworks
ArtScraper is a Python library to download images and metadata for artworks available on WikiArt and Google Arts & Culture.