Tutorials

March 24, 2026

WebSweep: Collecting Website Text for Research

WebSweep helps researchers capture what was publicly visible on a given date, preserve the raw HTML as a reproducible archive, and turn those pages into analysis-ready text.

Use WebSweep when you:

have a list of public websites or domains
want a repeatable workflow for many domains
mainly need HTML text and metadata from public pages

In this tutorial, we use FIRMBACKBONE as the example. FIRMBACKBONE is the Dutch research infrastructure that provides secure, FAIR access to comprehensive data on all registered organizations in the Netherlands, including web-based data. We want to collect information from corporate websites, for example to track how broadly and how deeply they cover the energy transition. The same workflow can be used for universities, NGOs, government organizations, local news sites, project websites, or any other public set of domains.
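
To make the idea of "preserve the raw HTML, then derive analysis-ready text" concrete, here is a minimal sketch that uses only the Python standard library. It is not WebSweep's API: the names `extract_text` and `archive_page`, the dated folder layout, and the metadata fields are all assumptions made for illustration, and fetching the page is left out.

```python
import hashlib
import json
from datetime import date
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html: str) -> str:
    """Return the page text as one space-separated string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def archive_page(url: str, html: str, out_dir, captured_on: date) -> dict:
    """Store the raw HTML unchanged and write a metadata record next to it.

    The layout <out_dir>/<YYYY-MM-DD>/<hash>.html is an assumption of this
    sketch, not something prescribed by WebSweep.
    """
    folder = Path(out_dir) / captured_on.isoformat()
    folder.mkdir(parents=True, exist_ok=True)

    # A short hash of the URL gives a filesystem-safe, repeatable file name.
    stem = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    raw_bytes = html.encode("utf-8")
    raw_path = folder / f"{stem}.html"
    raw_path.write_bytes(raw_bytes)

    record = {
        "url": url,
        "captured_on": captured_on.isoformat(),
        "raw_html": str(raw_path),
        "sha256": hashlib.sha256(raw_bytes).hexdigest(),
        "text": extract_text(html),
    }
    (folder / f"{stem}.json").write_text(
        json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return record
```

Keeping the raw HTML and a checksum alongside the extracted text means the text can be regenerated later with a different extraction rule, without revisiting the website.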

September 5, 2022

How to share your research code

What are the best ways to create an understandable, openly accessible, findable, citable, and stable archive of your code?