Tutorials
Read all latest tutorial posts
WebSweep: Collecting Website Text for Research
WebSweep helps researchers capture what was publicly visible on a given date, preserve the raw HTML as a reproducible archive, and turn those pages into analysis-ready text.
Use WebSweep when you:
- have a list of public websites or domains
- want a repeatable workflow for many domains
- mainly need HTML text and metadata from public pages
In this tutorial, we use FIRMBACKBONE as an example. It is the Dutch research infrastructure that provides secure, FAIR access to comprehensive data on all registered organisations in the Netherlands, including web-based data. We want to collect information from corporate websites, for example to track how broadly and deeply they cover the energy transition. The same workflow can be used for universities, NGOs, government organisations, local news sites, project websites, or any other public set of domains.
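To make the "archive the raw HTML, then derive text" idea concrete, here is a minimal sketch in plain Python. It does not use WebSweep and does not reflect its API. The function names `html_to_text` and `archive_page`, the file naming scheme, and the record fields are all hypothetical, chosen only for illustration. Fetching the page is left out. The sketch assumes you already have the HTML as a string.

```python
import hashlib
from datetime import date
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the text chunks of an HTML string joined by single spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def archive_page(domain: str, html: str, out_dir: Path, day: date) -> dict:
    """Write the raw HTML to a date-stamped file and return a small record.

    The record layout is an assumption for this sketch, not a WebSweep format.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    raw = html.encode("utf-8")
    path = out_dir / f"{domain}_{day.isoformat()}.html"
    path.write_bytes(raw)  # the untouched archive copy
    return {
        "domain": domain,
        "date": day.isoformat(),
        "sha256": hashlib.sha256(raw).hexdigest(),  # lets you verify the archive later
        "path": str(path),
        "text": html_to_text(html),  # the analysis-ready text
    }
```

Keeping the raw file and its hash separate from the extracted text means the text extraction can be rerun or improved later without collecting the pages again.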
read more
How to share your research code
What are the best ways to create an understandable, openly accessible, findable, citable, and stable archive of your code?
read more