Solvent
Solvent is a Firefox extension that helps you write screen scrapers for Piggy Bank.
(requires Piggy Bank)

Why do I need screen scrapers?

Piggy Bank needs web pages to embed information in a format that it can understand. This format is called RDF (Resource Description Framework), and its main advantage is that it makes machine processing a lot easier. Unfortunately, at this very early stage, not many web pages embed or link to such "purer" RDF information. Piggy Bank, however, can run a screen scraper on particular pages to "extract" the information it needs.

In short, screen scrapers let you turn a regular web page into a regular web page plus semantic data, and thus free the data from the page or site that contains it.
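As a rough illustration of what "a regular web page plus semantic data" can mean, the sketch below turns one scraped record into RDF statements written in N-Triples syntax. It is not Piggy Bank's API: the helper names `escapeLiteral` and `toNTriples`, the example.org URIs, and the choice of predicates are all assumptions made for illustration.

```javascript
// Hypothetical helpers, not part of Piggy Bank or Solvent.

// Escape a string so it can sit inside an N-Triples literal.
function escapeLiteral(value) {
  // Backslashes first, then quotes and line breaks
  return String(value)
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r");
}

// Serialize one scraped record as an array of N-Triples lines.
// `subject` is the URI of the thing described; `properties` maps
// predicate URIs to plain string values.
function toNTriples(subject, properties) {
  return Object.keys(properties).map(function (predicate) {
    return "<" + subject + "> <" + predicate + "> \"" +
      escapeLiteral(properties[predicate]) + "\" .";
  });
}

// Example record, as a scraper might extract it from a store-locator page
// (the URIs and values here are made up).
var lines = toNTriples("http://example.org/shops/1", {
  "http://purl.org/dc/elements/1.1/title": "Coffee shop on Main St",
  "http://example.org/vocab#address": "1 Main St, Cambridge, MA"
});
console.log(lines.join("\n"));
```

Each output line is one subject/predicate/object statement, which is the kind of machine-readable data that can outlive the page it was scraped from.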

How do I use it?

Watch a screencast of Solvent scraping the locations of Starbucks coffee shops in Cambridge, MA, and then using Piggy Bank to show the scraped data on a map.

Also read the Piggy Bank screen scraping howto that uses Solvent to write a screen scraper for Piggy Bank.

There is another tutorial about using Solvent to scrape web pages containing data about baseball players. It explains how to use most of the basic Solvent features.

What are the main features of Solvent?

Writing screen scrapers can be hard and tedious, which is why you need a tool to help you. Solvent lets you:

  • Interactively highlight parts of the page you wish to scrape, directly in your browser, and obtain the right XPaths for them
  • Inspect the DOM of the captured elements and assign variable names there
  • Automatically generate the JavaScript code for the most common tasks, such as iterating over XPath results
  • Choose from different screen scraping templates based on the type of page you are scraping (individual page, multi-page, etc.)
  • Edit and execute the scraper code directly in the browser, making the development cycle fast and incremental
  • See the scraped results right in Piggy Bank even without installing the scraper first
  • Save and publish the scraper with the required metadata, so that others can discover it
  • Consult all the cheatsheets you need for JavaScript, XPath, DOM, and RDF, along with pointers to places where you can find RDF vocabularies
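To give a feel for the "XPath results iteration" mentioned in the list above, here is a minimal sketch using the standard DOM XPath API (`evaluate`, `snapshotLength`, `snapshotItem`). It is not the code Solvent generates; the helper name `textsForXPath` is hypothetical, and the result-type constant is spelled out numerically only so the function does not depend on a browser global.

```javascript
// Numeric value of XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in the DOM XPath API.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// Hypothetical helper: return the trimmed text of every node matched by `xpath`.
// `doc` is a DOM Document (in a browser, pass `document`).
function textsForXPath(doc, xpath) {
  // Evaluate the expression against the whole document and take a snapshot,
  // so the result stays stable while we loop over it.
  var snapshot = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var texts = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    var node = snapshot.snapshotItem(i);
    texts.push((node.textContent || "").trim());
  }
  return texts;
}
```

In a browser console, something like `textsForXPath(document, "//h2")` would list the text of every second-level heading on the current page; the XPath you pass in would normally be one obtained by highlighting elements in Solvent.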

Where do I find other scrapers to learn from?

See the list of Piggy Bank scrapers available.

How can I help/complain/thank?

Solvent is open source software, built around the spirit of open participation and collaboration.

There are several ways you can help:

  • Subscribe to our mailing lists to show your interest and give us feedback;
  • Report problems and ask for new features through our issue tracking system;
  • Send us patches or fixes to the code;
  • Edit this very wiki (don't worry, the wiki will notify us of changes).

If you are interested in Solvent's development, follow the Solvent development instructions.

Licensing & Legal Issues

Solvent is open source software and is licensed under the BSD license.

Credits

This software is maintained by the SIMILE project and in particular: