Solvent is a Firefox extension that helps you write screen scrapers for Piggy Bank.
Why do I need screen scrapers?
Piggy Bank needs web pages to embed information in a format that it can understand. This format is called RDF (Resource Description Framework), and its main advantage is that it makes machine processing much easier. Unfortunately, at this early stage, few web pages embed or link to such "purer" RDF information. Piggy Bank, however, can run a particular screen scraper on particular pages to "extract" the information it needs.
In short, screen scrapers let you turn a regular web page into a regular web page plus semantic data, thus freeing the data from the page or site that contains it.
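To make "a web page plus semantic data" concrete, here is a minimal Python sketch of the last step a scraper performs: turning records already extracted from a page into RDF statements, serialized as N-Triples. This is not how Solvent or Piggy Bank actually represent data. The `http://example.org/ns#` vocabulary, the `to_ntriples` helper and the sample record are hypothetical; only the `rdf:type` URI is a real RDF term.

```python
# Hypothetical sketch: serialize scraped records as RDF (N-Triples).
# The EX vocabulary and helper names are illustrative assumptions,
# not Solvent's or Piggy Bank's actual API or schema.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # real rdf:type URI
EX = "http://example.org/ns#"  # hypothetical vocabulary namespace


def escape_literal(text: str) -> str:
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples(records: list[dict[str, str]], base: str, rdf_class: str) -> list[str]:
    """Give each record its own subject URI, a type triple, and one triple per field."""
    lines = []
    for i, record in enumerate(records):
        subject = f"<{base}#item{i}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{EX}{rdf_class}> .")
        for field, value in record.items():
            lines.append(f'{subject} <{EX}{field}> "{escape_literal(value)}" .')
    return lines


# Placeholder data, standing in for values a scraper pulled out of an HTML page.
stores = [{"name": "Example Coffee", "address": "1 Main St"}]
for line in to_ntriples(stores, "http://example.org/stores", "Store"):
    print(line)
```

Each field that was only visible as text on the page becomes a separate statement about an identifiable item, which is the form a tool like Piggy Bank can store, merge and display.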
How do I use it?
Watch a screencast of Solvent scraping the locations of Starbucks coffee shops in Cambridge, MA, and of Piggy Bank then showing the scraped data on a map.
There is another tutorial about using Solvent to scrape web pages containing data about baseball players. It explains how to use most of the basic Solvent features.
What are the main features of Solvent?
Writing screen scrapers can be hard and tedious, which is why a tool to help you is worthwhile. Solvent lets you:
- Interactively highlight parts of the page you wish to scrape, directly in your browser, and obtain the right XPaths for them
- Inspect the DOM of the captured elements and assign variable names to them
- Choose from different screen scraping templates based on the type of page you are scraping (individual page, multi-page, etc.)
- Edit and execute the scraper code directly in the browser, making the development cycle fast and incremental
- See the scraped results right in Piggy Bank even without installing the scraper first
- Save and publish the scraper with the required metadata, so that others can discover it
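The XPath-based workflow in the list above (select the repeated items on a page, then name the parts of each item) can be approximated outside the browser. The sketch below assumes well-formed XHTML and uses the limited XPath subset supported by Python's `xml.etree.ElementTree`. Solvent itself works on the live browser DOM and generates its own scraper code, so the `scrape` helper, the page markup and the class names here are all hypothetical.

```python
# Hypothetical sketch of XPath-driven scraping with named variables.
# Uses ElementTree's XPath subset on well-formed markup; real HTML pages
# usually need a browser DOM or an HTML-tolerant parser instead.
import xml.etree.ElementTree as ET

PAGE = """<html><body>
  <div class="store">
    <span class="name">Store A</span>
    <span class="addr">1 Main St</span>
  </div>
  <div class="store">
    <span class="name">Store B</span>
    <span class="addr">2 Side St</span>
  </div>
</body></html>"""

# Path selecting each repeated item on the page (hypothetical class name).
ITEM_PATH = ".//div[@class='store']"

# Variable name -> path relative to one item (hypothetical class names).
FIELDS = {
    "name": "span[@class='name']",
    "address": "span[@class='addr']",
}


def scrape(markup: str, item_path: str, fields: dict[str, str]) -> list[dict[str, str]]:
    """Return one dict per matched item, keyed by the chosen variable names."""
    root = ET.fromstring(markup)
    results = []
    for item in root.findall(item_path):
        record = {}
        for var, path in fields.items():
            node = item.find(path)
            # Missing elements yield an empty string instead of raising.
            record[var] = (node.text or "").strip() if node is not None else ""
        results.append(record)
    return results


print(scrape(PAGE, ITEM_PATH, FIELDS))
```

Keeping the path that selects each repeated item separate from the relative paths for its fields means a new variable can be added without touching the item selection.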
Where do I find other scrapers to learn from?
How can I help/complain/thank?
Solvent is open source software, built around the spirit of open participation and collaboration.
There are several ways you can help:
- Subscribe to our mailing lists to show your interest and give us feedback
- Report problems and request new features through our issue tracking system
- Send us patches or fixes to the code
- Edit this very wiki (don't worry, the wiki will notify us of changes)
If you are interested in Solvent's development, follow the Solvent development instructions.
Licensing & Legal Issues
Solvent is open source software and is licensed under the BSD license.
This software is maintained by the SIMILE project and in particular: