How is this different from bookmarking?

Bookmarking a Web page saves only the Web page's address and its title. And you can only bookmark whole Web pages, not little bits and pieces within them.

Web sites will eventually offer the features I want! Many sites have already started embedding Google Maps. Why do I need Piggy Bank?!

Until all Web sites offer all the features you want, we recommend using Piggy Bank to take matters into your own hands and use data the way you want. If the data is on your own computer, you have control over it. If the data is on Web sites, there is always a possibility that those Web sites won't agree with how you want to use their data. Moreover, once you have collected this data, you can mix it with data collected from other (possibly competing!) Web sites.

If a Web site doesn't give me certain pieces of information (e.g., the phone number of the current residents of an apartment), there is no way Piggy Bank can get it. Web sites ultimately control how their information can be used. There is no way Piggy Bank can fight it.

Sure, but for everything else that Web sites do give, Piggy Bank can be of help.

Information on the Web is currently very messy and constantly changing. How can Piggy Bank and screen scrapers possibly keep up?

Piggy Bank is not the first software to extract information by screen scraping HTML pages, but pretty much all the others work at the "syntax level": they use regular expressions to obtain information from the textual serialization of an HTML DOM. Piggy Bank has the advantage of letting the browser's HTML parser do the nasty job of parsing the HTML, so it works at the "model level" instead, which means you can use XPath expressions rather than regular expressions.

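Not only is this easier, it is also much more robust: an XPath expression addresses elements in the parsed document tree, so it is unaffected by changes in whitespace or attribute order that would break a regular expression.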
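To make the difference concrete, here is a minimal sketch in Python. It is not Piggy Bank code: the standard library's xml.etree.ElementTree stands in for the browser's HTML parser (it only accepts well-formed markup and supports a subset of XPath), and the sample markup, the "price" class, and the function names are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Two serializations of the same (hypothetical) listing. Only the attribute
# order and the whitespace differ; the parsed tree is identical.
PAGE_A = '<div><span class="price" id="p1">$1200</span></div>'
PAGE_B = '<div>\n  <span id="p1"\n        class="price">$1200</span>\n</div>'

# "Syntax level": a regular expression written against the text of PAGE_A.
PRICE_RE = re.compile(r'<span class="price" id="p1">([^<]*)</span>')


def price_by_regex(html):
    """Return the price matched in the raw text, or None if the text differs."""
    match = PRICE_RE.search(html)
    return match.group(1) if match else None


def price_by_xpath(html):
    """Return the price found in the parsed tree, or None if no such element."""
    root = ET.fromstring(html)  # stand-in for the browser's parser
    node = root.find(".//span[@class='price']")
    return node.text if node is not None else None


if __name__ == "__main__":
    for name, page in (("A", PAGE_A), ("B", PAGE_B)):
        print(name, "regex:", price_by_regex(page), "xpath:", price_by_xpath(page))
```

The regular expression finds the price only in the first serialization, while the XPath-style query finds it in both, because it is evaluated against the tree rather than the text.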

Another advantage is that Web sites increasingly use CSS stylesheets to separate content from presentation. As a result, more and more pages contain id="" and class="" attributes that a scraper can use to "hook" onto a particular field, independently of its location in the DOM. Because such hooks are used by the Web site itself and form a strong contract between the CSS designer and the HTML page editor, they change very rarely, which makes scraping much more resistant to Web site changes.
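The value of such hooks can be sketched as follows. Again this is only an illustration under stated assumptions: Python's xml.etree.ElementTree stands in for the browser DOM, and the two layouts, the "phone" class, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same (hypothetical) field before and after a site redesign. The
# surrounding structure changes completely; the class="phone" hook does not.
OLD_LAYOUT = (
    '<html><body><table><tr>'
    '<td class="phone">555-0100</td>'
    '</tr></table></body></html>'
)
NEW_LAYOUT = (
    '<html><body><div><ul><li>'
    '<span class="phone">555-0100</span>'
    '</li></ul></div></body></html>'
)


def phone_by_position(html):
    """Locate the field by its path in the tree; breaks when the layout moves."""
    node = ET.fromstring(html).find("./body/table/tr/td")
    return node.text if node is not None else None


def phone_by_class_hook(html):
    """Locate the field by its class attribute, wherever it sits in the tree."""
    node = ET.fromstring(html).find(".//*[@class='phone']")
    return node.text if node is not None else None


if __name__ == "__main__":
    for name, page in (("old", OLD_LAYOUT), ("new", NEW_LAYOUT)):
        print(name, phone_by_position(page), phone_by_class_hook(page))
```

The positional query stops working after the redesign, while the class-based query keeps returning the field. Note that this simple predicate matches the class attribute exactly; an element carrying several classes would need a looser test.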

Screen scrapers can contain dangerous code. How is Piggy Bank defending me against that?

Piggy Bank runs scrapers inside a sandbox, so running a scraper is no more dangerous than executing JavaScript code from Web sites.

I upgraded Piggy Bank and there's a bunch of weird new types - what's with that?

There was a bug in release 3.1.0 of Piggy Bank that put some of Piggy Bank's internals in plain view. It has been corrected, and you can delete anything from your personal Piggy Bank with a type name starting with "urn:simile"...