You’re meeting up with a friend for lunch before working together on a business plan. You’d like to locate a restaurant serving a particular cuisine that’s close to a coffee shop of your favorite chain, one that you know offers free wireless service. Unfortunately, although you can locate restaurants by cuisine on one Web site and coffee shops on another, there is no single Web site on which you can map both the restaurants and the coffee shops of your favorite chain together.
Your family is moving to a new town and you’d like to rent an apartment close to a subway station, a decent grocery store, and a good elementary school. Although many apartment-for-rent postings claim that the properties in question are close to public transportation, shopping centres, etc., you would like to verify that for yourself. Unfortunately, the addresses for apartments, grocery stores, and elementary schools come from different Web sites and it is not possible to view them together on a single map.
Being an amateur painter, you would also like to find nearby art supply stores. Being safety conscious, you would like to stay away from crime hotspots but close to hospitals. While you are at it, you decide to include the locations of your friends’ and relatives’ homes. The more information you add that matters to you, the less likely you are to find a Web site that caters to your needs.
You’re conducting research into a new technology for your company. You have found information on this technology in many places: scientific publications from several publication archive Web sites, news articles from several news sites, and postings from several blogs. You would like to sort all the records you have found by date to learn how this technology has evolved over time and how it has gained acceptance in the industry. Unfortunately, the bookmarks you use to keep track of your findings do not carry dates or, for that matter, any other attributes such as authors or publishing venues.
These scenarios point to a limitation of the Web. Although your Web browser can fetch information for you at lightning speed, the information seems to be stuck at the various Web sites where it originates. It is not easy to save such information in detail without copying it over to or re-typing it into different software applications. (Bookmarks only save the addresses and titles of whole Web pages, not the details of individual items within Web pages.) It is not easy to merge information from several sites without untangling it from each individual site’s formatting and then recreating a unifying presentation.
The fact that information is stuck at its original Web sites is even more constraining when those sites do not let you browse and search their information according to your needs. A property-for-rent site might not let you browse its listings by the amenities available. A coffee shop chain’s Web site might not offer you a map of all its stores.
Sure, you say, there are already many new Web sites that embed Google Maps to display information on maps. For instance, HousingMaps puts housing ads from Craigslist onto Google Maps. However, instead of relying on Craigslist to offer a map view, you now rely on the existence of HousingMaps. What if you want to plot information so exotic that no one with enough Web programming skill has the interest or the resources to put up a Web site for it? What if you want to plot information from several domains together? Will there be someone else with the same need for such a unique combination of information who can put up a Web site? While these sites that embed Google Maps are innovative, they simply move the problem from one Web site to another.
The Semantic Web initiative defines standards for the representation of data on the Web that allow data to be mixed more easily and repurposed for each person’s needs, regardless of where the information comes from or how it was originally shown.
Piggy Bank is our attempt to enrich your Web experience in the spirit of the Semantic Web, thereby giving you a taste of what the Semantic Web has to offer, and doing so without requiring you to leave the comfort zone of your existing Web browser.
How It Works
The Semantic Web initiative has defined a standard data model called RDF in which information on the Web can be recorded. Using this W3C standard for representing data, one can describe information independently of its formatting. Data represented in RDF (i.e., “pure” information) is much more conducive to mixing, reuse, and repurposing.
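To make the idea of formatting-independent data more concrete, here is a minimal Python sketch of the RDF idea that every fact is a (subject, predicate, object) statement. The example.org namespace, the property names, and the restaurant itself are made up for illustration; real data would use an agreed-upon vocabulary, and the serializer below does no escaping of special characters.

```python
# Hypothetical namespace and property names, chosen only for illustration.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple. The restaurant
# and its details are invented sample data.
triples = {
    (EX + "restaurant/1", EX + "name", "Thai Basil"),
    (EX + "restaurant/1", EX + "cuisine", "Thai"),
    (EX + "restaurant/1", EX + "address", "12 Main St"),
}


def describe(subject, triples):
    """Collect every property of one subject into a dictionary."""
    return {p: o for s, p, o in triples if s == subject}


def to_ntriples(triples):
    """Write triples as N-Triples-style lines, sorted for stable output.

    Simplification: every object is treated as a plain string literal and
    no escaping is performed.
    """
    return sorted(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)
```

Nothing in these statements says how the restaurant should be displayed, which is what allows the same data to be listed, mapped, or sorted later.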
Piggy Bank is an extension to the Firefox Web browser that extracts information from existing Web pages and stores it in RDF. If a Web page already links to RDF information, extraction simply means retrieving that information. Otherwise, Piggy Bank employs custom software code that untangles the “pure” information from the Web page’s formatting.
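The two extraction paths described above can be sketched roughly as follows. This is not Piggy Bank's actual code: it assumes a page advertises its RDF with a link element whose type is application/rdf+xml, and the function names are hypothetical.

```python
from html.parser import HTMLParser


class RdfLinkFinder(HTMLParser):
    """Collect the href of every <link> element that points to RDF/XML."""

    def __init__(self):
        super().__init__()
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # Assumption: linked RDF is advertised with this media type.
        if attributes.get("type") == "application/rdf+xml" and attributes.get("href"):
            self.rdf_urls.append(attributes["href"])


def find_linked_rdf(html):
    """Return the URLs of RDF documents the page links to, if any."""
    finder = RdfLinkFinder()
    finder.feed(html)
    return finder.rdf_urls


def choose_extraction(html):
    """Decide between retrieving linked RDF and falling back to scraping."""
    urls = find_linked_rdf(html)
    return ("retrieve", urls) if urls else ("scrape", [])
```

A page that links to RDF yields a "retrieve" decision together with the URLs to fetch; any other page falls through to "scrape", where site-specific code has to do the untangling.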
Having extracted the “pure” information and stored it on your computer, Piggy Bank can now apply its own user interface to let you browse through that information independent of the original Web sites. For example, Piggy Bank can call upon Google Maps to display geographical information even if the original Web sites do not offer cartographic views of their data.
Furthermore, by storing “pure” information from different Web sites in the same data model, Piggy Bank can offer a unified view on the “pure” information regardless of its many origins.
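As a rough illustration of why a shared data model makes a unified view possible, the sketch below merges statements collected from two hypothetical sites and then picks out everything that can be placed on a map, regardless of which site it came from. The vocabulary, identifiers, and coordinates are invented sample data, and the functions are illustrative rather than Piggy Bank's own.

```python
EX = "http://example.org/"  # hypothetical vocabulary

# Statements gathered from a restaurant site (sample data).
restaurant_site = {
    (EX + "r1", EX + "type", "Restaurant"),
    (EX + "r1", EX + "lat", "42.36"),
    (EX + "r1", EX + "long", "-71.09"),
}

# Statements gathered from a coffee shop chain's site (sample data).
coffee_site = {
    (EX + "c1", EX + "type", "CoffeeShop"),
    (EX + "c1", EX + "lat", "42.37"),
    (EX + "c1", EX + "long", "-71.10"),
    (EX + "c2", EX + "type", "CoffeeShop"),  # no coordinates known
}


def merge(*sources):
    """Merging triples from different origins is just a set union."""
    merged = set()
    for source in sources:
        merged |= source
    return merged


def mappable_items(triples):
    """Return {subject: (lat, long)} for items that have both coordinates."""
    properties = {}
    for s, p, o in triples:
        properties.setdefault(s, {})[p] = o
    return {
        s: (float(d[EX + "lat"]), float(d[EX + "long"]))
        for s, d in properties.items()
        if EX + "lat" in d and EX + "long" in d
    }
```

Calling mappable_items(merge(restaurant_site, coffee_site)) returns the restaurant and the first coffee shop together; the second coffee shop is left out only because it has no coordinates, not because of where its data originated.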
The piece of software code that Piggy Bank uses to “purify” information within a Web page is called a screen scraper. Different screen scrapers are made for different Web pages. Piggy Bank supports an easy way to install screen scrapers, so that getting better use of a Web page’s information is just a few clicks away.
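To give a feel for what a screen scraper does, here is a small Python sketch written against a hypothetical page layout in which each listing is an li element with class "listing" containing span elements with classes "title" and "price". It is not the format Piggy Bank's scrapers actually use; the markup, the URL, the registry, and the bare predicate names are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class ListingScraper(HTMLParser):
    """Turn the hypothetical listing markup into (subject, predicate, object) triples."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.triples = []
        self._count = 0     # number of listings seen so far
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "listing":
            self._count += 1
        elif tag == "span" and css_class in ("title", "price") and self._count:
            self._field = css_class

    def handle_data(self, data):
        if self._field and data.strip():
            subject = f"{self.base_uri}#item{self._count}"
            # Simplification: the predicate is a bare name, not a full URI.
            self.triples.append((subject, self._field, data.strip()))

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_listings(html, base_uri):
    """Run the scraper over one page and return the extracted triples."""
    scraper = ListingScraper(base_uri)
    scraper.feed(html)
    scraper.close()
    return scraper.triples


# Different pages need different scrapers, so keep a registry keyed by URL prefix.
SCRAPERS = {"http://example.org/rentals": scrape_listings}


def scraper_for(url):
    """Return the scraper registered for this URL, or None if there is none."""
    for prefix, scraper in SCRAPERS.items():
        if url.startswith(prefix):
            return scraper
    return None
```

Because the scraper depends on one site's markup, it breaks when that markup changes and is useless on other sites, which is why a separate scraper is needed for each kind of page.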