Piggy Bank, timeline visualization, and scrapers

From: Rickard Öberg <rickard.oberg_at_senselogic.se>
Date: Tue, 18 Oct 2005 10:07:50 +0200

Hey

After having thought about building an RDF database/visualizer, I happened upon Stefano's blog and found Piggy Bank, and after having played with it for a while I ditched all my own ideas and started writing scrapers. This stuff is just way, way cool! Many thanks for providing it!

I have lots of questions etc., but to start off there are two core things I need, and I wanted to check with you how difficult they would be to add. The majority of the data I want to work with uses both locations and dates; specifically, I have lots and lots of historical data, so it's kind of a "four dimensional" thing. For my purposes I want to combine the Google Maps visualizer with a scrollbar thingy where I can set a "window" of dates, then move that scrollbar and have the events for that window be shown. If I were to start adding that, how would I go about it? Is this something that anyone else could do with a relatively small time investment? In any case, I imagine such a thing would be useful not only for my purposes but for other things as well, e.g. earthquake visualization over a period of time (btw, I touched up David's commented-out, but Google-cached, USGS scraper so it works), RSS news items over a period of time, etc. Any feedback on how to get something like that going would be MUCH appreciated.
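To make the windowing idea concrete, here's a minimal sketch of the filtering step the scrollbar would drive. It's plain JavaScript; the event record shape ({label, date, lat, lng}) and the sample earthquakes are my own invention for illustration, not anything Piggy Bank actually defines:

```javascript
// Filter a list of dated, located events down to those inside a date window.
// Dragging the scrollbar would just re-run this with new start/end bounds
// and re-plot the surviving events on the map.
function eventsInWindow(events, start, end) {
  const s = new Date(start).getTime();
  const e = new Date(end).getTime();
  return events.filter(ev => {
    const t = new Date(ev.date).getTime();
    return t >= s && t <= e;
  });
}

// Made-up sample data in the assumed record shape.
const events = [
  { label: "Quake A", date: "1906-04-18", lat: 37.75, lng: -122.55 },
  { label: "Quake B", date: "1964-03-27", lat: 61.02, lng: -147.65 },
  { label: "Quake C", date: "2004-12-26", lat: 3.30,  lng: 95.98  },
];

console.log(eventsInWindow(events, "1900-01-01", "1970-01-01").map(e => e.label));
// → [ 'Quake A', 'Quake B' ]
```

The filtering itself is trivial; the part I'd need guidance on is where to hook it into the Google Maps visualizer so the map redraws as the window moves.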

Second, I am considering writing a scraper for Wikipedia to bootstrap the historical database: basically going in there and scraping up all the events, deaths, births, etc. that they have. However, instead of visiting each page individually and clicking the coin, it would be preferable if I could point a scraper at the site and just say "fetch website"; in other words, the script needs to be able to fetch pages on its own, based on generated URLs. How easy would it be to do something like that? Has anyone done something like that (i.e. multi-page scraping) before? Are there examples available?
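The URL-generation half of this is easy to sketch; what I don't know is the fetching half, i.e. whatever page-loading hook Piggy Bank exposes to scrapers. Here's the generation side in plain JavaScript, assuming (I haven't verified every one) that Wikipedia's day pages all follow the /wiki/Month_Day pattern:

```javascript
// Generate the URLs of Wikipedia's per-day pages (January_1 .. December_31),
// which carry the "Events", "Births", and "Deaths" lists I want to scrape.
// The Month_Day URL pattern is an assumption about the site's layout.
const MONTHS = ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November", "December"];
const DAYS   = [31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]; // Feb 29 has its own page

function dayPageUrls() {
  const urls = [];
  MONTHS.forEach((month, i) => {
    for (let d = 1; d <= DAYS[i]; d++) {
      urls.push("http://en.wikipedia.org/wiki/" + month + "_" + d);
    }
  });
  return urls;
}

const urls = dayPageUrls();
console.log(urls.length);  // → 366
console.log(urls[0]);      // → http://en.wikipedia.org/wiki/January_1
```

A multi-page scraper would then just loop over that list, fetching and scraping each page in turn, which is exactly the capability I'm asking whether the scraper API supports.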

I've read through the email archives, and it seems like this project is still in an early stage and that there are performance and technical issues to be fixed, but I am betting that those will be resolved further down the line :-) I seriously like the approach, and will focus on writing scrapers for now.

Alright, good enough for an intro post I guess. Again, thanks for providing this great tool!

regards,
  Rickard
Received on Tue Oct 18 2005 - 08:02:34 EDT

This archive was generated by hypermail 2.3.0 : Thu Aug 09 2012 - 16:39:18 EDT