Re: scraper evolution from David Huynh on 2006-01-27

From: David Huynh <dfhuynh_at_csail.mit.edu>
Date: Fri, 27 Jan 2006 12:38:35 -0500

I'm working on making scrapers more "declarative", so that they are
easier to write, easier to update and adapt, and easier to check for
errors. Errors caused by page changes might thus be detectable
automatically, and the users themselves (not the original scraper
authors) could try to update the scrapers.
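One way to read the declarative idea, sketched here under loose assumptions: a scraper becomes a list of rules, each mapping an RDF property to a pattern, and a rule that matches nothing on a page is reported instead of being silently dropped. `Rule`, `run_scraper`, the regex patterns and the sample pages are all hypothetical. Regular expressions stand in for whatever selector language (XPath, DOM paths) a real declarative scraper would more likely use.

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    prop: str     # RDF property to emit, e.g. "dc:title" (hypothetical)
    pattern: str  # regex with one capture group; a stand-in for an XPath/DOM selector

def run_scraper(rules, html):
    """Apply each rule to the page.

    Returns (triples, broken): the extracted triples, plus the properties
    whose rule matched nothing (likely breakage after a page change).
    """
    triples, broken = [], []
    for rule in rules:
        # re.S lets "." match newlines so values may span lines.
        values = re.findall(rule.pattern, html, flags=re.S)
        if not values:
            broken.append(rule.prop)
        for value in values:
            triples.append(("_:item", rule.prop, value.strip()))
    return triples, broken

RULES = [
    Rule("dc:title", r'<h1 class="title">(.*?)</h1>'),
    Rule("dc:creator", r'<span class="author">(.*?)</span>'),
]

old_page = '<h1 class="title">Example Title</h1><span class="author">A. Author</span>'
new_page = '<h1 class="title">Example Title</h1><div class="byline">A. Author</div>'

print(run_scraper(RULES, old_page)[1])  # []
print(run_scraper(RULES, new_page)[1])  # ['dc:creator']
```

Because the scraper is data rather than arbitrary code, a rule that suddenly matches nothing is easy to flag, which is the kind of silent partial failure (some RDF no longer gleaned) described in the quoted message.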

We should also try to detect automatically whether a scraper is "safe".

David

Eric Miller wrote:

> The HTML pages harvested using an Open Worldcat scraper [1] changed,
> and as a consequence the scraper broke. To be clear, the scraper, when
> invoked, didn't stop working per se, but rather it didn't glean all of
> the relevant RDF that it did originally. I've updated the scraper
> accordingly, but it's unclear to me how best to propagate these
> changes to others who might be using the scraper.
>
> I can think of several possible options, all of which have various
> pros and cons:
>
> 1) do nothing ... if folks realize it's broken, they'll look for an update
> 2) real-time auto-update ... every time the scraper is invoked, it checks
> to see if a new version is available
> 3) periodic update ... check for updates nightly, monthly, etc.
> and then offer the user some sort of notification to update
>
> I'm inclined to suggest 3, but I'm curious about the thoughts of others
> who may have spent more time thinking about this than I have :)
>
> [1] http://potlach.org/2005/10/scrapers/
>
> --
> eric miller http://www.w3.org/people/em/
> semantic web activity lead http://www.w3.org/2001/sw/
> w3c world wide web consortium http://www.w3.org/
>
>
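Option 3 from Eric's list (periodic checks plus a user notification) could look roughly like the following sketch. It assumes each published scraper carries an integer revision number that can be fetched cheaply; `check_for_update`, `CHECK_INTERVAL` and `fetch_latest_revision` are hypothetical names, and the fetch is passed in as a function so the policy can be exercised without a network.

```python
from datetime import datetime, timedelta

# Assumed policy: check at most once a day ("nightly").
CHECK_INTERVAL = timedelta(days=1)

def check_for_update(installed_revision, last_checked, now, fetch_latest_revision):
    """Return (notification or None, updated last_checked timestamp).

    fetch_latest_revision is a hypothetical callable returning the revision
    number currently published for the scraper.
    """
    if last_checked is not None and now - last_checked < CHECK_INTERVAL:
        # Checked recently: skip the fetch and keep the old timestamp.
        return None, last_checked
    latest = fetch_latest_revision()
    if latest > installed_revision:
        # Notify only; installing the new revision is left to the user.
        return (f"Scraper update available: revision {latest} "
                f"(installed: {installed_revision})"), now
    return None, now

now = datetime(2006, 1, 27, 12, 0)
msg, last = check_for_update(3, None, now, lambda: 4)
print(msg)  # Scraper update available: revision 4 (installed: 3)
msg, last = check_for_update(3, last, now + timedelta(hours=2), lambda: 4)
print(msg)  # None (checked less than a day ago)
```

Returning a message instead of installing the new revision keeps the user in control, which is what separates option 3 from the real-time auto-update of option 2.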
Received on Fri Jan 27 2006 - 17:37:58 EST

This archive was generated by hypermail 2.3.0 : Thu Aug 09 2012 - 16:39:18 EDT