Re: scraper evolution

From: Steve Dunham <dunhamsteve_at_gmail.com>
Date: Fri, 27 Jan 2006 09:18:24 -0800

On 1/27/06, Eric Miller <em_at_w3.org> wrote:
> The HTML pages harvested using an Open Worldcat scraper [1] changed, and as a
> consequence the scraper broke. To be clear, the scraper, when invoked, didn't
> stop working per se, but rather it no longer gleaned all of the relevant RDF
> that it did originally. I've updated the scraper accordingly, but it's unclear
> to me what the best way is to propagate these changes to others who might be
> using the scraper.
>
> I can think of several possible options, all of which have various pros /
> cons:
>
> 1) do nothing ... if folks realize it's broken, they'll look for an update
> 2) real-time auto-update ... every time the scraper is invoked, it checks to
> see whether a new version is available
> 3) periodic update ... check for updates nightly, monthly, etc., and then
> offer the user some sort of notification to update
>
> I'm inclined to suggest 3, but I'm curious about the thoughts of others who
> might have been able to spend more time thinking about this than I have :)

Two variants on the above:
 a) Check for a new version on invocation, if it hasn't checked within the
last X amount of time. (There's no use bothering the user until the scraper
is actually invoked.)
 b) Keep an MD5 of the user-approved javascript, but load it from the
source (through the browser's cache) every time the scraper is invoked, and
prompt the user if it has changed. (A rough sketch follows below.)

(Note that these don't catch the case where the RDF changes.)
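Roughly, (b) might look something like the following, with (a)'s time-based
throttle layered on top. This is just a sketch: the md5() helper and the
persistent "store" object are stand-ins for whatever hashing library and
storage the hosting extension actually provides.

// Sketch only: md5() and store are hypothetical placeholders for the
// host environment's hashing and persistence facilities.
function loadScraper(url, store, maxAgeMs) {
  var now = new Date().getTime();
  var lastChecked = store.get("lastChecked") || 0;
  var approvedSource = store.get("approvedSource");

  // Variant (a): skip the check entirely if we looked recently.
  if (approvedSource && now - lastChecked < maxAgeMs) {
    return approvedSource;
  }

  // Variant (b): re-fetch the script (the browser cache will usually
  // answer this) and compare its digest with the user-approved one.
  var req = new XMLHttpRequest();
  req.open("GET", url, false);        // synchronous, for brevity
  req.send(null);
  var source = req.responseText;

  var digest = md5(source);           // hypothetical md5() helper
  var approved = store.get("approvedDigest");

  // First run, unchanged source, or the user accepts the new version.
  if (!approved || digest === approved ||
      confirm("The scraper at " + url + " has changed. Use the new version?")) {
    store.set("approvedDigest", digest);
    store.set("approvedSource", source);
    store.set("lastChecked", now);
    return source;
  }
  throw new Error("User rejected the updated scraper");
}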
Received on Fri Jan 27 2006 - 17:17:48 EST
