RE: scraper evolution from Prokopp, Christian on 2006-01-29 (stdin)

From: Prokopp, Christian <christian.prokopp_at_sap.com>
Date: Mon, 30 Jan 2006 07:44:55 +0800

Optional and mandatory fields sound like a very good idea. Instead of
type checking, the scraper developer could write a regular expression to
check the values (maybe this is actually what you meant by type checking :P ).
For the JS code, a simple versioning system would be helpful (ideally
something like Subversion or CVS, but that seems a bit extreme, at
least at this stage). Whether an update is invoked manually or
automatically is a matter of taste and should be left to the user to
choose.
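
To make the idea concrete, here is a rough sketch of how mandatory/optional fields and regular-expression checks might fit together. Everything in it is an assumption for illustration only: the field names, the patterns, and the `fieldSpec` / `validateRecord` names are made up and are not part of any existing scraper code.

```javascript
// Hypothetical declarative description of the fields a scraper should extract.
// Field names and patterns are invented for illustration.
const fieldSpec = {
  title: { required: true,  pattern: /\S/ },
  year:  { required: true,  pattern: /^\d{4}$/ },
  isbn:  { required: false, pattern: /^[\d-]{10,17}$/ },
};

// Check one scraped record against the spec and return a list of problems.
// An empty list means the record passed every check.
function validateRecord(record, spec) {
  const problems = [];
  for (const [name, rule] of Object.entries(spec)) {
    const value = record[name];
    if (value === undefined || value === null || value === "") {
      if (rule.required) problems.push(`missing mandatory field: ${name}`);
      continue; // an absent optional field is fine
    }
    if (!rule.pattern.test(String(value))) {
      problems.push(`bad value for ${name}: ${value}`);
    }
  }
  return problems;
}

// A page whose HTML changed so that "year" is no longer picked up.
const scraped = { title: "Some Book", isbn: "not-an-isbn" };
const problems = validateRecord(scraped, fieldSpec);
console.log(problems);
```

With something like this, a scraper that silently stops collecting a mandatory field after a small HTML change would at least produce a visible problem report instead of quietly returning less data.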

Cheers,
Christian

-----Original Message-----
From: David Huynh [mailto:dfhuynh_at_csail.mit.edu]
Sent: Saturday, 28 January 2006 8:45 AM
To: general_at_simile.mit.edu
Subject: Re: scraper evolution

Eric Miller wrote:

>
> On Jan 27, 2006, at 12:38 PM, David Huynh wrote:
>
>> I'm working on making scrapers more "declarative", thus easier to
>> write, easier to update and adapt, easier to find errors in them.
>> Update errors might thus be detectable automatically. And the users
>> themselves (not the original scraper authors) can try to update the
>> scrapers.
>
>
> Interesting! :) But I'm not quite sure how this would help the use
> case exactly. In the case below the scraper didn't die per se - minor
> HTML tweaks simply caused the scraper not to collect all of the RDF
> data that it was originally able to gather. In this case, the user might
> not know there was an error and thus not know to correct the scrapers.

The description of the scraper might also indicate which fields are
optional and which are mandatory. We can also enforce some type checking
on the values extracted.

David
Received on Sun Jan 29 2006 - 23:44:17 EST
