Re: [announcement] DSpace Scraper - Reloaded from Eric Miller on 2005-08-10 (stdin)

From: Eric Miller <em_at_w3.org>
Date: Wed, 10 Aug 2005 09:01:14 -0400

On Aug 10, 2005, at 12:48 AM, Stefano Mazzocchi wrote:

> I started to see if I could transform the existing xslt scraper into
> a javascript one and, step after step, I turned it into a
> full-featured scraper that:
>
> 1) works on all dspace installations worldwide! (well, all that I
> tried)
> 2) works on both the simple and the complete item view
> 3) works on search results too! (so that you can have a faceted
> browsing experience of a dspace installation... well, at least for
> a limited amount of items)
>
> Point your piggy-bank-enabled firefox to
>
> http://simile.mit.edu/repository/piggy-bank/trunk/src/scrapers/screen-scrapers.n3
>
> and follow the instructions.
>
> Happy scraping (again).

Excellent! Well done!

The regular expression matching you've introduced is more complex
than in previous *scrapers. It might be worth explaining this a bit
more on the list for those interested.

Also, I'm concerned about the use of the
'http://www.ontoweb.org/ontology/1#' namespace in this (and other)
scrapers. Looking at the code, it seems clear you're using this to type
'Publications'. I'd suggest we either work with the ontoweb folks to
make this available or use another namespace for this term that
resolves to something useful (ideally one that has some sort of clear
persistence policy).

If the DSpace community needs terms that are not readily available, I
think we should work with them to find out what they are and simply
write these to the web. Other than the class 'Publication', is there
anything else you've come across in modeling the data?

--
eric miller                              http://www.w3.org/people/em/
semantic web activity lead               http://www.w3.org/2001/sw/
w3c world wide web consortium            http://www.w3.org/

Received on Wed Aug 10 2005 - 12:57:43 EDT
