Re: [RT] Learning from Greasemonkey + Platypus

From: Eric Miller <>
Date: Wed, 3 Aug 2005 20:12:51 -0400

On Aug 3, 2005, at 7:20 PM, Stefano Mazzocchi wrote:

> Alf Eaton wrote:
>> On 03 Aug 2005, at 23:46, Stefano Mazzocchi wrote:
>>> Also, once we have the content we want and the URL and the xpath
>>> (should we say xpointers?) locations, how do we turn them into
>>> RDF statements? How do we guide them to select the available
>>> ontologies or empower them to make their own?
>> I have a slightly different question, but on a similar topic.
>> The scenario is
>> i) A document contains data marked up using a standard
>> 'microformat', e.g. hReview [1]
>> ii) The document also contains a link to a profile describing
>> that format, which will recommend an XSLT stylesheet for
>> processing the document [2]
>> iii) A Greasemonkey (or Piggy Bank) script fetches the XSLT
>> stylesheet, applies it to the document, and produces a fragment
>> of RDF/XML [eg 3]
>> ... then how can that RDF/XML fragment get imported into Piggy
>> Bank? Is there a utility function in Piggy Bank that a
>> Greasemonkey script can call to say 'import this chunk of RDF/XML'?
>> I think I remember someone asking a while ago on the list if the
>> data coin could show up when a page of pure RDF/XML was loaded -
>> that would be applicable here too, as the XML could be loaded
>> into a new page using a data: URI.
>> Admittedly this might all be unnecessary, if the XSLT
>> transformation was done by passing the document to a web service
>> rather than using Javascript, but I still think it could be
>> useful (for example when extracting data from a page that's
>> behind a subscription barrier).
>> [1]
>> [2]
>> grddl/
>> [3]
> There is *already* support for this inside PB.
> See for an example.

And ...

How to Write Screen Scrapers

A screen scraper in Piggy Bank is a piece of software code that
extracts “pure” information from within a web page’s content (and
perhaps from related web pages). Screen scrapers can be implemented
as XSL templates or in Javascript.

for additional details.
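
On the data: URI idea Alf raises above, here is a minimal sketch of how a
script might wrap an RDF/XML fragment so it can be opened as a page of its
own. The function name rdfXmlToDataUri and the sample fragment are made up
for illustration; nothing here is a Piggy Bank API, and whether the data
coin shows up for such a page is exactly the open question.

```javascript
// Hypothetical helper (not part of Piggy Bank or Greasemonkey):
// wrap an RDF/XML string in a data: URI.
function rdfXmlToDataUri(rdfXml) {
  // application/rdf+xml is the registered media type for RDF/XML.
  // encodeURIComponent percent-encodes '<', '>', '#', quotes, and
  // non-ASCII characters (as UTF-8), so the payload is URI-safe.
  return "data:application/rdf+xml;charset=utf-8," + encodeURIComponent(rdfXml);
}

// Sample fragment, invented for this example.
var fragment =
  '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>';

var uri = rdfXmlToDataUri(fragment);
```

In a browser script the resulting string could then be handed to something
like window.open(uri) to load the fragment as a new page.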

eric miller                    
semantic web activity lead     
w3c world wide web consortium  
ps: and DavidH... please write that 'scraper revolution' email that's
been on your mind :)
Received on Thu Aug 04 2005 - 00:09:26 EDT
