Re: [RT] Learning from Greasemonkey + Platypus

From: Eric Miller <em_at_w3.org>
Date: Wed, 3 Aug 2005 20:12:51 -0400

On Aug 3, 2005, at 7:20 PM, Stefano Mazzocchi wrote:

> Alf Eaton wrote:
>
>> On 03 Aug 2005, at 23:46, Stefano Mazzocchi wrote:
>>
>>> Also, once we have the content we want and the URL and the xpath
>>> (should we say xpointers?) locations, how do we turn them into
>>> RDF statements? How do we guide them to select the available
>>> ontologies or empower them to make their own?
>>>
>> I have a slightly different question, but on a similar topic.
>> The scenario is
>> i) A document contains data marked up using a standard
>> 'microformat' format, eg hReview [1]
>> ii) The document also contains a link to a profile describing
>> that format, which will recommend an XSLT stylesheet for
>> processing the document [2]
>> iii) A Greasemonkey (or Piggy Bank) script fetches the XSLT
>> stylesheet, applies it to the document, and produces a fragment
>> of RDF/XML [eg 3]
>> ... then how can that RDF/XML fragment get imported into Piggy
>> Bank? Is there a utility function in Piggy Bank that a
>> Greasemonkey script can call to say 'import this chunk of RDF/XML'?
>> I think I remember someone asking a while ago on the list if the
>> data coin could show up when a page of pure RDF/XML was loaded -
>> that would be applicable here too, as the XML could be loaded
>> into a new page using a data: URI.
>> Admittedly this might all be unnecessary, if the XSLT
>> transformation was done by passing the document to a web service
>> rather than using Javascript, but I still think it could be
>> useful (for example when extracting data from a page that's
>> behind a subscription barrier).
>> [1] http://microformats.org/wiki/hcalendar
>> [2] http://dannyayers.com/archives/2005/08/01/microformats-on-the-grddl/
>> [3] http://alf.hubmed.org/hreviewprocessor.user.js
>>
>
> There is *already* support for this inside PB.
>
> See http://tinyurl.com/bqvxu for an example.

And ...

[[
How to Write Screen Scrapers

A screen scraper in Piggy Bank is a piece of software code that
extracts “pure” information from within a web page’s content (and
perhaps from related web pages). Screen scrapers can be implemented
as XSL templates or in Javascript.
]]
- http://simile.mit.edu/piggy-bank/screen-scrapers-howto.html

for additional details.
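
As a rough sketch of the data: URI idea Alf raises above (this is not the
Piggy Bank scraper API, which the howto covers), a user script could wrap
the RDF/XML fragment it produced like this. The helper name
rdfXmlToDataUri, the application/rdf+xml media type choice, and the
example.org fragment are assumptions made for illustration only.

```javascript
// Hypothetical helper (not part of Piggy Bank or Greasemonkey):
// wrap an RDF/XML string in a data: URI so it can be loaded as a page.
function rdfXmlToDataUri(rdfXml) {
  // encodeURIComponent percent-encodes '<', '>', '"', '#', spaces, etc.,
  // so the fragment survives being embedded in a URI.
  return "data:application/rdf+xml;charset=utf-8," + encodeURIComponent(rdfXml);
}

// Placeholder fragment standing in for the output of the XSLT step.
const fragment =
  '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">' +
  '<rdf:Description rdf:about="http://example.org/review/1"/>' +
  '</rdf:RDF>';

const uri = rdfXmlToDataUri(fragment);

// In a browser user script one might then open the URI in a new page,
// e.g. window.open(uri); whether the data coin appears for such a page
// is exactly the open question in the thread.
```

Decoding everything after the first comma with decodeURIComponent gives
back the original fragment, so nothing is lost in the round trip.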

--
eric miller                              http://www.w3.org/people/em/
semantic web activity lead               http://www.w3.org/2001/sw/
w3c world wide web consortium            http://www.w3.org/
ps: and DavidH... please write that 'scraper revolution' email that's
been on your mind :)
Received on Thu Aug 04 2005 - 00:09:26 EDT
