Re: model.addTag equivalent for xslt scraper - other piggybank issues

From: Brad Clements <>
Date: Sat, 28 May 2005 10:31:43 -0400

On 28 May 2005 at 11:06, Danny Ayers wrote:

> I'm curious about what you would need (not actually for simile, I'm
> doing some other XSLT to RDF/XML).
> Do you please have a link to a sample of the source data, and an
> example of the kind of statements you want to derive?

Well my use-case is pretty weak now, since I wrote a scraper in javascript.

Basically returns my most recent 32 bookmark
entries in rss format.

here's an example copy/pasted from FF (ignore the folding '-' chars)

<title>work horse handbook</title>
<rdf:li resource=""/>

Piggybank shows the datacoin by default on this page, but treats it as
generic "RSS", so it misses a lot of data, including the tags I've applied to
each entry.

What I wanted to do was

a) create web#Page entries for each of the RSS items

b) take the tags I have and associate them with each new web#Page item.

That is, whitespace split dc:subject and call model.addTag for each, which
is what I did in javascript.
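
A minimal sketch of that splitting step, for illustration only. The `model`
object below is a hypothetical stand-in for the one Piggy Bank hands to a
javascript scraper, the (uri, tag) argument order of addTag is an assumption,
and the URI and subject string are made up:

```javascript
// Hypothetical stand-in for the scraper's `model` object; the real one
// only exists inside the extension. The (uri, tag) signature is assumed.
var model = {
  tags: [],
  addTag: function (uri, tag) {
    this.tags.push({ uri: uri, tag: tag });
  }
};

// Split a dc:subject string on whitespace and add one tag per word.
function addTagsFromSubject(model, uri, subject) {
  var words = (subject || "").split(/\s+/);
  var added = [];
  for (var i = 0; i < words.length; i++) {
    // split() yields empty strings for leading/trailing whitespace; skip them
    if (words[i].length > 0) {
      model.addTag(uri, words[i]);
      added.push(words[i]);
    }
  }
  return added;
}

// Usage with made-up values
addTagsFromSubject(model, "http://example.org/some-bookmark", "horses farming  books");
```

Splitting on /\s+/ rather than a single space keeps doubled spaces or tabs in
dc:subject from producing empty tags.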

I couldn't find the spot in the source where piggybank parses the results of
an XSLTHarvester.. I thought maybe I could just stick in a collection of tags
in the containing Web#Page item.

However since tags are user specific (and hashed w/ email address) there'd
have to be special support in the piggybank parser to handle this properly.

So, I have 32 of 107 bookmarks imported into Piggybank, and here are my
thoughts:

1. If piggybank with publishing to a databank will replace my use of
   tagging, I'll need a way to specify a comment when I tag a page using
   piggybank. So, I'll have to modify that pop-up tag form and the
   underlying java xpcom code to support adding a dc:description property.

2. There seems to be no way to edit any data in piggybank. Is this a
   planned feature? I should be able to edit any existing property and add
   new ones. I can change tags on an item, but I would like to be able to
   add new properties to existing items, not just additional tags.

3. What's the difference between XSLTHarvester and

4. In an XSLTScreenScraper, can I use the document() method to traverse
   "child documents", just like utilities.processDocuments?

   How about getLLsFromAddress using xslt? Perhaps utility and model could
   be exposed as exslt functions.

5. I can't debug javascript screen scrapers using Venkman. I can't set a
   breakpoint in them, and if Venkman is running and I pause it, piggybank
   gets a bit confused.

6. What if I have two scrapers that can operate on a page, plus the default
   rss handler? Clicking on the datacoin should give me the option of
   picking a particular "generic handler" (like RSS) or a custom screen
   scraper.

   Perhaps I would like to have a "generic scraper" that could work on many
   pages. I don't want to have to edit the .n3 file to list all URLs; I
   should be able to pick a candidate scraper from a list (like, right-click
   the data coin).

Guess that's enough for now.

Brad Clements,          (315)268-1000                          
AOL-IM or SKYPE: BKClements
We must come down from our heights, and leave our straight 
paths, for the byways and low places of life, if we would 
learn truths by strong contrasts; and in hovels, in forecastles, 
and among our own outcasts in foreign lands, see what has been 
wrought upon our fellow-creatures by accident, hardship, or vice. 
- Richard Henry Dana, Jr. 1836
Received on Sat May 28 2005 - 14:30:04 EDT
