Re: model.addTag equivalent for xslt scraper - other piggybank issues from Ryan Lee on 2005-05-31

From: Ryan Lee <ryanlee_at_w3.org>
Date: Tue, 31 May 2005 12:28:41 -0400

Hi Brad,

Brad Clements wrote:
> So, I have 32 of 107 bookmarks imported into Piggybank and my thoughts
> are:
>
> 1. If Piggybank with publishing to a databank will replace my use of
> del.icio.us tagging, I'll need a way to specify a comment when I tag a page
> using Piggybank. So, I'll have to modify that pop-up tag form and underlying
> Java XPCOM code to support adding a dc:description property.

Submit a patch :) Or add it as a feature request in our issue tracker:

http://simile.mit.edu/issues/

> 2. There seems to be no way to edit any data in Piggybank. Is this a
> planned feature? I should be able to edit any existing property and add new
> ones.
>
> I can change tags on an item, but I would like to be able to add new
> properties to existing items, not just additional tags.

Editing is on the horizon. We anticipate it will be a challenge to do well.

> 3. What's the difference between XSLTHarvester and
> XSLTScreenScraper?

I think at some point someone switched terminology from using
'Harvester' to 'ScreenScraper.'

> 4. In an XSLTScreenScraper, can I use the document() function to traverse
> "child documents", just like utilities.processDocuments?
>
> How about getLLsFromAddress using XSLT? Perhaps utility and model could
> be exposed as EXSLT functions.

I don't know. I don't expect we've blocked that function.

Providing the same functionality across ScreenScrapers sounds like a
good idea (issues list / patch :).

> 5. I can't debug JavaScript screen scrapers using Venkman. I can't set a
> breakpoint in them, and if Venkman is running and I pause it, Piggybank
> gets a bit confused.

Not being a Venkman user, I can't provide any insight here.

> 6. What if I have two scrapers that can operate on a page, plus the default
> rss handler? Clicking on the datacoin should give me the option of picking a
> particular "generic handler" (like RSS) or a custom screen scraper.
>
> Perhaps I would like to have a "generic scraper" that could work on many
> pages. I don't want to have to edit the .n3 file to list all URLs; I should be
> able to pick a candidate scraper from a list.
>
> (like, right-click the data coin)

I think we hadn't anticipated scraping to be so popular as to merit
functions for picking between them...

I don't know how ultimately successful a generic scraper would be, but I
suppose you could make the URL list '^http://.*$' or something for now.
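
To show what that catch-all pattern would and would not cover, here is a
minimal sketch in Python. It assumes the scraper's URL list is applied as an
ordinary regular expression against the page URL, which is an assumption
rather than something confirmed for Piggybank; the function name and the
sample URLs are made up for illustration.

```python
import re

# Catch-all pattern suggested above; assumed to be applied as a plain regex.
CATCH_ALL = re.compile(r'^http://.*$')

def scraper_applies(url):
    """Return True if the catch-all pattern matches the given page URL."""
    return CATCH_ALL.match(url) is not None

# Hypothetical sample URLs, for illustration only.
samples = [
    "http://example.org/bookmarks",
    "https://example.org/bookmarks",
    "ftp://example.org/file",
]
results = {url: scraper_applies(url) for url in samples}
```

As written, the pattern only covers plain http:// URLs; something like
'^https?://.*$' would be needed to pick up https pages as well.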

Good idea, maybe along with right-click selection the default could be
to run several scrapers and differentiate the results by which scraper
produced what.
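
A rough sketch of that last idea, running every scraper that matches a page
and keeping the results grouped by the scraper that produced them. Everything
here (the Scraper record, run_all, the toy scrapers) is hypothetical and is
not Piggybank's API; it is written in Python only to keep it short.

```python
import re
from collections import namedtuple

# Hypothetical scraper record: a name, a URL pattern, and a function
# that returns a list of extracted items for a page.
Scraper = namedtuple("Scraper", ["name", "url_pattern", "scrape"])

def run_all(scrapers, url, page):
    """Run every scraper whose pattern matches url; group items by scraper name."""
    results = {}
    for s in scrapers:
        if re.match(s.url_pattern, url):
            results[s.name] = s.scrape(page)
    return results

# Toy scrapers for illustration only.
scrapers = [
    Scraper("generic-titles", r"^http://.*$",
            lambda page: [page.split("\n")[0]]),
    Scraper("example-only", r"^http://example\.org/",
            lambda page: page.split("\n")[1:]),
    Scraper("other-site", r"^http://other\.example/",
            lambda page: ["never run"]),
]

grouped = run_all(scrapers, "http://example.org/list",
                  "Title\nitem one\nitem two")
```

Keying the results by scraper name would let a UI show each scraper's output
separately, or let the user discard one scraper's results without touching
the others.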

-- 
Ryan Lee                 ryanlee_at_w3.org
W3C Research Engineer    +1.617.253.5327
http://simile.mit.edu/

Received on Tue May 31 2005 - 16:27:03 EDT
