More scrapers and some enhancement ideas from Steve Dunham on 2006-02-09

From: Steve Dunham <dunhamsteve_at_gmail.com>
Date: Thu, 9 Feb 2006 10:48:28 -0800

I'm a packrat and like to collect RDF copies of data I've come across
on the web. So I've written and posted a few more recipe scrapers, for
both foodandwine.com and 101cookbooks.com. (The latter is an XSLT
scraper, so it won't work for anyone else until PIGGYBANK-80 is
fixed.)

The newer scrapers break down the instructions into steps, where
appropriate. They also extract a little more metadata than previous
scrapers. I'm making up the ontology as I go along, on the assumption
that I can massage the data as it evolves. (I still need to write a
tool to do that, though.)

I also tweaked the sfgate restaurant scraper. (It seems I got the URI
wrong for location.) Since there is no "Delete All" in Semantic Bank,
I just uploaded the additional attributes without deleting the old
ones. (It would be nice to be able to do a "Delete All" on a query in
my own profile.)

The recipe thing would look a lot nicer if we could display instances
of rdf:Seq inline. (I think this could be handled easily once Fresnel
is integrated, so it might not be worth fixing at the moment.)
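
To illustrate what "inline" would involve, here is a minimal sketch in
Python of pulling the members of an rdf:Seq out in order. It assumes the
data is available as plain (subject, predicate, object) string tuples;
the function name seq_members and the example URIs are made up for
illustration and are not part of Piggy Bank, Semantic Bank or Fresnel.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Container membership properties look like rdf:_1, rdf:_2, ...
_MEMBER = re.compile(re.escape(RDF_NS) + r"_([1-9][0-9]*)")


def seq_members(triples, seq_node):
    """Return the members of the rdf:Seq `seq_node` in numeric order.

    `triples` is an iterable of (subject, predicate, object) strings.
    This representation is an assumption made for the sketch.
    """
    indexed = []
    for s, p, o in triples:
        if s != seq_node:
            continue
        m = _MEMBER.fullmatch(p)
        if m:
            # Sort on the integer index so that _10 comes after _2.
            indexed.append((int(m.group(1)), o))
    return [o for _, o in sorted(indexed, key=lambda pair: pair[0])]


# Hypothetical recipe whose steps are stored in an rdf:Seq.
example = [
    ("urn:ex:steps", RDF_NS + "type", RDF_NS + "Seq"),
    ("urn:ex:steps", RDF_NS + "_2", "Add the onions."),
    ("urn:ex:steps", RDF_NS + "_1", "Heat the oil."),
    ("urn:ex:steps", RDF_NS + "_10", "Serve."),
]
steps = seq_members(example, "urn:ex:steps")
```

A display layer could then render the returned list as numbered steps
under the recipe instead of linking to a separate rdf:Seq item.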

Also, for my own bank, I'd like a way to cache/store images. I've
considered storing each image in the repository as base64 or raw image
data (as an attribute of the image URI), but I don't know how well
Sesame handles large or binary data. Another option is to store just
the SHA1SUM and keep the files on disk, with their names set to the
SHA1SUM. Any thoughts?
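
A minimal sketch of the on-disk option, in Python: the image bytes are
hashed with SHA-1 and written to a file named after the hex digest, and
only the digest would need to go into the repository. The function
names store_image and load_image and the cache directory layout are
assumptions for illustration; nothing here talks to Sesame or Semantic
Bank.

```python
import hashlib
from pathlib import Path


def store_image(data: bytes, cache_dir: Path) -> str:
    """Save image bytes under their SHA-1 hex digest and return the digest."""
    digest = hashlib.sha1(data).hexdigest()
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / digest
    # Identical bytes always hash to the same name, so an existing file
    # means this image is already cached.
    if not path.exists():
        path.write_bytes(data)
    return digest


def load_image(digest: str, cache_dir: Path) -> bytes:
    """Read back the bytes previously stored under `digest`."""
    return (cache_dir / digest).read_bytes()
```

One side effect of naming files by digest is that the same image
scraped from two different pages is only stored once.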

Steve
dunhamsteve_at_gmail.com
Received on Thu Feb 09 2006 - 18:47:29 EST
