Piggy Bank (2.1.1) scraper issues from Steve Dunham on 2005-11-14 (stdin)

From: Steve Dunham <dunhamsteve_at_gmail.com>
Date: Mon, 14 Nov 2005 11:02:21 -0800

I'm trying to write some scrapers for Piggy Bank and have run into a few
issues. The scrapers are all described in:
http://www.cse.msu.edu/~dunham/scrapers/scrapers.n3

In short my issues are:
1. The "model" in the scraper doesn't seem to handle literals with embedded
linefeeds and quotes.
2. Related to #1, when Piggy Bank gets .n3 that it can't parse, it says "no
typed data found". It took me a while to figure out it was a parse error.
3. The second two scrapers (the Google movie and Epicurious recipe ones)
collect data in Solvent, but the coin just spins when I try to run them from
Piggy Bank. I have no idea why.
4. I'd really like to be able to create BNodes. (Why? E.g. so I can smush
foaf:Person records later on without dropping a URI.)
5. Unrelated to scraping: When I hit O'Reilly's DOAP feeds, Firefox tries to
download the data rather than piggy-bank it. E.g.
http://ruby.codezoo.com/cs/user/run/component/5270?x-r=doap

They're sending it as Content-Type: application/rdf+xml - I don't know if
that's the problem or not.

I'd appreciate it if somebody could take a look at the Google movie scraper.
(It only does theatre location at the moment, until I figure out the
"spinning" issue.) I'd like to know why it never makes it to Piggy Bank when
I click on the coin. (The coin just keeps spinning.)

It would also be nice if literals with embedded CRs and quotes could be
handled in the future. It doesn't look like the N3 parser that Piggy Bank
uses groks triple quoting, so it may have to be switched to RDF/XML.
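
As a sketch of one alternative to triple quoting: N3 and N-Triples allow
backslash escapes inside ordinary double-quoted literals, so embedded
linefeeds and quotes can be written on a single line. The Python below is
illustrative only. The function name escape_literal is hypothetical, and
whether the parser Piggy Bank uses accepts these escapes is an assumption
that has not been checked.

```python
def escape_literal(text):
    """Return text as a single-line, double-quoted N-Triples/N3 literal."""
    # The backslash goes first so that the backslashes introduced by the
    # later replacements are not escaped a second time.
    replacements = [
        ("\\", "\\\\"),
        ('"', '\\"'),
        ("\n", "\\n"),
        ("\r", "\\r"),
        ("\t", "\\t"),
    ]
    for raw, escaped in replacements:
        text = text.replace(raw, escaped)
    return '"' + text + '"'


# Hypothetical recipe text containing a quote and a line break.
print(escape_literal('Stir in "fresh" basil.\nServe warm.'))
```

The result contains no raw line breaks or unescaped quotes, so it does not
rely on triple-quoted strings being supported.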

(N.B. The SourceForge scraper is incomplete: it doesn't add the CVS
repository yet. I'd wanted to use a BNode, but I'll probably break down and
give it a name.)
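
For illustration, a blank-node version of the CVS repository description
might serialize along the following lines. This is a sketch in Python that
emits N-Triples. The choice of DOAP terms, the function name
repository_triples, the project URI, and the CVS root are all assumptions
and are not taken from the actual scraper.

```python
DOAP = "http://usefulinc.com/ns/doap#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def repository_triples(project_uri, anon_root, label="repo0"):
    """Describe a CVS repository as a blank node attached to a project."""
    # "_:label" is the N-Triples syntax for a blank node, so the
    # repository needs no URI of its own.
    node = "_:" + label
    # anon_root is assumed to contain no quotes, backslashes, or line breaks.
    return [
        "<%s> <%srepository> %s ." % (project_uri, DOAP, node),
        "%s <%s> <%sCVSRepository> ." % (node, RDF_TYPE, DOAP),
        '%s <%sanon-root> "%s" .' % (node, DOAP, anon_root),
    ]


# Hypothetical project and CVS root, for demonstration only.
for line in repository_triples(
    "http://example.org/project/demo",
    ":pserver:anonymous@cvs.example.org:/cvsroot/demo",
):
    print(line)
```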

Thanks for all the work; both Semantic Bank and Piggy Bank are impressive
applications.
Steve Dunham
dunhamsteve_at_gmail.com
Received on Mon Nov 14 2005 - 18:56:21 EST
