Dominique Hazaël-Massieux wrote:
> Hi,
> With the arrival of piggy-bank making the Semantic Web part of the
> browsing experience, there is one thing that I would love to see happen:
> get piggy-bank to understand GRDDL to harvest data on the Web.

Bummer, you ruined my surprise :-)

> GRDDL is a W3C CG Note [1] that defines how to map a given XHTML set of
> markup conventions and/or profile to a given RDF/XML interpretation,
> using XSLT as a way to get to this; GRDDL also defines a way for such a
> mechanism with generic XML, but I think already getting the XHTML part
> would be pretty nice for Piggy-Bank. (note that I mention XHTML, but
> GRDDL can be retro-fit to HTML when used through e.g. tidy)

I could hardly agree more. An XSLT stylesheet is the ideal implementation
of an XML2RDF bridge... my concern is with non-well-formed HTML, but
yeah, we could ship JTidy along with piggy-bank... hmmm...
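
For the XHTML case, a GRDDL-aware client first has to find out which
stylesheets a page points at. As far as I read the GRDDL note, a page
opts in with the http://www.w3.org/2003/g/data-view profile on its head
element and names its stylesheets with link rel="transformation". Here
is a rough sketch of that discovery step in plain Java. The class name,
the sample page and the example.org URL are made up for illustration,
only link elements are checked, and the input is assumed to be
well-formed already (tag soup would need a tidy pass first).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of piggy-bank or any GRDDL library.
class GrddlLinkFinder {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";
    static final String GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view";

    // Made-up sample page; it has no DOCTYPE so parsing stays offline.
    static final String SAMPLE =
        "<html xmlns='http://www.w3.org/1999/xhtml'>"
      + "<head profile='http://www.w3.org/2003/g/data-view'>"
      + "<title>Hello GRDDL</title>"
      + "<link rel='transformation' href='http://example.org/title2rdf.xsl'/>"
      + "<link rel='stylesheet' href='style.css'/>"
      + "</head><body/></html>";

    // Returns the href of every link rel="transformation" in the head,
    // or an empty list if the page does not declare the GRDDL profile.
    static List<String> findTransformations(String xhtml) {
        List<String> hrefs = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));

            NodeList heads = doc.getElementsByTagNameNS(XHTML_NS, "head");
            if (heads.getLength() == 0) {
                return hrefs;
            }
            Element head = (Element) heads.item(0);
            if (!hasToken(head.getAttribute("profile"), GRDDL_PROFILE)) {
                return hrefs;
            }

            NodeList links = head.getElementsByTagNameNS(XHTML_NS, "link");
            for (int i = 0; i < links.getLength(); i++) {
                Element link = (Element) links.item(i);
                if (hasToken(link.getAttribute("rel"), "transformation")) {
                    hrefs.add(link.getAttribute("href"));
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XHTML", e);
        }
        return hrefs;
    }

    // True if the whitespace-separated attribute value contains the token.
    static boolean hasToken(String value, String token) {
        return Arrays.asList(value.trim().split("\\s+")).contains(token);
    }

    public static void main(String[] args) {
        System.out.println(findTransformations(SAMPLE));
    }
}
```

The parser here rejects anything that is not well-formed XML, which is
exactly where something like JTidy would have to come in.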

> There are already a few implementations of GRDDL out there, at least one
> in XSLT [2], one in python [3], and one in PHP [4]; even better, there
> is (since yesterday) a small test suite [6] - it currently only covers
> the XHTML aspect of GRDDL, but can easily be extended to cover the XML
> cases if needed.

Even just XHTML would be a huge start, I totally agree.

> GRDDL looks a lot like a possible use case for SIMILE's RDFizers [5],
> and actually even like a possible way to implement quickly several of
> their use cases.


> I don't think I would be able to code the actual
> implementation of GRDDL in java, but I would certainly be interested in
> helping to implement it and getting it integrated in Piggy Bank.

Mozilla already ships with an XSLT engine built in. If we can't get to
it (or Firefox doesn't contain it), we can always ship a Java XSLT
engine alongside (though that would increase the distribution size, which
I would rather avoid, since piggy-bank is already huge on its own).
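
On the Java side, the transformation step itself is small. Below is a
minimal sketch using the standard javax.xml.transform (JAXP) API, which
picks up whatever XSLT engine the Java runtime provides, so it says
nothing about how piggy-bank would actually wire this in. The class
name, the sample page and the toy stylesheet (which only turns the page
title into a dc:title statement) are all invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch, not piggy-bank code.
class GrddlTransformSketch {
    // Made-up XHTML input; it has no DOCTYPE so nothing is fetched.
    static final String SAMPLE_XHTML =
        "<html xmlns='http://www.w3.org/1999/xhtml'>"
      + "<head><title>Hello GRDDL</title></head><body/></html>";

    // Toy stylesheet: emits one RDF/XML description holding the title.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:h='http://www.w3.org/1999/xhtml'"
      + " xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'"
      + " xmlns:dc='http://purl.org/dc/elements/1.1/'"
      + " exclude-result-prefixes='h'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='/'>"
      + "<rdf:RDF><rdf:Description rdf:about=''>"
      + "<dc:title><xsl:value-of select='/h:html/h:head/h:title'/></dc:title>"
      + "</rdf:Description></rdf:RDF>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XHTML and returns the result as text.
    static String transform(String xhtml, String xslt) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer transformer =
                factory.newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(xhtml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XHTML, SAMPLE_XSLT));
    }
}
```

A real stylesheet would come from the URL the page names rather than
from a string constant, and the RDF/XML output would then be handed to
an RDF parser instead of being printed.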

One of my requests was for piggy-bank to be able to 'RDF-ize'
XHTML-encoded information, but I have to agree that pushing for GRDDL
would be an even more efficient way to make people 'aware' of
semantic web issues without forcing them to rewrite much of their stuff.

Also, a GRDDL stylesheet is, in fact, an RDF-izer, so I'm thinking: what
do you think about a GRDDL section in the RDFizer page on the SIMILE
server? I would be happy to give you commit access so that you can
maintain that yourself :-) That would be a good place to collect
existing stylesheets, which could show people who want to publish
their info how to get started.

Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
