RDF 101 [was Re: introduction and questions] from Stefano Mazzocchi on 2005-04-15 (stdin)

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Fri, 15 Apr 2005 10:22:35 -0400

Erik,

welcome!

Erik Hatcher wrote:
> <introduction>A little introduction for my first e-mail to this
> wonderful group... I work for ARP (Applied Research in Patacriticism)
> at the University of Virginia (http://www.patacriticism.org) building
> tools for digital library archives. The Rossetti Archive search
> feature is my latest accomplishment:
> http://www.rossettiarchive.org/rose/ - search for "blessed damozel" to
> find one of Rossetti's most famous works - massive cleanup of URL's and
> pre-generation of static content has also been part of my efforts. As
> the co-author of Lucene in Action, Lucene is my hammer and I use it
> even when it doesn't make sense to :) The book has a site that
> leverages Lucene: http://www.lucenebook.com
>
> My current project for ARP is to build a system we're calling Collex,
> which will allow users to collect "objects" from digital archives into
> collections and then build and publish elegant exhibits from the
> objects they've collected.
>
> I had the pleasure of catching up with my friend Stefano last weekend
> while I was in Boston presenting at the ACM meeting and NFJS symposium.
> His passion for SIMILE and the overlap with what we're doing has
> pulled me in and I'm convinced that SIMILE and the concepts it works
> with are the right places to be. There is great synergy of the SIMILE
> projects and what I'm doing at ARP.</introduction>

Lovely introduction, thanks a lot.

> Now on to my questions....
>
> First, I'm utterly clueless about RDF.

That's totally fine. We do not expect our users to know RDF inside out,
and we are happy to help them get up to speed.

> What is the simplest RDF file I
> can put into Longwell for it to be browsable?

Well, first of all, your RDF works better with Longwell (at least the
1.x generation) if your resources are 'identifiable' and 'typed'.

This means that:

  1) each 'thing' you want to be able to refer to has to have a unique
identifier (URI)

  2) each 'thing' you want to consider in isolation needs to have a type

The second means that you need an 'rdf:type' statement or, using
RDF/XML, you need to say something like

  <blah:Blah rdf:about="http://your.host.com/uri/3809480">
   ...

instead of

  <rdf:Description rdf:about="http://your.host.com/uri/3809480">
   ...

Note that the RDF/XML syntax is rather weird, as some constructs carry
special meaning. For example,

  <blah:Blah
     xmlns:blah="http://blah.com/ns/blah#"
     rdf:about="http://your.host.com/uri/3809480">
    ...

is completely equivalent to

  <rdf:Description rdf:about="http://your.host.com/uri/3809480">
    <rdf:type rdf:resource="http://blah.com/ns/blah#Blah"/>
    ...

[this creates all sorts of problems in RDF canonicalization; some
people hate it and some love it. But hey, RDF/XML is even older than the
XML namespaces spec and it feels kinda prehistoric to me at times, but
it grows on you after a few months]
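
To make the equivalence above concrete, here is a small sketch in Python
(standard library only) that pulls the (subject, type) pairs out of both
forms and shows they come out the same. It is not a real RDF/XML parser:
it only looks at top-level nodes that carry rdf:about, and the function
name type_statements is made up for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Typed-node form: the element name doubles as the rdf:type.
TYPED_NODE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:blah="http://blah.com/ns/blah#">
  <blah:Blah rdf:about="http://your.host.com/uri/3809480"/>
</rdf:RDF>
"""

# Explicit form: rdf:Description plus an rdf:type statement.
EXPLICIT_TYPE = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="http://your.host.com/uri/3809480">
    <rdf:type rdf:resource="http://blah.com/ns/blah#Blah"/>
  </rdf:Description>
</rdf:RDF>
"""


def type_statements(rdf_xml):
    """Return the set of (subject URI, type URI) pairs in an RDF/XML string.

    Illustrative only: handles top-level nodes with rdf:about, nothing else.
    """
    pairs = set()
    root = ET.fromstring(rdf_xml)
    for node in root:
        subject = node.get("{%s}about" % RDF_NS)
        if subject is None:
            continue
        # ElementTree tags look like "{namespace}localname".
        namespace, local = node.tag[1:].split("}", 1)
        if (namespace, local) != (RDF_NS, "Description"):
            # Typed node: namespace + local name is the type URI.
            pairs.add((subject, namespace + local))
        for child in node:
            if child.tag == "{%s}type" % RDF_NS:
                pairs.add((subject, child.get("{%s}resource" % RDF_NS)))
    return pairs


print(type_statements(TYPED_NODE) == type_statements(EXPLICIT_TYPE))
```

Both inputs yield the single pair made of the resource URI and
http://blah.com/ns/blah#Blah, which is why a tool that wants typed
resources should accept either spelling.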

> I'm after something
> concrete and simple to begin with. I'm drowning in a sea of
> abstractions, and need something concrete to keep me afloat.

eheh, I know that feeling :-)

> I've
> tried a few simple XSLT experiments with a sample of our data and have
> not been successful in making Longwell happy - I see no facets to
> browse. I do, however, see lots of great stuff when I drop in the
> bibliography or the other samples. Those examples are a bit too much
> for me to start with though - it seems I should be able to expose
> things to Longwell with only Dublin Core metadata to start with.

Ok.
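
As a starting point, something along these lines should give you one
identifiable, typed resource carrying only Dublin Core fields. Treat it
as a sketch under assumptions: the "ex" namespace, the "Work" type and
the identifier value are placeholders I made up, and using the HTML page
URL as the resource URI is just one possible choice. Longwell 1.x may
still need config.n3 adjusted before any facets show up.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical vocabulary namespace; replace it with one you control.
EX = "http://example.org/ns/archive#"


def minimal_record(uri, type_name, title, identifier):
    """Build RDF/XML for one identifiable, typed resource with two DC fields."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("dc", DC)
    ET.register_namespace("ex", EX)
    root = ET.Element("{%s}RDF" % RDF)
    # Typed node: the element name supplies the rdf:type,
    # and rdf:about supplies the unique identifier.
    node = ET.SubElement(
        root, "{%s}%s" % (EX, type_name), {"{%s}about" % RDF: uri}
    )
    ET.SubElement(node, "{%s}title" % DC).text = title
    ET.SubElement(node, "{%s}identifier" % DC).text = identifier
    return ET.tostring(root, encoding="unicode")


record = minimal_record(
    "http://www.rossettiarchive.org/docs/1-1847.s244.raw.html",
    "Work",                 # placeholder type name
    "The Blessed Damozel",
    "1-1847.s244",          # placeholder identifier, guessed from the URL
)
print(record)
```

The same function could be called from whatever reads your custom XML,
mapping attributes such as archivetype or workcode to properties in your
own namespace once the Dublin Core basics are browsable.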

> For sake of example, Rossetti's Blessed Damozel is shown in this HTML
> page: http://www.rossettiarchive.org/docs/1-1847.s244.raw.html and the
> source XML (a custom "schema") is here:
> http://www.rossettiarchive.org/docs/1-1847.s244.raw.xml . The root
> element has some metadata that would be fun to start with, such as
> archivetype, metatype, id, and workcode, all of which is meaningful
> within our domain.
>
> Also, pointers to tutorials or jump starts on how to wrap my head
> around RDF would be most helpful.

I'm sure you've seen my "No-nonsense Guide to Semantic Web Specs for XML
People"

  part 1 -> http://www.betaversion.org/~stefano/linotype/news/57/
  part 2 -> http://www.betaversion.org/~stefano/linotype/news/78/

but in case you haven't, those are relatively introductory and show you
the power of semantic web technologies at the surface.

If you want to get a little deeper, it's probably easier to keep asking
specific questions here as soon as you encounter a roadblock.

NOTE: Longwell1 is a 'configurable' browser, which means that you have to
change config.n3 to suit your needs or the browser won't show anything
at all.

Welkin and Longwell2 do not need that fine-tuning: you can throw whatever
RDF at them and they will adjust to the data.

NOTE2: Longwell 2.0-dev and Welkin 1.1-dev are not yet released, so
download and build them from SVN if you want to experience them.

  longwell 2.0-dev ->
http://simile.mit.edu/repository/longwell/branches/2.0/

  welkin 1.1-dev -> http://simile.mit.edu/repository/welkin/trunk/

Both are fairly stable, just not cleaned up for a release (we are
planning to do that RSN!)

> A few other minor things:
>
> - Gadget is cool! Please make its build.xml file use "package" as
> its default target.

done

> - Longwell2 - How do I get it to work with a sample dataset? I
> tried pointing longwell.properties the data directory of my Longwell
> TRUNK area, but it did not work.

you have to run it like

  ./longwell.sh longwell.properties datadir

and it will load all the *.rdf, *.n3, *.rdfs, *.owl files found
recursively in the datadir.
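
If you want to check in advance which files in a directory would match,
a quick sketch like the following lists them. This is not Longwell's
loader code, only an approximation of the rule stated above; the name
find_rdf_files is made up, and matching the extensions case-sensitively
is my assumption.

```python
from pathlib import Path

# Extensions named above; case-sensitive matching is an assumption.
RDF_SUFFIXES = {".rdf", ".n3", ".rdfs", ".owl"}


def find_rdf_files(datadir):
    """Return all files under datadir (recursively) with an RDF-like suffix."""
    return sorted(
        path
        for path in Path(datadir).rglob("*")
        if path.is_file() and path.suffix in RDF_SUFFIXES
    )
```

Calling find_rdf_files("datadir") and printing the result is a cheap way
to confirm that your sample data sits where you think it does before
starting the browser.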

> - java2rdf in RDFizers, I get this when pointing to Lucene's
> compiled class directory:
> $ java -jar java2rdf.jar ~/dev/lucene/build/classes/java
> Processing folder: /Users/erik/dev/lucene/build/classes/java
> Processing folder: /Users/erik/dev/lucene/build/classes/java/org
> Processing folder: /Users/erik/dev/lucene/build/classes/java/org/apache
> Processing folder:
> /Users/erik/dev/lucene/build/classes/java/org/apache/lucene
> Processing folder:
> /Users/erik/dev/lucene/build/classes/java/org/apache/lucene/analysis
> Processing class:
> /Users/erik/dev/lucene/build/classes/java/org/apache/lucene/analysis/
> Analyzer.class
> Exception in thread "main" java.lang.StackOverflowError

Damn. Added to the mile-long todo list.

> - Welkin - well done! It'll make more sense to me when I
> understand RDF a bit more, but it's a nice visualization.

Did you try it from the trunk or from the webstart release?

> - Charon - this looks like something we could really leverage with
> Collex - allowing folks that have legacy low-tech archives to be
> "collectable" somehow. This may be a place of collaboration for us.

Awesome! Charon is based on Cocoon, so 99.9% of the complexity is
already dealt with for you. All you need to do is write a few XSLT
stylesheets. Take a look at

http://simile.mit.edu/repository/charon/trunk/stylesheets/rdfize.xslt

to see the 'core' action. This is the RDFizer targeted at a DSpace
site. Charon was built with DSpace in mind, but it's relatively easy to
modify it to support multiple sites at the same time, should
that need emerge. I'd be happy to help out directly with that, also
because I would love Charon and Piggy-Bank to share XSLT RDFizers (a la
GRDDL)

   http://www.w3.org/2004/01/rdxh/spec

> - Lucene - Lucene 1.4 has been out for a while. I notice that the
> SIMILE projects are using 1.3. I recommend upgrading to 1.4.3. There
> are some great new features in it, such as term vector support and
> sorting of hits. As well as co-author on Lucene in Action, I'm a
> Lucene committer and if you folks run into any Lucene issues let me
> know and I'd be happy to help.

Patches are welcome ;-)

> Thanks and I look forward to lots of fun with the SIMILE tools and
> learning lots from this brilliant community.

And I'm happy you are here with us!

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------

Received on Fri Apr 15 2005 - 14:21:56 EDT
