Re: Semantic Bank as document repository frontend

From: Stefano Mazzocchi
Date: Wed, 02 Nov 2005 18:59:53 -0500

Sorry for the delayed response; things are a little crazy here in
preparation for ISWC next week.

Rickard Öberg wrote:
> Hey
> I have already had off-list discussions with Stefano about using
> Semantic Bank for the purpose of indexing OpenOffice Calc spreadsheets
> as RDF data (see for a
> description of the end result). That worked very well, and I will
> continue to tweak that solution into something that we will actually use
> internally for all sorts of purposes.


> My next idea, which I want to bounce off of you lot, is to take the same
> idea but apply it to OpenOffice Writer documents. Essentially, create a
> daemon that crawls a file server and extracts RDF data from the files
> into the Semantic Bank (e.g. take the URL, creator, title, keywords as
> tags, folder names as tags, etc.). Once that is done it would be
> possible to browse and query the semantic bank for documents, and once
> found a link to the local file system (which will be a file server) can
> be presented. That would be really nice, and trivial to implement
> considering that I already have the Calc implementation described in the
> blog entry.
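
A rough sketch of the crawl-and-extract step described above, in Python. It assumes the Writer files are OpenDocument `.odt` archives whose `meta.xml` carries the Dublin Core title/creator and the keyword elements; the `TAG` predicate URI and the function names are made up for illustration, and pushing the triples into Semantic Bank is left out entirely.

```python
import xml.etree.ElementTree as ET
import zipfile
from pathlib import Path

DC = "http://purl.org/dc/elements/1.1/"
META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
TAG = "http://example.org/vocab#tag"  # hypothetical predicate, not a real vocabulary


def extract_triples(path, root):
    """Return (subject, predicate, object) tuples for one .odt file."""
    path, root = Path(path), Path(root)
    subject = path.resolve().as_uri()  # file:// URL of the document
    with zipfile.ZipFile(path) as archive:
        meta = ET.fromstring(archive.read("meta.xml"))
    triples = []
    # Title and creator come straight from the Dublin Core elements.
    for name in ("title", "creator"):
        for element in meta.iter(f"{{{DC}}}{name}"):
            if element.text:
                triples.append((subject, DC + name, element.text))
    # Document keywords become tags.
    for element in meta.iter(f"{{{META}}}keyword"):
        if element.text:
            triples.append((subject, TAG, element.text))
    # Folder names between the crawl root and the file become tags too.
    for folder in path.parent.relative_to(root).parts:
        triples.append((subject, TAG, folder))
    return triples


def crawl(root):
    """Walk the tree under root and yield triples for every .odt file."""
    root = Path(root)
    for path in sorted(root.rglob("*.odt")):
        yield from extract_triples(path, root)
```

A daemon would call `crawl()` on the file server's root on a schedule and hand the resulting triples to whatever import mechanism the bank exposes.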

Yes, and also because, as David noted, we were already halfway to
crawling your own file system (a la Spotlight/Beagle/Google Desktop) and
RDFizing it. The functionality is half-baked into Piggy Bank already, but
it is not really documented or exposed because we haven't had time to
polish it.

It's in our todo list anyway, along with email, which is my personal pet

> However, my next idea after that is to present the Semantic Bank using
> WebDAV instead of as a web UI. Basically, the root of the repository
> would contain folders representing the different types of objects. When
> a type is selected (i.e. a folder is browsed into) the next level of
> subfolders would be the list of predicates for that type (e.g.
> "Creator", "Tag", "Title", etc.), and once selected it would show the
> next level of subfolders as the possible values (e.g. as subfolder of
> "Creator" you would have "Rickard" and as subfolder of "Created on" you
> would have "2005-10-25"). For each level you would see only the next set
> of possible folders to use as filter. It would probably do the same kind
> of optimization as the web UI, i.e. if there's LOADS of documents then
> first show "2005" as folder, then "October" as subfolder, and then "25".
> This could be done relatively dynamically.
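
The folder scheme described above could be sketched roughly like this in Python. The item layout (a `type` plus a `props` dict) and the `list_folder` name are assumptions for illustration only, not how Semantic Bank stores or queries its data, and the year/month/day grouping for large result sets is not covered.

```python
def list_folder(items, path):
    """List the subfolders shown at `path`.

    items: list of {"type": str, "props": {predicate: [values]}} dicts
           (a hypothetical in-memory stand-in for the triple store).
    path:  tuple of folder names browsed so far, laid out as
           (type, predicate, value, predicate, value, ...).
    """
    if not path:
        # Root: one folder per object type.
        names = sorted({item["type"] for item in items})
        return names + [f"show {len(items)} documents"]

    matching = [item for item in items if item["type"] == path[0]]
    rest = path[1:]
    used = set()
    # Each completed predicate/value pair narrows the result set.
    while len(rest) >= 2:
        predicate, value = rest[0], rest[1]
        matching = [i for i in matching if value in i["props"].get(predicate, [])]
        used.add(predicate)
        rest = rest[2:]

    magic = f"show {len(matching)} documents"  # the always-present folder
    if rest:
        # Inside a predicate folder: offer its possible values.
        values = {v for i in matching for v in i["props"].get(rest[0], [])}
        return sorted(values) + [magic]
    # Otherwise offer the predicates that have not been used as a filter yet.
    predicates = {p for i in matching for p in i["props"] if p not in used}
    return sorted(predicates) + [magic]
```

With two documents by different creators, browsing into `("Document", "Creator")` would list both creator names plus the "show 2 documents" folder, and picking one of them narrows the count to 1.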

Interesting, that sounds like a pretty clever idea, actually: you save
the document in the root of the WebDAV folder and it gets automatically
saved and sorted out for you, based on its metadata.

I like it.

I've built other WebDAV apps in the past (mostly using Cocoon); they are
a lot of fun.

> Only when the magic folder "show X documents", which is always there and
> where X is the number of documents in the result, is selected will the
> actual documents be displayed, and when clicked the WebDAV servlet will
> go to the URL of the document, and fetch and send it to the user.

Well, WebDAV is an extension of HTTP, so you don't really need the WebDAV
servlet to handle the GET request, only the PROPFIND and OPTIONS
requests, plus PUT/LOCK if you want to enable writing.
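
To make the read-only case concrete, here is a minimal sketch in Python's standard `http.server` (not Cocoon, and not a servlet). It answers only OPTIONS and PROPFIND and leaves GET to ordinary HTTP. The `FOLDERS` table, the handler name and the folder names are placeholders; the sketch also ignores the `Depth` header and any requested property list, which a real server would have to honour.

```python
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler
from urllib.parse import quote

DAV_NS = "DAV:"
ET.register_namespace("D", DAV_NS)

# Hypothetical folder listing; a real server would compute it from the bank.
FOLDERS = {"/": ["Document", "Spreadsheet"]}


def multistatus(path, children):
    """Build a 207 Multi-Status body listing `path` and its child collections."""
    root = ET.Element(f"{{{DAV_NS}}}multistatus")
    base = path.rstrip("/")
    hrefs = [path] + [f"{base}/{quote(name)}/" for name in children]
    for href in hrefs:
        response = ET.SubElement(root, f"{{{DAV_NS}}}response")
        ET.SubElement(response, f"{{{DAV_NS}}}href").text = href
        propstat = ET.SubElement(response, f"{{{DAV_NS}}}propstat")
        prop = ET.SubElement(propstat, f"{{{DAV_NS}}}prop")
        # Every entry is reported as a collection, i.e. a folder.
        resourcetype = ET.SubElement(prop, f"{{{DAV_NS}}}resourcetype")
        ET.SubElement(resourcetype, f"{{{DAV_NS}}}collection")
        ET.SubElement(propstat, f"{{{DAV_NS}}}status").text = "HTTP/1.1 200 OK"
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


class FacetDavHandler(BaseHTTPRequestHandler):
    def do_OPTIONS(self):
        # Advertise WebDAV class 1 and the methods this sketch supports.
        self.send_response(200)
        self.send_header("DAV", "1")
        self.send_header("Allow", "OPTIONS, PROPFIND")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_PROPFIND(self):
        # Drain the request body; the requested properties are ignored here.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        children = FOLDERS.get(self.path)
        if children is None:
            self.send_error(404)
            return
        body = multistatus(self.path, children)
        self.send_response(207, "Multi-Status")
        self.send_header("Content-Type", 'application/xml; charset="utf-8"')
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Serving it with `http.server.HTTPServer(("localhost", 8080), FacetDavHandler).serve_forever()` should be enough for a WebDAV client to see the two top-level folders, though real clients may ask for more properties than this returns.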

This minimal WebDAV app written in Cocoon gives you an example.

> Something like that. Basically this will provide a queryable and
> browsable document repository which really doesn't care where the
> documents are stored. They could be on hundreds of different servers for
> all you know. Its only purpose is to aggregate them so that they can be
> found easily. By using a "WebDAV to Explorer" tool like WebDrive the end
> result will look just like a regular hard drive or file server to the user.

Yes, this is very cool.

In fact, I had a very similar idea for pictures:

  1) turn my bank account URL (or a sub URL) into a writeable WebDAV URL
  2) mount that using Mac OS X webdavfs as a drive
  3) move my digital pictures into it
  4) have the server get the pictures
    4.1) extract the EXIF metadata from them
    4.2) RDFize it and store it in the triple store
    4.3) store the pictures locally and create thumbnails for them
  5) give me a faceted browser of my digital pictures

If we make #4 pluggable, the system could well work for any sort of
binary document.
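
One way to picture a pluggable step #4 is a small registry of handlers keyed by MIME type, sketched here in Python. Everything named below (`HANDLERS`, `handles`, `ingest`, the `SIZE` predicate) is invented for the example, and the JPEG handler only records the byte size as a stand-in for real EXIF extraction.

```python
import mimetypes

SIZE = "http://example.org/vocab#byteSize"  # hypothetical predicate

HANDLERS = {}  # MIME type -> function(path, data) returning a list of triples


def handles(mime_type):
    """Decorator that registers a handler for one MIME type."""
    def register(func):
        HANDLERS[mime_type] = func
        return func
    return register


@handles("image/jpeg")
def jpeg_handler(path, data):
    # Placeholder for steps 4.1-4.3: a real handler would read the EXIF
    # metadata and create thumbnails; this one only records the size.
    return [(path, SIZE, str(len(data)))]


def fallback_handler(path, data):
    # Unknown document types are accepted but produce no metadata.
    return []


def ingest(path, data, store):
    """Pick a handler from the file name's MIME type and store its triples."""
    mime_type, _encoding = mimetypes.guess_type(path)
    handler = HANDLERS.get(mime_type, fallback_handler)
    triples = handler(path, data)
    store.extend(triples)
    return mime_type, len(triples)
```

Supporting another kind of binary document would then mean registering one more function, without touching `ingest` itself.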

> Whaddyathink? Is this something that has already been done? Would it be
> useful? etc.

I love it.

> Any comments are welcome. I'll probably implement it in any case, if
> only because it should be relatively easy to do so :-)

Yeah, it shouldn't be hard at all. If you do it, keep my use case above
in mind and keep #4 polymorphic and/or configurable, and we should be all
set, since you can then add different reactors depending on the MIME type.

Oh, good luck with your new blog ;-) I'm cheering from the sidelines.

Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
Received on Wed Nov 02 2005 - 23:54:19 EST