Semantic Bank as document repository frontend

From: Rickard Öberg
Date: Mon, 31 Oct 2005 17:31:42 +0100


I have already had off-list discussions with Stefano about using
Semantic Bank for the purpose of indexing OpenOffice Calc spreadsheets
as RDF data (see my blog entry for a
description of the end result). That worked very well, and I will
continue to tweak that solution into something that we will actually use
internally for all sorts of purposes.

My next idea, which I want to bounce off of you lot, is to take the same
idea but apply it to OpenOffice Writer documents. Essentially, create a
daemon that crawls a file server and extracts RDF data from the files
into the Semantic Bank (e.g. take the URL, creator, title, keywords as
tags, folder names as tags, etc.). Once that is done it would be
possible to browse and query the Semantic Bank for documents, and once
a document is found, a link to its location on the file server can
be presented. That would be really nice, and trivial to implement
considering that I already have the Calc implementation described in the
blog entry.
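
To make the extraction step concrete, here is a minimal sketch in Python.
It is only an illustration, not the Calc implementation from the blog
entry. It assumes the documents are OpenDocument packages (a zip file
containing meta.xml), and the function names and the plain "title",
"creator" and "tag" predicate names are made up for the example. A real
version would use proper RDF predicate URIs and post the result to the
Semantic Bank.

```python
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, unquote

# Namespaces used in OpenDocument meta.xml (assumes ODF, not the older .sxw format).
DC = "http://purl.org/dc/elements/1.1/"
META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_odf_meta(fileobj):
    """Read title, creator and keywords from the meta.xml of an ODF package."""
    with zipfile.ZipFile(fileobj) as package:
        root = ET.fromstring(package.read("meta.xml"))

    def texts(namespace, name):
        return [el.text for el in root.iter(f"{{{namespace}}}{name}") if el.text]

    titles = texts(DC, "title")
    creators = texts(DC, "creator")
    return {
        "title": titles[0] if titles else None,
        "creator": creators[0] if creators else None,
        "keywords": texts(META, "keyword"),
    }


def document_triples(url, meta):
    """Build (subject, predicate, object) triples for one document.

    Keywords and the folder names in the URL path both become tags.
    The predicate names are placeholders, not a real vocabulary.
    """
    triples = []
    if meta["title"]:
        triples.append((url, "title", meta["title"]))
    if meta["creator"]:
        triples.append((url, "creator", meta["creator"]))
    # Every path segment except the file name itself counts as a folder.
    folders = [unquote(p) for p in urlparse(url).path.split("/")[:-1] if p]
    for tag in list(meta["keywords"]) + folders:
        triples.append((url, "tag", tag))
    return triples
```

The crawler daemon would then just walk the file server, call
read_odf_meta() on each file it finds, and hand the triples to the bank.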

However, my next idea after that is to present the Semantic Bank using
WebDAV instead of as a web UI. Basically, the root of the repository
would contain folders representing the different types of objects. When
a type is selected (i.e. a folder is browsed into) the next level of
subfolders would be the list of predicates for that type (e.g.
"Creator", "Tag", "Title", etc.), and once selected it would show the
next level of subfolders as the possible values (e.g. as subfolder of
"Creator" you would have "Rickard" and as subfolder of "Created on" you
would have "2005-10-25"). For each level you would see only the next set
of possible folders to use as filter. It would probably do the same kind
of optimization as the web UI, i.e. if there are LOADS of documents then
first show "2005" as folder, then "October" as subfolder, and then "25".
This could be done relatively dynamically.
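
The date drill-down could be sketched roughly like this. It is a guess at
one way to do it, not how the web UI actually works: the function name and
the "limit" threshold are invented for the example, dates are assumed to
be ISO strings like "2005-10-25", and months are shown as numbers rather
than names to keep the code short.

```python
def date_folders(dates, prefix=(), limit=20):
    """Return the next level of folder names for a set of ISO dates.

    prefix is the part of the date already browsed into, e.g. ("2005", "10").
    If there are at most `limit` distinct dates, the top level lists the
    full dates directly; otherwise it drills down year -> month -> day.
    """
    prefix = tuple(prefix)
    parts = [tuple(d.split("-")) for d in set(dates)]
    matching = [p for p in parts if p[:len(prefix)] == prefix]
    if not prefix and len(matching) <= limit:
        # Few enough values: no need for intermediate folders.
        return sorted("-".join(p) for p in matching)
    if len(prefix) >= 3:
        # Already at day level, nothing further to split on.
        return []
    return sorted({p[len(prefix)] for p in matching})
```

With only a handful of dates the "Created on" folder would list them
directly; with many it would show years first, and so on down.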

Every level also contains a magic folder "show X documents", where X is
the number of documents in the result. Only when that folder is selected
will the actual documents be displayed, and when one is clicked the
WebDAV servlet will go to the URL of the document, fetch it, and send it
to the user.
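
The folder scheme described above, including the magic folder, could be
sketched like this. It works on a plain in-memory list instead of the
Semantic Bank, and the data layout and function names are assumptions
made for the example. The WebDAV part and the actual fetching of the
document are left out; this only shows what a folder listing would
contain for a given path.

```python
def magic_name(count):
    """Name of the folder that reveals the matching documents."""
    return f"show {count} documents"


def list_folder(documents, path):
    """List the entries of a virtual folder.

    documents: list of dicts with "url", "type" and "props", where props
               maps a predicate name to a list of values.
    path:      [type, predicate, value, predicate, value, ...], optionally
               ending in the magic folder name.
    """
    if not path:
        # Repository root: one folder per object type.
        return sorted({d["type"] for d in documents})

    matches = [d for d in documents if d["type"] == path[0]]
    rest = list(path[1:])
    used = set()
    # Each complete predicate/value pair narrows the result.
    while len(rest) >= 2:
        predicate, value = rest[0], rest[1]
        matches = [d for d in matches if value in d["props"].get(predicate, [])]
        used.add(predicate)
        rest = rest[2:]

    if rest == [magic_name(len(matches))]:
        # Magic folder selected: show the actual documents (by URL here).
        return sorted(d["url"] for d in matches)
    if rest:
        # A predicate was selected: show its possible values.
        return sorted({v for d in matches for v in d["props"].get(rest[0], [])})
    # Otherwise show the magic folder plus the predicates not yet used.
    predicates = {p for d in matches for p in d["props"]} - used
    return [magic_name(len(matches))] + sorted(predicates)
```

So browsing into Document/Creator/Rickard would list "show 1 documents"
plus whatever predicates are left to filter on, assuming one matching
document.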

Something like that. Basically this will provide a queryable and
browsable document repository which really doesn't care where the
documents are stored. They could be on hundreds of different servers for
all you know. Its only purpose is to aggregate them so that they can be
found easily. By using a "WebDAV to Explorer" tool like WebDrive the end
result will look just like a regular hard drive or file server to the user.

Whaddyathink? Is this something that has already been done? Would it be
useful? etc.

Any comments are welcome. I'll probably implement it in any case, if
only because it should be relatively easy to do so :-)

Received on Mon Oct 31 2005 - 16:26:14 EST