Re: Further testing... from Rickard Öberg on 2006-01-19 (stdin)

From: Rickard Öberg <rickard.oberg_at_senselogic.se>
Date: Thu, 19 Jan 2006 15:45:49 +0100

Stefano Mazzocchi wrote:
>> It looks like this is going to be very useful in a real-world
>> application in a way that I have seen no other data management
>> technology do before :-)
>
>
> Nice to hear that!
>
> Can you tell us the kind of queries that you are doing?

This is bordering on embarrassing, but the most interesting queries for
now are dead simple, like "gimme all objects implementing the interface
Link", or "what pages have been published in the last week". My personal
favorite though is: "find all references to X, and replace them with Y".
Oh my god, does that help. Whenever I want to remove an object it is
very very common that I want to replace all current references to it
with something else, and with the RDF indexing I can get that trivially.
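
To make the "find all references to X, and replace them with Y" case
concrete, here is a minimal Python sketch. It assumes a toy in-memory
triple store indexed by object; the TripleIndex class, its methods, and
the page/predicate names are all hypothetical and are not the Sesame
API.

```python
from collections import defaultdict


class TripleIndex:
    """Toy in-memory triple store (hypothetical, not the Sesame API)."""

    def __init__(self):
        self.triples = set()
        # object -> set of triples whose object is that value
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_object[o].add(triple)

    def remove(self, s, p, o):
        triple = (s, p, o)
        self.triples.discard(triple)
        self.by_object[o].discard(triple)

    def references_to(self, target):
        # Return a copy so callers can modify the store while iterating.
        return set(self.by_object.get(target, ()))

    def replace_references(self, old, new):
        """Repoint every triple whose object is `old` at `new`."""
        moved = 0
        for s, p, o in self.references_to(old):
            self.remove(s, p, o)
            self.add(s, p, new)
            moved += 1
        return moved


index = TripleIndex()
index.add("page:home", "linksTo", "page:old")
index.add("page:news", "linksTo", "page:old")
index.add("page:old", "type", "Link")

# Repoint both links before deleting page:old itself.
moved = index.replace_references("page:old", "page:new")
```

The object-keyed index is what makes the lookup cheap: finding every
reference to X is a single dictionary access rather than a scan over
all stored objects.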

This was just a quick hack to try things out, so as we all get more
familiar with the mindset I suspect we'll be able to do all sorts of
nice things. My main complaint right now is that I have to choose a
query language, and I don't know which one will work best. SerQL in
Sesame seems to be the most "stable" one, but SPARQL in Sesame 2 would
perhaps be a better long-term choice. But it's mostly a hunch, more than
anything else.

Our current approach, which uses only a BerkeleyDB-ish binary store (=no
querying at all) and Lucene for anything similar to querying (=which
works, but is not optimal, and has a very weak query language by
comparison), is not so strong. The main benefit of a "simple"
architecture like that, and one of the core reasons for it, is raw
performance. Thus far we have managed quite well without more advanced
querying, but there are more and more administrative kinds of tasks and
summaries that we need to be able to do.

> do you feel the need for inferencing and truth management?

I honestly don't know enough about it to know. Some simple things, like
"A childOf B means B parentOf A", would be nice to have. Is that what
you're referring to? I have no idea how to use things like that, or how
performance-heavy it would be to add rules for such things.
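
As a rough idea of what such a rule could look like, here is a minimal
Python sketch of the childOf/parentOf case. It assumes triples are plain
(subject, predicate, object) tuples; the INVERSE_OF table and the
with_inverses function are hypothetical names, and this is not how
Sesame or any particular inferencer implements it.

```python
# Hypothetical rule table: each predicate maps to its inverse.
INVERSE_OF = {"childOf": "parentOf", "parentOf": "childOf"}


def with_inverses(triples):
    """Return the given triples plus every statement implied by an inverse rule."""
    inferred = set(triples)
    for s, p, o in triples:
        inverse = INVERSE_OF.get(p)
        if inverse is not None:
            # "A childOf B" implies "B parentOf A", and vice versa.
            inferred.add((o, inverse, s))
    return inferred


facts = {("A", "childOf", "B")}
closure = with_inverses(facts)
```

For a pure inverse rule a single pass is enough, since the inverse of an
inferred statement is a statement that is already present. In this
sketch the cost is one extra triple per matching statement at insert
time.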

Most of the time insert and update speed is not so important, but for
batch imports it is actually very important (which may be contrary to
most people's experience). We typically do "import archive" of websites
in our CMS, and the difference between an import phase that takes half
an hour and half a day is definitely important. So, how much stuff can I
add to the create/update procedures and still have it be quick, that's
one of my main concerns right now.

/Rickard
Received on Thu Jan 19 2006 - 14:45:27 EST
