Inferencing in Triple Stores [was Re: Further testing...]

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Fri, 20 Jan 2006 13:32:27 -0500

Rickard Öberg wrote:
> Rickard Öberg wrote:
>>> do you feel the need for inferencing and truth management?
>
> Actually, thinking about this some more, inferencing might actually be a
> very big thing for us (that is, if it means what I think it means). It
> is VERY common for us to have a parent/child relationship between
> objects and then have properties for a parent that is propagated to
> child objects. Metadata and permissions are two key examples of that. If
> a parent has a metadata value set I do not want to set it on all
> children. Instead a child should get that metadata because its parent
> has it. Same thing with permissions.
>
> Do you know if I can declare such rules with Sesame? Preferably I want
> to make queries to Sesame that can then be answered either by the native
> store or by using inferencing rules. But how costly would it be to allow
> queries to be answered by inferencing? Things like that would be very
> valuable to know. But if it works, wow, that is a BIGGIE in terms of
> value-add.

Personally, I think that the two biggest selling points of a triple
store compared to a "regular" database technology are:

  1) data first (instead of schema first)

  2) inferencing (declarative, rule-based data generation/expansion)

even if both create huge issues in terms of design, especially when
coupled with lots and lots of data.

In our previous prototypes, we solved the inferencing problem by
applying brute force forward chaining, that is:

  a) load all the statements in memory
  b) look for all "?x owl:sameAs ?y" and then copy all properties of ?x
to ?y and all properties of ?y to ?x [we also did subProperty, subClass
and inverseFunctionalProperties]
  c) add those statements to another model
  d) search the union of the two models
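
[To make steps a) to d) concrete, here is a minimal Python sketch of the
owl:sameAs part only. It is not our prototype code: statements are
modelled as plain (subject, predicate, object) tuples, and the names
forward_chain_same_as and search are made up for this illustration.]

```python
SAME_AS = "owl:sameAs"

def forward_chain_same_as(asserted):
    """Return the statements implied by owl:sameAs that are not asserted.

    For every "?x owl:sameAs ?y", the properties of ?x are copied to ?y
    and the properties of ?y to ?x, repeating until nothing new appears.
    """
    known = set(asserted)          # a) everything is held in memory
    inferred = set()               # c) the separate "inferred" model
    while True:
        pairs = [(s, o) for (s, p, o) in known if p == SAME_AS]
        new = set()
        for x, y in pairs:         # b) copy properties in both directions
            for src, dst in ((x, y), (y, x)):
                for (s, p, o) in known:
                    if s == src and p != SAME_AS:
                        new.add((dst, p, o))
        new -= known
        if not new:
            return inferred
        known |= new
        inferred |= new

def search(asserted, inferred, subject):
    """d) answer a lookup against the union of the two models."""
    return {st for st in set(asserted) | inferred if st[0] == subject}

asserted = {
    ("ex:a", SAME_AS, "ex:b"),
    ("ex:b", SAME_AS, "ex:c"),
    ("ex:a", "dc:title", "Hello"),
}
inferred = forward_chain_same_as(asserted)
```

[Every pass rescans the whole in-memory set, which is fine for a sketch
and is exactly the kind of brute force that stops being practical as the
statement count grows.]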

That subset of RDFS/OWL is colloquially called "OWL tiny", and many of
us believe it to be enough for most operations, as the data we receive
is normally not detailed enough to allow any more exhaustive description
logic inferencing on it.

We were not the first to use such a solution (Sesame's Jeen and Arjohn
wrote a paper about a very similar approach) but we are reaching the
point where we need to scale this to a few hundred million statements
and problems start to pop up.

Sesame 1.x supports basic equivalence inferencing, but only with its
memory and relational stores; the native store that we use doesn't
support it. We are currently looking for ways to solve this but we
haven't found a solution yet.

In a perfect world, the triple store should do the inferencing, work as
a black box and maintain references between inferred statements that
allow a form of truth management.

[truth management is when you remove one statement and the system is
also able to remove the statements that require it in order to be
inferred]

It's already not trivial to do truth management with a small number of
statements (say a few thousand); doing it for a few hundred million
without sacrificing runtime query performance becomes extremely painful.

Now, disk is cheap and our stream of data is relatively stable (meaning
that not a lot of new data comes in compared to the data already stored
and not a lot of statements get removed), so one could think about a
massive precomputation and multiple indexing strategy.

The easiest way to do truth management is to keep the inferred
statements separate from the original ones and then throw away the
inferred statements and recalculate them when you find it necessary.
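
[A rough sketch of that "throw away and recalculate" idea, in Python.
The RecomputingStore class and the infer_types rule are invented for
this example and are not Sesame APIs; the rule only covers the
rdfs:subClassOf case, and statements are plain tuples.]

```python
def infer_types(asserted):
    """Hypothetical rule: propagate rdf:type along rdfs:subClassOf."""
    known = set(asserted)
    while True:
        sub = {(s, o) for (s, p, o) in known if p == "rdfs:subClassOf"}
        new = {
            (x, "rdf:type", d)
            for (x, p, c) in known if p == "rdf:type"
            for (c2, d) in sub if c2 == c
        } - known
        if not new:
            return known - set(asserted)
        known |= new

class RecomputingStore:
    """Keeps asserted and inferred statements in separate sets."""

    def __init__(self, infer):
        self.infer = infer      # function: asserted set -> inferred set
        self.asserted = set()
        self.inferred = set()
        self.stale = False      # True when the inferred set is out of date

    def add(self, statement):
        self.asserted.add(statement)
        self.stale = True

    def remove(self, statement):
        self.asserted.discard(statement)
        self.stale = True

    def statements(self):
        # Recalculate only "when necessary": on the first read after a change.
        if self.stale:
            self.inferred = self.infer(self.asserted)
            self.stale = False
        return self.asserted | self.inferred

store = RecomputingStore(infer_types)
store.add(("ex:Cat", "rdfs:subClassOf", "ex:Animal"))
store.add(("ex:tom", "rdf:type", "ex:Cat"))
before = store.statements()
store.remove(("ex:Cat", "rdfs:subClassOf", "ex:Animal"))
after = store.statements()
```

[Correctness is trivial here because nothing inferred survives a change,
but every change costs a full recomputation on the next read.]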

A much nicer way would be to be able to keep all the statements that
were not impacted by the change in the reasoning layer.
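
[One possible shape for this, sketched in Python under my own
assumptions: each inferred statement remembers the sets of statements
that justify it, and a removal only retracts what loses its last
justification. JustifiedStore and its methods are hypothetical names,
and the sketch does not handle statements that justify each other in a
cycle.]

```python
class JustifiedStore:
    """Inferred statements carry the statements they were derived from."""

    def __init__(self):
        self.asserted = set()
        # inferred statement -> list of frozensets of supporting statements
        self.justifications = {}

    def assert_statement(self, statement):
        self.asserted.add(statement)

    def add_inferred(self, statement, support):
        self.justifications.setdefault(statement, []).append(frozenset(support))

    def holds(self, statement):
        return statement in self.asserted or statement in self.justifications

    def remove(self, statement):
        self.asserted.discard(statement)
        pending = [statement]
        while pending:
            gone = pending.pop()
            if self.holds(gone):
                continue  # still asserted or justified some other way
            for derived in list(self.justifications):
                kept = [j for j in self.justifications[derived] if gone not in j]
                if kept:
                    self.justifications[derived] = kept
                else:
                    # Last justification lost: retract and cascade.
                    del self.justifications[derived]
                    pending.append(derived)

same_ab = ("ex:a", "owl:sameAs", "ex:b")
same_bc = ("ex:b", "owl:sameAs", "ex:c")
title_a = ("ex:a", "dc:title", "Hello")
title_b = ("ex:b", "dc:title", "Hello")
title_c = ("ex:c", "dc:title", "Hello")
type_d = ("ex:d", "rdf:type", "ex:Thing")
type_d_sup = ("ex:d", "rdf:type", "ex:Item")

store = JustifiedStore()
for st in (same_ab, same_bc, title_a, type_d_sup):
    store.assert_statement(st)
store.add_inferred(title_b, [same_ab, title_a])
store.add_inferred(title_c, [same_bc, title_b])
store.add_inferred(type_d, [type_d_sup])

store.remove(same_ab)
```

[After the removal, the two inferred titles are gone while the unrelated
inferred statement about ex:d is left untouched. The price is the extra
bookkeeping per inferred statement, which is where the scaling pain
comes from.]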

Jeen, Arjohn, can you tell us your plans/status in regard to inferencing
for Sesame2?

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Fri Jan 20 2006 - 18:32:00 EST