Sesame Native Store OPS indexing (was Re: [update] piggybank performance profiling)

From: Vineet Sinha <vineet_at_csail.mit.edu>
Date: Tue, 13 Sep 2005 01:24:00 -0400

The summary
-----------
OPS indexing for the Sesame Native Store has been implemented, and it
works beautifully in Relo. Stefano should be looking at it in Piggy Bank.

Details
-------
Single Modified File:
http://simile.mit.edu/repository/relo/branches/sesame/src/org/openrdf/sesame/sailimpl/nativerdf/TripleStore.java

The working jar file:
http://simile.mit.edu/repository/relo/trunk/edu.mit.csail.relo.store/lib/sesame-1.2.1-ops.jar

The implementation follows what Arjohn had suggested. I did not see any
bugs. Adding statements now seems to cost about 25% more.

Beyond adding the second comparator, I also renamed previous
btree/file/filename variables to include 'spo' before them and made a
copy for 'ops'. triples.dat is now triples-1.dat and triples-2.dat.

The other issue is, as Jeen mentioned, the large file size for
transmission. The best solution could be to leave the second index out
of what is transmitted and build it automatically afterwards (in fact
this should also improve add performance, since the store can rely on
the spo index until the ops index is ready).
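
To make the idea concrete, here is a rough sketch of rebuilding an OPS
ordering from records that are already in the SPO index. It is not the
TripleStore code: OpsRebuild, record() and rebuild() are made-up names,
a sorted in-memory list stands in for the second B-Tree, and the IDs
are assumed to be non-negative 4-byte big-endian integers.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Sesame: derives an OPS-ordered copy
// of the records found in an existing SPO index.
final class OpsRebuild {
    // Reads the 4-byte ID at the given offset (assumed big-endian).
    static int id(byte[] rec, int offset) {
        return ByteBuffer.wrap(rec).getInt(offset);
    }

    // Object first (bytes 8-11), then predicate (4-7), then subject (0-3).
    // Signed int comparison, so this assumes non-negative IDs.
    static final Comparator<byte[]> OPS = Comparator
            .<byte[]>comparingInt(r -> id(r, 8))
            .thenComparingInt(r -> id(r, 4))
            .thenComparingInt(r -> id(r, 0));

    // Packs subject, predicate, object and an "explicit" flag into 13 bytes.
    static byte[] record(int s, int p, int o) {
        return ByteBuffer.allocate(13)
                .putInt(s).putInt(p).putInt(o).put((byte) 1).array();
    }

    // Copies every record from the SPO side and sorts the copy in OPS order.
    // A real store would insert into the second B-Tree file instead.
    static List<byte[]> rebuild(Iterable<byte[]> spoRecords) {
        List<byte[]> ops = new ArrayList<>();
        for (byte[] rec : spoRecords) {
            ops.add(rec);
        }
        ops.sort(OPS);
        return ops;
    }
}
```

Until such a rebuild finishes, lookups would have to go through the spo
index only, which is the trade-off mentioned above.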

There were no unit tests, but testing in Relo should have exercised the
change reasonably well.

I can send a patch after we fix any issues. Let me know if you have
questions.

Arjohn, thanks for your help!

Vineet


Arjohn Kampman wrote:
> TripleStore currently has one on-disk B-Tree that it stores in a file
> with the name "triples.dat". The triples are stored as arrays of 13
> bytes: 3 x 4 bytes for subject, predicate and object, 1 byte for
> additional flags. The last byte is currently used to store a flag that
> indicates whether the triple is explicit. Because there are no
> inferencers for the native sail yet, this flag is always true.
>
> All of the above is reusable as-is for additional indexes. All that
> matters is how the B-Tree compares/orders the values that are stored
> in it. This can be controlled by a BTreeValueComparator. The current
> SPO-index uses a comparator that compares the first 12 bytes in their
> original order. An OPS-index can be realized by first comparing bytes
> 8-11, then bytes 4-7 and finally bytes 0-3. It's as simple as that!
> (fingers crossed, we never tried this before :-) )
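
The byte ordering described in the quoted text can be sketched roughly
as follows. This is only an illustration: OpsOrder is a made-up class
built on java.util.Comparator, not Sesame's BTreeValueComparator, and
comparing each byte as unsigned with big-endian IDs is an assumption.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

// Hypothetical stand-in for an OPS comparator over 13-byte triple records.
final class OpsOrder implements Comparator<byte[]> {
    // Layout from the description: bytes 0-3 subject, 4-7 predicate,
    // 8-11 object, byte 12 flags.
    static final int SUBJ = 0, PRED = 4, OBJ = 8;

    @Override
    public int compare(byte[] a, byte[] b) {
        int c = compareField(a, b, OBJ);      // bytes 8-11 first
        if (c != 0) return c;
        c = compareField(a, b, PRED);         // then bytes 4-7
        if (c != 0) return c;
        return compareField(a, b, SUBJ);      // finally bytes 0-3
    }

    // Compares one 4-byte field byte by byte, treating bytes as unsigned.
    static int compareField(byte[] a, byte[] b, int offset) {
        for (int i = offset; i < offset + 4; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return 0;
    }

    // Packs three IDs plus an "explicit" flag byte into a 13-byte record.
    static byte[] record(int s, int p, int o) {
        return ByteBuffer.allocate(13)
                .putInt(s).putInt(p).putInt(o).put((byte) 1).array();
    }
}
```

The flag byte is ignored by the comparison, so two records that differ
only in that byte compare as equal here.
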
Received on Tue Sep 13 2005 - 05:19:35 EDT
