RE: [RT] Moving Piggy Bank forward... from Matthew Cockerill on 2005-07-22

From: Matthew Cockerill <matt_at_biomedcentral.com>
Date: Fri, 22 Jul 2005 13:58:38 +0100

I wonder whether the open source Berkeley DB Java Edition might have potential as part of Piggy Bank's persistence solution?
http://www.sleepycat.com/products/je.shtml

There is a hint here:
http://jena.sourceforge.net/contrib/contributions.html
http://www.hpl.hp.com/techreports/2003/HPL-2003-266.pdf
that Chris Dollin has done some work on RDF persistence with Berkeley DB Java Edition (using Jena rather than Sesame), and that it performs pretty well compared to relational databases.

Matt
==
Matthew Cockerill
Director of Operations
BioMed Central ( http://www.biomedcentral.com/ )

> -----Original Message-----
> From: Jeen Broekstra [mailto:jeen_at_aduna.biz]
> Sent: 22 July 2005 13:42
> To: general_at_simile.mit.edu
> Subject: Re: [RT] Moving Piggy Bank forward...
>
>
> Stefano Mazzocchi wrote:
>
> [snip]
>
> > 3) use the Sesame Memory store instead of the native one (which
> > supports RDFSchema entailment and hopefully OWL-tiny with some
> > custom rules)
> >
> > [Jeen, can you tell us more about that?]
>
> Sure, what would you like to know? :)
>
> Here are some trivia: Sesame's in-memory store uses an internal java
> object model for remembering nodes (uris, bnodes, literals) and
> statements. There are three main hashmap indexes, one for URIs, one
> for bNodes, one for literals. Each RDF node links to a list of
> statements in which it is used as a subject/predicate/object. URIs use
> a shared representation of their namespace to minimize memory
> consumption for strings. This layout makes querying very fast,
> especially on triple patterns for which at least one variable is fixed.
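
The layout described above can be sketched roughly as follows. This is an illustration in Python, not Sesame's actual Java code: the names MemoryStore, Node, split_uri, add and match are all hypothetical, and the real store certainly differs in detail. It only shows the three per-kind hash indexes, the per-node statement lists, and the shared namespace strings.

```python
class Node:
    """An RDF node plus the statements it takes part in (hypothetical layout)."""
    def __init__(self, kind, namespace, label):
        self.kind = kind            # "uri", "bnode" or "literal"
        self.namespace = namespace  # shared string for URIs, None otherwise
        self.label = label          # local name, bnode id, or literal text
        self.as_subject = []
        self.as_predicate = []
        self.as_object = []


def split_uri(uri):
    """Split a URI after its last '#' or '/' (an assumed, simplistic rule)."""
    cut = max(uri.rfind("#"), uri.rfind("/")) + 1
    return uri[:cut], uri[cut:]


class MemoryStore:
    def __init__(self):
        # Three hash indexes: one for URIs, one for bnodes, one for literals.
        self.index = {"uri": {}, "bnode": {}, "literal": {}}
        self.namespaces = {}  # namespace string -> the single shared copy

    def _key(self, kind, value):
        if kind != "uri":
            return value
        namespace, local = split_uri(value)
        # setdefault returns the stored copy, so every URI in the same
        # namespace references one string object.
        return (self.namespaces.setdefault(namespace, namespace), local)

    def node(self, kind, value):
        """Return the node for (kind, value), creating it on first use."""
        key = self._key(kind, value)
        table = self.index[kind]
        if key not in table:
            namespace, label = key if kind == "uri" else (None, value)
            table[key] = Node(kind, namespace, label)
        return table[key]

    def add(self, s, p, o):
        """Add a statement; s, p and o are (kind, value) pairs."""
        stmt = (self.node(*s), self.node(*p), self.node(*o))
        stmt[0].as_subject.append(stmt)
        stmt[1].as_predicate.append(stmt)
        stmt[2].as_object.append(stmt)

    def match(self, s=None, p=None, o=None):
        """Return statements matching a pattern with at least one fixed term."""
        fixed = []
        for pos, term in enumerate((s, p, o)):
            if term is None:
                continue
            node = self.index[term[0]].get(self._key(*term))
            if node is None:
                return []  # a term the store has never seen matches nothing
            fixed.append((pos, node))
        if not fixed:
            raise ValueError("at least one position must be fixed")
        # Scan only the shortest statement list among the fixed nodes.
        lists = [(n.as_subject, n.as_predicate, n.as_object)[pos]
                 for pos, n in fixed]
        return [st for st in min(lists, key=len)
                if all(st[pos] is n for pos, n in fixed)]
```

A pattern with a fixed term never scans the whole store: it goes from the hash index to one node and then walks only that node's statement list, which is presumably why such queries are fast.
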
>
> Unfortunately, there is currently no production-ready custom
> inferencer for the in-memory store (the available custom inferencer
> only operates on MySQL databases). There is some raw code on a new
> custom inferencer that uses SeRQL queries as the rule format, but we
> are short a number of hands to make that thing work properly. An
> alternative is perhaps Ontotext's OWLIM package, which is an adapted
> in-memory store that can do simple OWL-Lite entailment.
>
> > PROs:
> >
> > - we get basic inferencing on equivalences and subclassing
>
> As an aside: basic inferencing is still on the ToDo list for the
> native store. We are also awaiting a number of third party
> contributions which will hopefully significantly improve native store
> performance (better indexing). I'll keep you informed on progress if
> you want.
>
> > - it's considerably faster
> >
> > CONs:
> >
> > - memory consumption grows linearly with the amount of data stored
> > (or worse?)
>
> About linear. Roughly 170 bytes per triple (this is an average
> observed on a ~30 million triple memory store, on a 64-bit machine, so
> it will probably be a bit less on a regular 32-bit architecture).
>
> Note that this includes inferred triples: a 1000 triple document may
> result in 2000 actual triples in the store (the ratio depends on your
> ontology of course, we usually find that for simple schemas the number
> of inferred triples is 30-60% of the original number of triples).
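
As a rough worked example of the figures above: 1,000,000 explicit triples with 50% inferred triples give 1,500,000 stored triples, which at 170 bytes each is 255,000,000 bytes, or about 255 MB. The helper below just packages that arithmetic. Its name and the default ratio of 0.5 are illustrative assumptions; only the 170-byte average and the 30-60% range come from the message.

```python
BYTES_PER_TRIPLE = 170  # average quoted for a ~30 million triple store, 64-bit


def estimate_memory_bytes(explicit_triples, inferred_ratio=0.5,
                          bytes_per_triple=BYTES_PER_TRIPLE):
    """Estimate in-memory store size, counting inferred triples as stored.

    inferred_ratio is the number of inferred triples as a fraction of the
    explicit ones; 0.3 to 0.6 is the range quoted for simple schemas.
    """
    stored = explicit_triples + round(explicit_triples * inferred_ratio)
    return stored * bytes_per_triple
```

Since the 170-byte average was observed on a 64-bit machine, treat the result as an upper-end estimate for 32-bit systems.
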
>
> > - data is saved on disk *only* after regular shutdown. In case of
> > system collapse there is data loss. (Jeen, is there a workaround
> > for this problem? like saving the new RDF right away before
> > returning)
>
> This is actually no longer true. In the newer versions of Sesame, data
> is saved to disk immediately after each commit by default, and the
> behavior is configurable. Quoting from the configuration manual
> (http://www.openrdf.org/doc/sesame/users/ch04.html#d0e651):
>
> The 'syncDelay' parameter specifies the time (in milliseconds) to
> wait after a transaction was committed before writing the changed
> data to file. Setting this variable to '0' (the default value) will force
> a file sync immediately after each commit. A negative value will
> deactivate file synchronization until the Sail is shut down. A
> positive value will postpone the synchronization for at least that
> amount of milliseconds. If in the meantime a new transaction is
> started, the file synchronization will be rescheduled to wait for
> another syncDelay ms. This way, bursts of transaction events can be
> combined in one file sync, improving performance.
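
The syncDelay rules quoted above can be modelled as a small simulation. This is a sketch of the described behavior only, not Sesame code: the function name and its parameters are hypothetical, each transaction is treated as starting and committing at a single instant, and it assumes that a still-pending sync is flushed when the Sail shuts down.

```python
def sync_times(commit_times_ms, sync_delay_ms, shutdown_ms):
    """Return the times (ms) at which a file sync would happen.

    commit_times_ms: instants at which transactions commit (assumed to be
                     no later than shutdown_ms).
    sync_delay_ms:   0 syncs after every commit, a negative value defers
                     syncing until shutdown, a positive value postpones it.
    """
    commits = sorted(commit_times_ms)
    if not commits:
        return []
    if sync_delay_ms < 0:
        return [shutdown_ms]      # nothing is written until shutdown
    if sync_delay_ms == 0:
        return list(commits)      # sync immediately after each commit

    syncs = []
    pending = None                # time of the currently scheduled sync
    for t in commits:
        if pending is not None and t >= pending:
            syncs.append(pending)  # the scheduled sync fired before this commit
        # A commit inside the waiting window replaces the scheduled sync.
        pending = t + sync_delay_ms
    syncs.append(min(pending, shutdown_ms))  # last sync, or flush at shutdown
    return syncs
```

With commits at 0, 100, 200 and 5000 ms and a syncDelay of 500, the first three commits collapse into a single sync at 700 ms and the last one syncs at 5500 ms, which is the burst-combining effect the manual describes.
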
>
> You can also explicitly force a disk sync if you want, by invoking the
> RdfRepository.sync() method.
>
> HTH.
>
> Jeen
> --
> Jeen Broekstra Aduna BV
> Knowledge Engineer Julianaplein 14b, 3817 CS Amersfoort
> http://aduna.biz The Netherlands
> tel. +31 33 46599877
>

Received on Fri Jul 22 2005 - 13:02:25 EDT
