Re: Sesame vs. Jena? from Stefano Mazzocchi on 2005-09-23

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Fri, 23 Sep 2005 16:21:56 -0700

Richard Cyganiak wrote:
> Dear list,
>
> I'm under the impression that Simile projects use or have used both the
> Sesame and Jena frameworks. I'm in a discussion about the relative
> benefits of both frameworks. I'm curious about your experiences. In the
> context of Longwell and Piggybank and Semantic Bank, what are the
> strengths and weaknesses of both APIs? Are your experiences documented
> anywhere?

The topic is complex and the feelings to hurt are many :-) so my
diplomatic side would say "they both do their job well".

In real life, we went the Sesame way, despite a much poorer licensing
model, mainly because of code size: Jena was simply too big to ship with
PiggyBank (you understand that shipping a browser extension that is
bigger than the browser itself feels kinda weird).

Our previous tests indicated that Sesame was slightly faster than Jena
for our needs, but I'm willing to admit that our report[1] doesn't give
this claim much statistical solidity: it was done to get a rough idea
of whether or not the various triple stores had severe
(order-of-magnitude) differences.

It turns out, they don't.

Some claim massive performance, but they either achieve that with a lot
of memory or a lot of indexing or both. At the end of the day, the
algorithms they use to do things are equivalent.

In any case, our future roadmap does include one deliverable that is a
triple-store shootout report, and our goal is to come up with a testing
suite (sort of specRDF) that allows easily reproducible tests, hopefully
even across non-Java solutions.

Our goal is to load the test with a real-life-and-very-big dataset (I
think the RDFized artstor + OCW corpus would be great for that, with
something like 150Mtriples), then run a set of queries on top of it
(the ones that we normally run in our faceted browsing applications) and
see how much performance degrades under load (that is, with concurrent
queries)... obviously, each 'product' will be considered in different
configurations, depending on whether it uses a memory store, a native
store, an RDBMS store, an inferencing store and what not.
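To make the "degradation under load" idea concrete, here is a rough
sketch of what the measuring loop might look like. It is only an
illustration, not the planned suite: `run_query`, `measure_degradation`
and the concurrency levels are hypothetical names and values chosen for
the example, and `run_query` stands in for whatever call actually sends
a query to a given store. Python is used only because the suite is
meant to work across non-Java solutions too.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed(run_query, query):
    """Return the wall-clock seconds taken by one query."""
    start = time.perf_counter()
    run_query(query)
    return time.perf_counter() - start


def measure_degradation(run_query, queries, concurrency_levels=(1, 2, 4, 8)):
    """Map each concurrency level to the mean per-query latency in seconds.

    run_query is a hypothetical callable that executes one query against
    the store under test; it is an assumption of this sketch.
    """
    results = {}
    for level in concurrency_levels:
        # Scale the workload with the number of concurrent clients so
        # every client has the full query set to work through.
        workload = list(queries) * level
        with ThreadPoolExecutor(max_workers=level) as pool:
            latencies = list(pool.map(lambda q: timed(run_query, q), workload))
        results[level] = mean(latencies)
    return results


if __name__ == "__main__":
    # Stand-in for a real store: sleeps instead of evaluating the query.
    def fake_store(query):
        time.sleep(0.001)

    for level, latency in measure_degradation(fake_store, ["q1", "q2"]).items():
        print(f"{level} concurrent clients: {latency * 1000:.2f} ms mean latency")
```

Running the same loop once per configuration (memory, native, RDBMS,
inferencing) would give one latency curve per 'product' and setup,
which is the comparison described above.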

Such a shootout is a lot of work, but we have the machine (beefy but
cheap, aligned with what normal people would buy/need) and the data; we
just need the time and the people to make it happen.

It goes without saying that help would be most welcome: we would be
glad to give any volunteer access to our testing machine, and even
commit access to the repository should the need arise.

[1] http://simile.mit.edu/reports/stores/

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------

Received on Fri Sep 23 2005 - 23:17:17 EDT