Re: Stores report from Ryan Lee on 2004-07-22 (stdin)

From: Ryan Lee <ryanlee_at_w3.org>
Date: Thu, 22 Jul 2004 15:29:32 -0400

Hi Mark,

>>Fastpath optimization... I'm not sure if my results include that.
>
> In Jena, when you run over the DB and use RDQL, Fastpath is on by default,
> if you don't want it you have to turn it off explicitly - see
> http://jena.sourceforge.net/DB/fastpath.html

OK, that rings a bell; I think I just glossed over that since it's on
by default...

>>However, I would have expected it to be substantially
>>slower because the query strategy is basically 'ask the
>>store for each facet/value pair, one at a time.'
>
> That's true in memory, but on a DB Fastpath should speed up the initial
> phase of identifying all the values that meet the constraints.

[see last paragraph]

>>Your earlier figures seem to suggest the same
>>(http://simile.mit.edu/wiki/IssueEighteen).
>
> Not quite - if you look at my earlier figures, you'll see that RDQL to
> Postgres is 76961 ms, nearly 70% faster than Jena to Postgres (129828 ms).
> This is a considerable improvement, and I had presumed that it was down to
> Fastpath.

My apologies, I wasn't looking closely enough.

> In your results, RDQL to Postgres (111246 ms) is slightly faster than Jena
> to Postgres (115791 ms) but this is nowhere near as pronounced as the
> difference I got, and there is no improvement for MySQL (RDQL to MySQL,
> 97682 ms versus Jena to MySQL 81039 ms). So I just wanted to flag that the
> results you are getting are different to the results I got.
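
For reference, the relative differences in the figures quoted above can
be worked out with a few lines of Python. This is only a quick sketch:
the timings are copied from this thread, while the dictionary keys and
function names are made up for illustration and are not part of the
benchmark code. Note that "nearly 70% faster" corresponds to a speedup
ratio of about 1.69, which is the same as taking about 41% less time.

```python
# Timings in milliseconds, copied from the figures quoted in this thread.
# The labels are illustrative names, not identifiers from the benchmark code.
FIGURES = {
    "Mark, Postgres": {"rdql_ms": 76961, "jena_ms": 129828},
    "Ryan, Postgres": {"rdql_ms": 111246, "jena_ms": 115791},
    "Ryan, MySQL": {"rdql_ms": 97682, "jena_ms": 81039},
}


def speedup(rdql_ms, jena_ms):
    """Ratio of Jena time to RDQL time; above 1.0 means RDQL is faster."""
    return jena_ms / rdql_ms


def time_saved_pct(rdql_ms, jena_ms):
    """Percent of the Jena time saved by RDQL; negative means RDQL is slower."""
    return 100.0 * (jena_ms - rdql_ms) / jena_ms


for label, t in FIGURES.items():
    ratio = speedup(t["rdql_ms"], t["jena_ms"])
    saved = time_saved_pct(t["rdql_ms"], t["jena_ms"])
    print(f"{label}: speedup {ratio:.2f}x, time saved {saved:.1f}%")
```

On these numbers RDQL helps noticeably only in the first row; in the
MySQL row the time saved comes out negative, i.e. RDQL is slower there.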

The code hasn't changed since you wrote it, I believe, so either the
difference in environments (your machine is more powerful than mine) or
the data is causing the differing figures. I don't recall how large the
dataset you used for your figures was, so I'm not sure if that last
suggestion makes sense in our context. It may be that the next round of
figures I produce should include a couple of repetitions in case the ones I
have now are anomalous.
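
As a rough idea of what "a couple of repetitions" could look like, here
is a minimal timing sketch in Python. It is an assumption-laden
illustration, not the actual harness: the real benchmark runs against
Jena in Java, and `placeholder_workload` is a hypothetical stand-in for
one full query run. Reporting the median and spread alongside the raw
times makes a single anomalous run easier to spot.

```python
import statistics
import time


def time_repeated(run_once, repetitions=5):
    """Call run_once() `repetitions` times; return wall-clock times in ms."""
    timings_ms = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_once()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms


def summarize(timings_ms):
    """Summarize a list of timings so outliers stand out."""
    return {
        "runs": len(timings_ms),
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
        # Sample standard deviation needs at least two data points.
        "stdev_ms": statistics.stdev(timings_ms) if len(timings_ms) > 1 else 0.0,
    }


def placeholder_workload():
    # Hypothetical stand-in for one benchmark run (e.g. one store query pass).
    sum(i * i for i in range(10000))


if __name__ == "__main__":
    print(summarize(time_repeated(placeholder_workload, repetitions=5)))
```

If the median over several runs still differs between the two sets of
figures, the environment or the dataset is the more likely cause than a
one-off anomaly.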

Do you think we should resolve this issue before I publish the report to
a wider audience? Considering that the RDQLLocalModel timing is too
slow to use in a production environment, I'm tempted to go ahead and
publish the report, maybe with some caveat concerning differing figures,
and then sort this issue out over time, since it could still be relevant
if we choose to use Jena DB code more extensively.

-- 
Ryan Lee                 ryanlee_at_w3.org
W3C Research Engineer    +1.617.253.5327
http://simile.mit.edu/

Received on Thu Jul 22 2004 - 19:29:34 EDT