Re: [update] piggybank performance profiling

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Sat, 03 Sep 2005 13:38:38 -0400

Vineet Sinha wrote:
>
>> So, I kept instrumenting and found out that we spend pretty much half
>> of our time (if not more) performing queries such as
>>
>> ?x predicate object
>>
>> each of those queries takes between 1300 and 1600 ms on my machine,
>> even with the (small) number of statements I have in my triple store,
>> compared to basically 0 ms for queries such as
>>
>> subject predicate ?x
>>
>> which seems to indicate that Sesame does a good job at indexing by
>> subject but a terrible job at indexing by object.
>
>
> I did a performance tuneup of Relo in the last week and 'fixed' the
> above as well. Some more minor details (in hindsight they are obvious)
> that might be helpful - the overhead when using the native store was
> roughly 200x higher for both getStatements and hasStatements when the
> subject is not provided.

Well, it's worse than this: I suspect that while the "subj pred ?x"
queries are hashed, the "?x pred obj" queries are iterated, meaning that
the cost we are paying is not fixed but proportional to the amount of
data in the triple store... and this is *really* bad news!
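
To make the suspicion concrete, here is a minimal sketch of a store that is indexed by subject only. It is a toy model, not Sesame's actual code: the class name, the method names and the `scanned` counter are all made up for illustration. The "subj pred ?x" lookup touches only the statements of one subject, while the "?x pred obj" lookup has to walk every statement in the store.

```python
from collections import defaultdict


class SubjectIndexedStore:
    """Toy triple store indexed by subject only (hypothetical, not Sesame)."""

    def __init__(self):
        # subject -> list of (predicate, object) pairs
        self.by_subject = defaultdict(list)
        # number of statements examined by the most recent query
        self.scanned = 0

    def add(self, s, p, o):
        self.by_subject[s].append((p, o))

    def objects(self, s, p):
        """'subj pred ?x': one hash lookup, then only that subject's statements."""
        pairs = self.by_subject.get(s, [])
        self.scanned = len(pairs)
        return [o for (pp, o) in pairs if pp == p]

    def subjects(self, p, o):
        """'?x pred obj': no index to use, so every statement is examined."""
        self.scanned = 0
        hits = []
        for s, pairs in self.by_subject.items():
            for pp, oo in pairs:
                self.scanned += 1
                if pp == p and oo == o:
                    hits.append(s)
        return hits
```

With 1000 single-statement subjects loaded, `objects("item42", "type")` examines 1 statement and `subjects("type", "Book")` examines all 1000, so the second cost grows with the size of the store.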

Jeen, are we doing something wrong, or is this really a limitation of Sesame?

> I was able to get my performance up by:
> ]] modifying my schema so that the user-facing actions mostly result in
> queries that have the subject provided (mostly by adding a reverseCached
> predicate).
> ]] limiting the number of reverse queries, and moving most of the
> remaining ones to return results to the interface asynchronously
>
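
The reverseCached idea in the first point could be sketched roughly as follows. This is an assumption about how such a schema change might look, not Relo's actual implementation: the `Store` class, the `reverseCached:` prefix and the helper names are hypothetical. Every statement is written together with its inverse, so that a "?x pred obj" question can be asked as a "subj pred ?x" question instead.

```python
from collections import defaultdict

# Hypothetical naming scheme for the inverse predicate.
REVERSE_PREFIX = "reverseCached:"


class Store:
    """Toy store that can only answer 'subj pred ?x' efficiently."""

    def __init__(self):
        # subject -> set of (predicate, object) pairs
        self.by_subject = defaultdict(set)

    def add(self, s, p, o):
        self.by_subject[s].add((p, o))

    def objects(self, s, p):
        return sorted(o for (pp, o) in self.by_subject.get(s, ()) if pp == p)


def add_with_reverse(store, s, p, o):
    """Store the statement and its cached inverse in one step."""
    store.add(s, p, o)
    store.add(o, REVERSE_PREFIX + p, s)


def subjects(store, p, o):
    """Answer '?x pred obj' through the inverse, i.e. as a subject lookup."""
    return store.objects(o, REVERSE_PREFIX + p)
```

The trade-off is that the store holds roughly twice as many statements and the inverse has to be kept in sync on removal. It also only works when the object is a resource, since a literal cannot be the subject of an RDF statement.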
> What did not work was loading part of the data into an in-memory store.
> I am guessing this is because the in-memory store is also indexed by
> subject - though I did not spend as much time on this approach, since in
> my case the above two changes showed very good results without more
> tweaking needed.

Thanks dude, you saved me a few hours of trying the memory store instead.

I'll see if I can do the same for PB... I have to understand what
queries do what.
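
For the reverse queries that cannot be rewritten, the asynchronous approach Vineet mentions might look something like the sketch below. It assumes a plain worker thread pool, and `slow_reverse_query` and `query_async` are hypothetical names standing in for whatever the real store and interface code provide.

```python
from concurrent.futures import ThreadPoolExecutor


def slow_reverse_query(statements, predicate, obj):
    """Stand-in for an expensive '?x pred obj' lookup: a full scan."""
    return [s for (s, p, o) in statements if p == predicate and o == obj]


def query_async(executor, statements, predicate, obj, on_result):
    """Run the query on a worker thread and hand the result to on_result."""
    future = executor.submit(slow_reverse_query, statements, predicate, obj)
    # The callback normally runs on the worker thread once the query is done.
    future.add_done_callback(lambda f: on_result(f.result()))
    return future
```

The caller returns immediately and the interface stays responsive while the scan runs. In a real UI toolkit the callback would most likely need to pass the result back to the UI thread rather than touch widgets directly.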

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Sat Sep 03 2005 - 17:34:31 EDT