Re: [update] piggybank performance profiling from David R. Karger on 2005-09-08 (stdin)

From: David R. Karger <karger_at_mit.edu>
Date: Thu, 8 Sep 2005 09:10:07 -0400

Stefano, if you don't want to wait for a fix to the db, you might
work around the problem by storing each predicate in both directions
and having your interface swap any backwards query to the fast
direction.
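
A minimal sketch of what I mean, in Python rather than the Java of the
real backend. Everything here is made up for illustration: the toy
SubjectIndexedStore stands in for a store that is only fast when the
subject is known, and the "inv:" prefix is just a hypothetical naming
convention for the mirrored predicate, not anything Sesame provides.

```python
from collections import defaultdict

INVERSE_PREFIX = "inv:"  # hypothetical naming convention for the mirrored predicate


class SubjectIndexedStore:
    """Toy stand-in for a triple store that is only fast when the subject is known."""

    def __init__(self):
        self._by_subject = defaultdict(set)  # (subject, predicate) -> {objects}

    def add(self, s, p, o):
        self._by_subject[(s, p)].add(o)

    def objects(self, s, p):
        """Fast direction: subject predicate ?x"""
        return set(self._by_subject.get((s, p), ()))


class BidirectionalStore:
    """Wrapper that stores every statement in both directions."""

    def __init__(self, store):
        self._store = store

    def add(self, s, p, o):
        # Store the statement as given, plus its mirror image.
        self._store.add(s, p, o)
        self._store.add(o, INVERSE_PREFIX + p, s)

    def objects(self, s, p):
        # Already the fast direction; pass it straight through.
        return self._store.objects(s, p)

    def subjects(self, p, o):
        # "?x predicate object" rewritten as "object inv:predicate ?x".
        return self._store.objects(o, INVERSE_PREFIX + p)


store = BidirectionalStore(SubjectIndexedStore())
store.add("item1", "dc:creator", "alice")
store.add("item2", "dc:creator", "alice")
found = store.subjects("dc:creator", "alice")
```

The cost is twice as many statements in the store. Also, RDF does not
allow a literal in the subject position, so a real version would need
some extra handling for statements whose object is a literal.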

   Date: Fri, 02 Sep 2005 19:30:21 -0400
   From: Stefano Mazzocchi <stefanom_at_mit.edu>

   My quest for the solidification of piggybank continues. Still feel like
   I'm in a jungle, but now at least I have a compass: both the frontend
   (the part inside Firefox) and the backend (the Java part) are
   instrumented with tracing/profiling loggers and this helps a lot in
   finding out what is consuming our cycles and what is going on.

   First of all, I realized that the Velocity template cache was turned
   off. This means that every time we loaded a template (and we use
   several!), no matter how frequently used, we had to parse it again.
   Since we have templates that generate as little as a few lines and are
   reused hundreds of times throughout the various pages, you understand
   this was a lot of wasted CPU for no reason.

   I turned the cache on, and this makes the first page turn up almost
   instantaneously (some 200ms total and 100ms reaction time) after it has
   been loaded once (the first time takes a while, but that's understandable
   since it has to load itself from the triple store... note that the first
   load of that page could be done transparently in the background so that
   we don't show this delay to the user).

   The browsing of the piggybank items is still incredibly slow, though...
   I was expecting a substantial performance improvement, but instead it
   seems there is something a lot bigger dragging us down.

   So, I kept instrumenting and found out that we spend pretty much half of
   our time (if not more) performing queries such as

     ?x predicate object

   Each of those queries takes between 1300 and 1600ms on my machine, even
   with the (small) number of statements I have in my triple store, compared
   to the basically 0ms of queries such as

     subject predicate ?x

   which seems to indicate that Sesame does a good job at indexing by
   subject but a terrible job at indexing by object.

   Also, unlike the first page, which seems to be caching results
   effectively, the internal pages generate a little avalanche of queries to
   the triple store... a good number of which are 'give all subjects of
   given object' queries and therefore result in the perceived slowness.

   Now, the question is: is there a way to make Sesame index by object too?

   --
   Stefano Mazzocchi
   Research Scientist, Digital Libraries Research Group
   Massachusetts Institute of Technology, location: E25-131C
   77 Massachusetts Ave, Cambridge, MA 02139-4307
   telephone: +1 (617) 253-1096
   email: stefanom at mit . edu
   -------------------------------------------------------------------
Received on Thu Sep 08 2005 - 13:05:50 EDT
