[RT] Moving Piggy Bank forward... from Stefano Mazzocchi on 2005-07-21 (stdin)

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Thu, 21 Jul 2005 15:44:58 -0400

<background>
RT stands for "random thoughts". It's a long-standing tradition that I
established in the Apache Cocoon community years ago, and it has proved
to be a wonderful way to get input from people on how to do things.

It's a sort of 'brainstorming' mode where anybody is allowed to say
anything they want, including silly or funny things or blue-sky
suggestions. Most of the time it ends up in nothing; a few times it
changes the shape of the project entirely (yes, it happened at least 3
times for Cocoon and several times in other projects).
</background>

As mentioned before, Piggy Bank is a bitch to profile, and while it's
the most clever hack of the decade (a cross-platform Firefox plugin
written in Java and with one-click install!) [and I *can* say this
because it was David's idea, not mine!] it still remains a hack, something
that not many are using.

Piggy Bank on a Mac is painfully slow. My profiling indicates that
something between 20% and 30% of the time is spent on I/O alone. I don't
know where the problem is. It could be the OS, the JVM, the calls
between the OS and the JVM, or the calls between the Embedding Plugin and
the JVM (I suspect JNI might be part of the problem).

Another painful thing is the use of a "completely RESTful" approach for
URLs. While this is a very nice feature on paper, in real life 20% of
the time is spent on URL-encoding of the queries... and the long URLs
make I/O even worse because lots of bytes are transferred through
the localhost socket.
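
To give a feel for the encoding cost, here is a small sketch (in Python
for brevity; Piggy Bank itself is Java) that measures how much a single
query value grows when it carries a full URI and gets percent-encoded.
The function name is made up for illustration, and the example URI is
just a well-known RDF property, not something taken from Piggy Bank's
actual URL layout.

```python
from typing import Tuple
from urllib.parse import quote


def encoded_growth(value: str) -> Tuple[int, int]:
    """Return (raw_length, percent_encoded_length) in bytes for one query value."""
    raw = value.encode("utf-8")
    # safe="" forces every reserved character (":", "/", "#", ...) to be
    # escaped, which is what has to happen inside a query-string value.
    encoded = quote(value, safe="")
    return len(raw), len(encoded)


# Assumed example: a property URI used as a restriction in a query string.
uri = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
raw_len, enc_len = encoded_growth(uri)
print(raw_len, enc_len)
```

Every reserved character costs three bytes instead of one, and a page
that links to many facets repeats that cost for every link it contains.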

The third thing that is killing me is the lack of inferencing, and I also
suspect some kind of severe bottleneck in the Sesame native store
implementation (again, it has to do with I/O).

The fourth thing that is really needed is a way to tune the view based on
data type. David and I spent a lot of time designing the 'one size fits
all' data view, but as others pointed out, this is just another way to
do what the original web site already does... and most of the time, worse.

- o -

So, here are possible solutions for the above issues:

1) decoupling PB from the browser

PROs:

- it would drastically reduce the memory consumption of the browser,
which is especially harmful since Firefox 1.0.x contains nasty memory
leaks associated with the tab browser.

- avoid the startup time when you fire up the browser (the startup time
is required, but if you pay it at OS login you don't notice it that much)

- the two programs are isolated, one can crash without influencing the
other

- we can think of other plugins that use "piggy-bank" as a local
service (for example Thunderbird plugins that use PB to know which RSS
feeds to subscribe to, or that inject email into piggy-bank for faceted
browsing of your email, or your pictures/calendar, you name it)

- avoid the need for installing the Java Embedding Plugin on a Mac

- *might* help performance (not sure about that, needs more testing)

- makes it a lot easier to write a cross-browser plugin

- could also work as a transparent RDF-izing/caching proxy

CONs:

- installation becomes more painful (and more work for us since we have
to write different installers for different operating systems). Also, some
people don't like to have services running (even if things like Google
Desktop work exactly like this).
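
To make the "local service" idea more concrete, here is a minimal
sketch (Python for brevity, not the Java that Piggy Bank is written in)
of a process that listens only on the loopback interface, accepts items
over HTTP POST and reports how many it holds. All names here
(`LocalStoreHandler`, `start_service`, the `/items` path used in the
note below) are hypothetical; nothing is taken from the actual Piggy
Bank code.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class LocalStoreHandler(BaseHTTPRequestHandler):
    """Toy stand-in for a decoupled store that any local client can talk to."""

    def do_POST(self):
        # Read the posted payload (for example a chunk of RDF) and keep it.
        length = int(self.headers.get("Content-Length", "0"))
        item = self.rfile.read(length).decode("utf-8")
        with self.server.lock:
            self.server.items.append(item)
        self._reply(201, {"stored": True})

    def do_GET(self):
        with self.server.lock:
            count = len(self.server.items)
        self._reply(200, {"count": count})

    def _reply(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet


def start_service(port=0):
    """Start the service on 127.0.0.1 in a background thread and return it."""
    # Port 0 lets the OS pick a free port; read it from server.server_address.
    server = ThreadingHTTPServer(("127.0.0.1", port), LocalStoreHandler)
    server.items = []
    server.lock = threading.Lock()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A browser extension, a mail client plugin or a command-line script
could all POST to something like `/items` on the same local port, which
is what makes the cross-browser and "other plugins" points above
plausible. Binding to 127.0.0.1 keeps the service off the network, but
it does not address the installer problem listed under CONs.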

2) find a compression scheme for the URLs.

PROs:

- makes the page a lot smaller (reduces I/O overhead)

- reduces the url-encoding overhead

CONs:

- the compression scheme must be shared across instances or it wouldn't
work for horizontal clustering (same problem as with sessions, but there
are workarounds... like affiliations and all those tricks)
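
One possible shape for such a scheme, sketched in Python for brevity
(Piggy Bank itself is Java): a table that hands out a short token for
each long URI the first time it is seen and maps tokens back to URIs on
the way in. The class and method names are hypothetical, and a hex
counter is just the simplest token format I could pick for the sketch.

```python
class UriTokenTable:
    """Bidirectional table between long URIs and short tokens (hypothetical)."""

    def __init__(self):
        self._token_for = {}  # uri -> token
        self._uri_for = []    # position (the token as an integer) -> uri

    def shorten(self, uri: str) -> str:
        """Return the token for a URI, assigning a new one if needed."""
        token = self._token_for.get(uri)
        if token is None:
            token = format(len(self._uri_for), "x")  # hex counter, no escaping needed
            self._token_for[uri] = token
            self._uri_for.append(uri)
        return token

    def expand(self, token: str) -> str:
        """Map a token found in an incoming URL back to the full URI."""
        return self._uri_for[int(token, 16)]


table = UriTokenTable()
token = table.shorten("http://www.w3.org/1999/02/22-rdf-syntax-ns#type")
assert table.expand(token) == "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
```

The tokens contain only hex digits, so they need no URL-encoding at
all. The table is state that lives inside one instance, which is
exactly the clustering problem in the CON above: a token minted by one
instance means nothing to another unless the table is shared or
persisted.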

3) use the Sesame memory store (which supports RDF Schema entailment and
hopefully OWL-tiny with some custom rules) instead of the native one

[Jeen, can you tell us more about that?]

PROs:

- we get basic inferencing on equivalences and subclassing

- it's considerably faster

CONs:

- memory consumption grows linearly with the amount of data stored (or
worse?)

- data is saved on disk *only* after regular shutdown. In case of
system collapse there is data loss. (Jeen, is there a workaround for
this problem? like saving the new RDF right away before returning)
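
The "save right away before returning" idea could look roughly like the
sketch below: every added statement is appended to a journal file and
forced to disk before the in-memory structure is updated, and the
journal is replayed at startup. This is plain Python and deliberately
does not use the Sesame API; `JournaledMemoryStore`, its methods and the
tab-separated journal format are all assumptions made for illustration.

```python
import os


class JournaledMemoryStore:
    """In-memory triple set backed by an append-only journal (hypothetical)."""

    def __init__(self, journal_path):
        self._path = journal_path
        self._triples = set()
        if os.path.exists(journal_path):
            self._replay()
        # newline="\n" keeps the on-disk format identical on every platform.
        self._journal = open(journal_path, "a", encoding="utf-8", newline="\n")

    def _replay(self):
        """Reload complete records and cut off a torn record left by a crash."""
        good_end = 0
        with open(self._path, "rb") as f:
            for raw in f:
                if not raw.endswith(b"\n"):
                    break  # incomplete last record: the write never finished
                parts = raw[:-1].decode("utf-8").split("\t")
                if len(parts) == 3:
                    self._triples.add(tuple(parts))
                good_end += len(raw)
        with open(self._path, "r+b") as f:
            f.truncate(good_end)

    def add(self, s, p, o):
        """Persist the statement first, then make it visible in memory."""
        # Simplification: values must not contain tabs or newlines.
        self._journal.write(f"{s}\t{p}\t{o}\n")
        self._journal.flush()
        os.fsync(self._journal.fileno())  # force the bytes to disk before returning
        self._triples.add((s, p, o))

    def __contains__(self, triple):
        return tuple(triple) in self._triples

    def __len__(self):
        return len(self._triples)

    def close(self):
        self._journal.close()
```

The trade-off is one fsync per added statement, which brings back some
of the I/O cost the memory store was supposed to avoid; batching several
statements per fsync would soften that at the price of a small window of
possible loss.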

4) integrating Ryan's work on Fresnel

PROs:

- data-driven views (way more useful)

CONs:

- reduced speed? (needs testing but complexity increases)

- hard to configure? (we don't know how difficult Fresnel stylesheets
are to use)
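
To illustrate what "data-driven views" means in the simplest possible
terms, here is a sketch (Python for brevity) that picks a renderer based
on an item's rdf:type and falls back to the generic 'one size fits all'
view otherwise. This is NOT Fresnel: Fresnel describes such choices
declaratively in RDF (lenses and formats), while this hard-codes them in
a dictionary. The function names and the item representation (a dict
from property URI to a list of values) are assumptions made for the
example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def render_generic(item):
    """The 'one size fits all' view: dump every property and value."""
    lines = []
    for prop in sorted(item):
        for value in item[prop]:
            lines.append(f"{prop} = {value}")
    return "\n".join(lines)


def render_person(item):
    """A type-specific view that shows only what matters for a person."""
    names = item.get(FOAF + "name", ["(unnamed)"])
    return f"Person: {names[0]}"


# Hypothetical registry: rdf:type URI -> view function.
VIEWS = {
    FOAF + "Person": render_person,
}


def render(item):
    """Use the first type-specific view that matches, else the generic one."""
    for type_uri in item.get(RDF_TYPE, []):
        view = VIEWS.get(type_uri)
        if view is not None:
            return view(item)
    return render_generic(item)
```

The extra lookup per item is cheap, so the speed concern above would
come mostly from how complicated the individual views get, not from the
dispatch itself.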

Comments?

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------

Received on Thu Jul 21 2005 - 19:41:57 EDT
