Re: Piggy-bank continues to haunt my dreams . . . from Rickard Öberg on 2005-11-27

From: Rickard Öberg <rickard.oberg_at_senselogic.se>
Date: Sun, 27 Nov 2005 17:24:21 +0100

Dean Allemang wrote:
> I am interested in your comments about performance; one of the things that
> interests me about Piggy Bank is how the user has control over how much
> stuff is in the current search space; e.g., you scrape a few pages with
> Solvent, merge them, and run a query over the merged stuff (or show it off
> in your Google Map). You get some WOW! results with very modest numbers of
> triples (e.g., showing all the Starbucks and libraries in my zip code
> probably takes fewer than 100 triples). Even if I were to add in
> a few dozen "inferencing" triples (subClassOf, subPropertyOf, inverse),
> even the most brain-dead inferencer could make it through that in a few
> seconds. In the Semantic Bank scenario, you get lots more triples (but even
> so, I bet that if I were to dump the entire ISWC Semantic Bank as triples,
> and put in a bunch of inferencing triples over it, a smart engine like
> Jess or RDFGateway could cut through the whole thing in about 2 CPU
> seconds). [aside - is there an easy way to do a dump like this, in RDF/XML
> or N3?]

For me that's not fast enough. I deal with web CMSs and portals, and if a
page has 4 portlets which take 2 secs each, then that's gonna be slow as
hell. No wow effect in the world is going to save it from being regarded
as utterly useless.

But in general, do people feel that the best way to do these things is
to put all data in one place and make gigantic indexes, or is it better
to have semantic web "islands" which a federated query system can work
with? If it's all in one place, how do you deal with clustering of that?
Simply replicate all data to all nodes all the time? And updates of that
data?

In some sense the whole notion of the "new web" is largely about
distribution, so making it monolithic by putting all data in one place
seems wrong.
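A minimal Python sketch of the "islands" option, assuming each island can answer simple triple-pattern queries over its own data: the same pattern is sent to every island in parallel and the answers are unioned, so no node has to copy its data into one central store. Island, federated_match, and the ex: identifiers are hypothetical, made up for illustration; a real federated engine would also have to handle joins that span islands, timeouts, and unreachable nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]


class Island:
    """Hypothetical semantic web 'island': a node that owns its own triples."""

    def __init__(self, name: str, triples: Iterable[Triple]) -> None:
        self.name = name
        self.triples = set(triples)

    def match(self, pattern: Pattern) -> set[Triple]:
        # None in a pattern position acts as a wildcard.
        return {t for t in self.triples
                if all(p is None or p == v for p, v in zip(pattern, t))}


def federated_match(islands: list[Island], pattern: Pattern) -> set[Triple]:
    # Ask every island concurrently, then union the partial answers.
    with ThreadPoolExecutor(max_workers=max(1, len(islands))) as pool:
        partial_results = pool.map(lambda island: island.match(pattern), islands)
        merged: set[Triple] = set()
        for result in partial_results:
            merged |= result
    return merged


# Usage: two hypothetical islands, queried for everything in one zip code.
cafes = Island("cafes", [("ex:cafe1", "rdf:type", "ex:Cafe"),
                         ("ex:cafe1", "ex:zip", "12345")])
libraries = Island("libraries", [("ex:lib1", "rdf:type", "ex:Library"),
                                 ("ex:lib1", "ex:zip", "12345")])
nearby = federated_match([cafes, libraries], (None, "ex:zip", "12345"))
```

The trade-off relative to one big index is that each query pays the latency of the slowest island it touches, which is exactly the concern raised above for portal pages.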

Intuitively it seems like data needs to be owned and updated by semantic
web "nodes", but there should be some way to do data *caching* further
out in the network, closer to the clients doing actual queries, in order
to make it fast and to avoid problems with network topology, e.g. the
system providing the data may be on an internal net, and the only way to
allow it to be queried is to export the data to a web node in the DMZ.
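One way to read the caching idea, sketched in Python under the assumption that the owning node may only push data outward (as in the DMZ example): the owner keeps the authoritative triples and applies updates, then pushes each change set to registered edge copies, which answer queries locally without ever contacting the owner. OwnerNode, EdgeCache, and the ex: identifiers are hypothetical. Strictly speaking this is push-based replication rather than an expiring cache, and it assumes pushes arrive in order and are never lost.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]
Pattern = tuple[Optional[str], Optional[str], Optional[str]]


def matches(pattern: Pattern, triple: Triple) -> bool:
    # None in a pattern position acts as a wildcard.
    return all(p is None or p == v for p, v in zip(pattern, triple))


class EdgeCache:
    """Hypothetical read-only copy placed near clients (e.g. in the DMZ)."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self.version = 0  # last owner version applied here

    def apply(self, version: int, added: set[Triple], removed: set[Triple]) -> None:
        # Apply a change set pushed by the owner (removals first, like the owner).
        self.triples -= removed
        self.triples |= added
        self.version = version

    def query(self, pattern: Pattern) -> set[Triple]:
        # Answered entirely from the local copy; the owner is never contacted.
        return {t for t in self.triples if matches(pattern, t)}


class OwnerNode:
    """Hypothetical node that owns and updates the data (e.g. on an internal net)."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self.version = 0
        self.edges: list[EdgeCache] = []

    def export_to(self, edge: EdgeCache) -> None:
        # Register an edge node and push it a full snapshot (outbound only).
        self.edges.append(edge)
        edge.apply(self.version, set(self.triples), set())

    def update(self, added: Iterable[Triple] = (), removed: Iterable[Triple] = ()) -> None:
        added_set, removed_set = set(added), set(removed)
        self.triples -= removed_set
        self.triples |= added_set
        self.version += 1
        # Push only the change set to every registered edge copy.
        for edge in self.edges:
            edge.apply(self.version, added_set, removed_set)


# Usage: the owner exports to a DMZ copy, then keeps updating its own data.
owner = OwnerNode()
dmz = EdgeCache()
owner.update(added={("ex:lib1", "ex:zip", "12345")})
owner.export_to(dmz)
owner.update(added={("ex:cafe1", "ex:zip", "12345")})
owner.update(removed={("ex:lib1", "ex:zip", "12345")})
in_zip = dmz.query((None, "ex:zip", "12345"))
```

Ownership stays with the internal node; the DMZ copy only ever receives data, which is what lets it sit outside the firewall.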

> Finally, as for MS, they are quite conspicuous by their silence. We make
> it our business to know what companies like MS are doing here, and we have
> drawn on all our contacts to find out. Even so, the result is a deafening
> silence. This suggests to me that they have something really, really big,
> but not nearly as cool as piggybank, ready to unveil and take over the
> world. Q106.

We'll all have to stay tuned then... thanks for the info.

/Rickard
Received on Sun Nov 27 2005 - 16:18:01 EST

This archive was generated by hypermail 2.3.0 : Thu Aug 09 2012 - 16:39:18 EDT