Re: Piggy Bank and Semantic Bank: scalability and performance

From: David Huynh <dfhuynh_at_csail.mit.edu>
Date: Sun, 23 Oct 2005 16:26:59 -0400

Rickard Öberg wrote:

> David Huynh wrote:
>
>> ~30,000 items... You're brave :-)
>
>
> Well, that's just it... if you think about it, 30,000 is NOTHING. Tiny.
> Minuscule. Insignificant. I mean, if you are really, really serious
> about building semantic banks and semantic webs then you should add a
> couple of zeroes to that number...

Oh no, we do have datasets on the order of hundreds of thousands of items...

> [snip]
>
>> I think heavy-duty scraping should not be supported inside Piggy
>> Bank. We can have an entirely stand-alone application for that. You
>> probably want sophisticated status monitoring UI, etc., etc., too.
>
> Indeed. Any ideas for how to do that? For heavy-duty scraping I will
> probably want to do it on a timer as well, and revisit websites
> reasonably often to get new data.

Man, I'm never gonna finish my thesis...

> [snip]
>
>> I'd say export the data out to N3 or RDF/XML. Then run Longwell2 on
>> it using a MySQL database.
>>
>> We do intend to handle large datasets but we haven't gotten around to
>> that yet.
>
> Well, if you want to I can really recommend the Earthquake data :-) I
> can send you the scraper if you're interested. It's good as a reality
> check if nothing else ;-)

It'd be great if you could send me the URL of the N3 file describing that
scraper; then it's easy to install it into Piggy Bank.

>> There are two sets of tools that we are interested in providing:
>> - tools that add value to existing Web information for naive users in
>> their everyday use of the Web (--> Piggy Bank)
>> - tools that let domain experts make sense of their information
>> (--> Welkin, ...)
>> The first category should not require much configuration while the
>> latter should take advantage of domain knowledge for
>> optimization--speed, memory, and UI.
>
>
> Well, if Piggy Bank could be extended using RDF that describes how
> properties should be visualized, that would be just another kind of data
> to load into it. I want to allow non-technical people to work with huge
> datasets using nice visualizations (yes, it's a challenge, I know
> that), so being able to extend the PB interface is worth a lot, as it is
> web-based. If I can embed applets in the interface to do some of the
> more complicated things (like timelines), then that is fine.

Yes, we have a lot of technologies at hand (applets, Canvas, SVG, Ajax)
to build more sophisticated visualizations. It's a bright future.

>> Glad you're pushing it to the limit :-) Just curious, have you tried
>> plotting 30,000 items on Google Maps?!
>
> Yes. Doesn't work :-)

You want to hook in Virtual Earth or Yahoo Maps and see? :-)

David
Received on Sun Oct 23 2005 - 20:21:29 EDT
