Re: Piggy Bank blocks find as you type (+ comments)

From: David Huynh <dfhuynh_at_csail.mit.edu>
Date: Thu, 21 Jul 2005 21:01:05 -0400

Michael McDougall wrote:

>
>
> David Huynh wrote:
>
>>
>> - Saving data automatically into "My Piggy Bank" can be
>> undesirable--you might want to select which items to save; and you
>> might also want to tag them as you save them. And we don't want rogue
>> sites to pollute "My Piggy Bank" just because the user accidentally
>> visits them.
>
>
>
> You may have been through all these ideas before, but I've been
> thinking about this a bit. You could try an "opt-out" approach where
> all sites are initially trusted but the user can mark a site as 'bad',
> which will remove that site's data from Piggy Bank (and subsequent
> visits to the site won't scrape new data). Or you could have 2 data
> banks: one with data from all sites (perhaps with the opt-out option)
> and one with data from sites that were explicitly marked as trusted.
> That way I can browse my trusted data, and when something's not there
> I can (hold my nose and) browse a giant pile of data from all the
> sites I visit.
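
(For concreteness, a minimal sketch of the opt-out routing you
describe--the names below are made up for illustration, not Piggy Bank
internals:)

UNTRUSTED = set()   # sites the user has marked as 'bad'
TRUSTED = set()     # sites the user has explicitly marked as trusted

def collect(site, items, all_bank, trusted_bank):
    """Route newly scraped items into the two banks you suggest."""
    if site in UNTRUSTED:
        return                      # opted out: scrape nothing new
    all_bank.extend(items)          # the giant pile from every site
    if site in TRUSTED:
        trusted_bank.extend(items)  # the smaller, hand-picked pile

def mark_bad(site, all_bank):
    """Opt a site out and purge whatever it already contributed."""
    UNTRUSTED.add(site)
    all_bank[:] = [i for i in all_bank if i.get("source") != site]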

Actually, for each site that you collect from (by clicking the data coin
icon), we create one data bank. Now, the data in that data bank is
ephemeral--we cache it just for a while. However, we keep all the
metadata about the original page so that we can reconstruct the data
bank at a later date. So, even before you save any item from that data
bank into the permanent "My Piggy Bank", you can already bookmark any
page as you are browsing through the collected (but not yet saved) data.
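
(A rough sketch of that behavior, with made-up names--only the idea of
an ephemeral per-site cache plus persistent page metadata reflects what
we actually do:)

import time

class SiteDataBank:
    CACHE_TTL = 60 * 60  # assumption: how long collected data stays cached

    def __init__(self, page_url, page_title):
        # Metadata about the original page is kept for good, so the
        # bank can be reconstructed later by re-scraping the page.
        self.page_url = page_url
        self.page_title = page_title
        self._items = []
        self._cached_at = None

    def cache(self, items):
        self._items = items
        self._cached_at = time.time()

    def items(self, rescrape):
        # If the ephemeral cache has expired, rebuild the bank from the
        # kept metadata by re-running the scraper on the original page.
        if self._cached_at is None or time.time() - self._cached_at > self.CACHE_TTL:
            self.cache(rescrape(self.page_url))
        return self._items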

> It might also be worth treating data from scrapers differently than
> data from plain RDF. If I install the ACM Portal scraper I think it's
> safe to assume that I trust everything on the ACM Portal site.
>
> Have you actually been encountering many 'rogue sites' that pollute
> Piggy Bank? I'm doing some research on semantic web security so I'd
> like to learn more about real world issues like this.

Not yet... If we start to encounter rogue sites, that's when we'll know
we're becoming successful :-) Any suggestions for a secure
infrastructure for distributing scrapers?

David
Received on Fri Jul 22 2005 - 00:57:57 EDT
