Re: Piggy Bank blocks find as you type (+ comments)

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Sat, 23 Jul 2005 21:26:07 -0400

Michael McDougall wrote:
>
>
> David Huynh wrote:
>
>>
>> - Saving data automatically into "My Piggy Bank" can be
>> undesirable--you might want to select which items to save; and you
>> might also want to tag them as you save them. And we don't want rogue
>> sites to pollute "My Piggy Bank" just because the user accidentally
>> visits them.
>
>
>
> You may have been through all these ideas before, but I've been thinking
> about this a bit. You could try an "opt-out" approach where all sites
> are initially trusted but the user can mark a site as 'bad', which will
> remove that site's data from Piggy Bank (and subsequent visits to the
> site won't scrape new data). Or you could have 2 data banks: one with
> data from all sites (perhaps with the opt-out option) and one with data
> from sites that were explicitly marked as trusted. That way I can browse
> my trusted data, and when something's not there I can (hold my nose and)
> browse a giant pile of data from all the sites I visit.

Collecting RDF automatically (no matter where it ends up) is still a
very intensive process. Are you sure you want that to happen for *every*
page that contains RDF? Pretty soon a lot of pages will, in one way or
another. The data might just end up overloading both you and your
browsing experience.
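
For concreteness, the opt-out, two-bank idea quoted above could be
sketched roughly as follows. This is only an illustration of the
proposal, not how Piggy Bank works: the class TwoBankStore and all of
its method names are hypothetical, and the RDF items are stood in for
by plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class TwoBankStore:
    """Hypothetical model of the opt-out, two-bank proposal (not a Piggy Bank API)."""

    all_items: dict = field(default_factory=dict)  # site -> list of collected items
    trusted: set = field(default_factory=set)      # sites explicitly marked as trusted
    blocked: set = field(default_factory=set)      # sites the user marked as 'bad'

    def collect(self, site, item):
        """Store an item unless the site was opted out; return True if stored."""
        if site in self.blocked:
            return False
        self.all_items.setdefault(site, []).append(item)
        return True

    def mark_bad(self, site):
        """Opt out of a site: drop its existing data and refuse future data."""
        self.blocked.add(site)
        self.trusted.discard(site)
        self.all_items.pop(site, None)

    def mark_trusted(self, site):
        """Promote a site into the trusted bank (blocked sites stay blocked)."""
        if site not in self.blocked:
            self.trusted.add(site)

    def browse(self, trusted_only=True):
        """List items from the trusted bank only, or from every visited site."""
        return [
            item
            for site, items in self.all_items.items()
            if not trusted_only or site in self.trusted
            for item in items
        ]
```

Note that the sketch still collects from every page visited, so it does
nothing about the cost concern above; it only shows how the two views
and the opt-out would relate to each other.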

> It might also be worth treating data from scrapers differently than data
> from plain RDF. If I install the ACM Portal scraper I think it's safe to
> assume that I trust everything on the ACM Portal site.

This is a good point: through scrapers we have some filtering power.

> Have you actually been encountering many 'rogue sites' that pollute
> Piggy Bank?

No, not yet.

> I'm doing some research on semantic web security so I'd like
> to learn more about real world issues like this.

I don't think PB is popular enough for this to happen yet, but
Greasemonkey shows that popularity has the side effect of making a tool
a target for malicious use.

I fear that having data autosaved is just going to tickle spammers'
greed for getting their data to percolate.

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Sun Jul 24 2005 - 01:23:00 EDT