Re: Piggy Bank and rules

From: Phil Archer <>
Date: Tue, 12 Apr 2005 08:55:25 +0100

David R Karger wrote:

> Phil's system is a kind of "inference engine" that materializes on the
> fly certain assertions about the content rating of a given resource
> (web page). It does so by looking at the URL, which is arguably a bad
> idea (violates url opacity) but is understandable in the real world.

Agreed on both points. It is the "real world" that has strongly influenced
the basic approach, although I am very aware of the heresy of effectively
adding meaning to a URI.

> And can be hidden from tools like piggy bank behind the inference
> engine interface---ie, pb issues a query to a certain system regarding
> a given url, and gets back some statements, without knowing how they
> were inferred.
> As far as I know this doesn't require putting anything "in"
> piggybank. It just requires pointing pb to the icra as one of the
> "semantic banks" from which it collects rdf...

Not really - the RDF data is always hosted on the site itself, not on our
server. We expect to make available a "manifest" (an RDF dump) of sites on
which the accuracy of the label has been verified but it is unlikely that we
would have the resources to operate a live database that could be
interrogated on the fly by an unlimited number of clients.


> Date: Fri, 08 Apr 2005 12:51:43 -0400
> From: Stefano Mazzocchi <>
> Phil Archer wrote:
> > Hi all,
> >
> > I sent Ryan an e-mail the other day and he suggested I shared this with
> > the full list so, after a bit of a delay, here goes.
> >
> > I wanted to let you know first of all how incredibly useful Piggy Bank
> > is being for me in talking about the virtues of the Semantic Web. My
> > organisation, ICRA [1], currently uses the old PICS standard to add
> > labels to content that describe whether it contains sex, nudity,
> > violence etc. Filters, ideally ones built onto browsers, can then allow
> > or block access to content based on those labels. The best known example
> > of this is Content Advisor in Internet Explorer that comes with an old
> > rating system called RSACi. That organisation/rating system led to
> > ICRA. We're now working to move labelling from PICS to RDF.
> >
> > OK, introductory history lesson over.
> >
> > Our use case involves content providers linking a small number of
> > descriptions, what we call content labels, to any number of resources.
> > For example, "there is no sex or nudity on". That's more than
> > one URI we're trying to describe and to make RDF work, we need a way to
> > encode that. Further, we need to be able to say "everything at
> > has description A while everything else
> > on the domain has description B."
> >
> > This has led to the development of a simple rule set that is predicated
> > on matching the URL of a resource for which we want a description
> > against a sequence of one or more Perl5 regular expressions. The first
> > match then leads to a description - what we call a content label.
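
The first-match behaviour described above can be sketched roughly as follows. This is only an illustration, not the ICRA implementation: the example.org patterns, the label names and the function name are all hypothetical, and Python's re module stands in for Perl5 regular expressions (the two agree on simple patterns like these).

```python
import re
from typing import Optional

# Hypothetical rule set: (pattern, label) pairs, tried in document order.
# Neither the patterns nor the label names come from the ICRA schema.
RULES = [
    (r"^https?://example\.org/adult/", "label-A"),
    (r"^https?://example\.org/", "label-B"),
]


def first_matching_label(url: str, rules=RULES) -> Optional[str]:
    """Return the label of the first rule whose pattern matches the URL."""
    for pattern, label in rules:
        if re.search(pattern, url):
            return label
    # No rule matched, so the rule set says nothing about this URL.
    return None
```

Because the first match wins, the more specific pattern has to come before the catch-all one for the same domain.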
> >
> > Use cases and test data at [2], schema description at [3].
> >
> > And so to my question - do you see any wider value in Piggy Bank (or
> > other SW helper applications) working with the kind of rule set ideas
> > we're now using in our own use case? Let me expand a little further.
> >
> > The content label testing tool I've hacked together on our site [4]
> > visits a target URL and looks for RDF data, then narrows in to look
> > specifically for ICRA labels (my plan is to expand this in the near
> > future but I'm in concept-proving mode still). There's a small chunk of
> > rdf on my personal site at There
> > are links to this same file in both the homepage and a dummy page set up at
> > Links [5] and [6] below take the label
> > tester off to those 2 pages respectively, it grabs the RDF instance and
> > then works out which ICRA label applies to the URL in question -
> > needless to say you get a different result for each URL.
> >
> > This is due to the simple rules encoded in the RDF instance - any URL on
> > a given list of hosts gets "label 1", but if the URL contains "589" it
> > gets label 2. Piggy Bank knows nothing about these rules of course so it
> > shows all the RDF classes (in my terms, both possible labels) and the
> > rule set itself.
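
To make the difference concrete, here is a small sketch contrasting a rule-unaware view, which lists every label found in the RDF, with a rule-aware view, which resolves the single label that applies to a URL. The host names and function names are hypothetical, and it assumes the "589" check only applies to URLs on the listed hosts, which the description above does not state.

```python
from typing import List, Optional
from urllib.parse import urlsplit

# Hypothetical host list; the real one lives in the RDF instance.
HOSTS = {"example.org", "www.example.org"}
ALL_LABELS = ["label 1", "label 2"]


def labels_without_rules() -> List[str]:
    """What a rule-unaware tool can show: every label in the RDF."""
    return list(ALL_LABELS)


def label_with_rules(url: str) -> Optional[str]:
    """What a rule-aware tool could show: the one label for this URL."""
    host = urlsplit(url).hostname
    if host not in HOSTS:
        return None  # the rules do not cover this host
    # Assumption: the "589" check comes first and overrides the default.
    if "589" in url:
        return "label 2"
    return "label 1"
```

For instance, under these assumptions label_with_rules("http://example.org/page589.html") resolves to "label 2", while any other page on the same host resolves to "label 1".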
> >
> > If Piggy Bank were to gain a deep and meaningful understanding of the
> > rule set [3] (i.e. had some code added to support the functionality!) it
> > would demonstrate the enormous potential of all this to the internet
> > safety community. Yes, the labels might be used for filtering but they
> > can equally be used to show through the kind of visualisation
> > exemplified by Piggy Bank that a site is a good resource for homework,
> > contains medical information that can be trusted and so on.
> >
> > Enough for one e-mail. I'm naturally keen to know what you think.
> Phil,
> thanks much for your interest in Piggy-Bank.
> As you might know, the SIMILE project is run with open development
> practices: write a patch (or hire somebody to do it) and we stick it in!
> See, I don't have kids, so I don't naturally feel the need to filter
> content, nor to work on technology that does it for me. It's not
> something I would work on just for fun (unlike some other aspects of
> piggy-bank).
> But I do understand that some people find this valuable and I would not
> be against having such functionality in Piggy-Bank.
> So, submit a patch, we merge it and voila', the functionality you want
> is in Piggy-Bank next release. Way easier than waiting for microsoft, or
> mozilla to implement it ;-)
> --
> Stefano Mazzocchi
> Research Scientist Digital Libraries Research Group
> Massachusetts Institute of Technology location: E25-131C
> 77 Massachusetts Ave telephone: +1 (617) 253-1096
> Cambridge, MA 02139-4307 email: stefanom at mit . edu
> -------------------------------------------------------------------
Received on Tue Apr 12 2005 - 07:55:06 EDT
