Re: Piggy Bank and rules

From: David R. Karger <>
Date: Mon, 11 Apr 2005 16:38:20 -0400

Phil's system is a kind of "inference engine" that materializes on the
fly certain assertions about the content rating of a given resource
(web page). It does so by looking at the URL, which is arguably a bad
idea (it violates URL opacity) but is understandable in the real world.
It can also be hidden from tools like Piggy Bank behind the inference
engine interface, i.e., Piggy Bank issues a query to a certain system
regarding a given URL and gets back some statements, without knowing
how they were inferred.

As far as I know, this doesn't require putting anything "in"
Piggy Bank. It just requires pointing Piggy Bank to ICRA as one of the
"semantic banks" from which it collects RDF...
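
A minimal sketch of that arrangement, in Python. Every name here is
hypothetical (this is not Piggy Bank's or ICRA's actual API): each "bank"
answers one query, "what do you assert about this URL?", and the client
merges whatever comes back without knowing how it was inferred.

```python
from typing import Callable, Iterable, List, Tuple

# A statement is a (subject, predicate, object) triple of plain strings.
Statement = Tuple[str, str, str]
# A "bank" is anything that maps a URL to statements about it.
Bank = Callable[[str], Iterable[Statement]]


def rating_bank(url: str) -> List[Statement]:
    """Hypothetical stand-in for a remote rating service.

    The substring check is a placeholder for whatever inference the
    service really performs; the caller never sees it.
    """
    label = "label-B" if "restricted" in url else "label-A"
    return [(url, "ex:hasContentLabel", label)]


def static_bank(url: str) -> List[Statement]:
    """Hypothetical bank that just looks up stored statements."""
    stored = {
        "http://example.org/": [("http://example.org/", "ex:title", "Example")],
    }
    return stored.get(url, [])


def collect(url: str, banks: Iterable[Bank]) -> List[Statement]:
    """Query every bank in turn and merge the returned statements."""
    merged: List[Statement] = []
    for bank in banks:
        merged.extend(bank(url))
    return merged
```

Adding the rating service is then just one more entry in the list passed
to `collect`; nothing rating-specific lives in the client.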

   Date: Fri, 08 Apr 2005 12:51:43 -0400
   From: Stefano Mazzocchi <>

   Phil Archer wrote:
> Hi all,
> I sent Ryan an e-mail the other day and he suggested I shared this with
> the full list so, after a bit of a delay, here goes.
> I wanted to let you know first of all how incredibly useful Piggy Bank
> is being for me in talking about the virtues of the Semantic Web. My
> organisation, ICRA [1], currently uses the old PICS standard to add
> labels to content that describe whether it contains sex, nudity,
> violence etc. Filters, ideally ones built onto browsers, can then allow
> or block access to content based on those labels. The best known example
> of this is Content Advisor in Internet Explorer that comes with an old
> rating system called RSACi. That organisation/rating system led to
> ICRA. We're now working to move labelling from PICS to RDF.
> OK, introductory history lesson over.
> Our use case involves content providers linking a small number of
> descriptions, what we call content labels, to any number of resources.
> For example, "there is no sex or nudity on". That's more than
> one URI we're trying to describe and to make RDF work, we need a way to
> encode that. Further, we need to be able to say "everything at
> has description A while everything else
> on the domain has description B."
> This has led to the development of a simple rule set that is predicated
> on matching the URL of a resource for which we want a description
> against a sequence of one or more Perl5 regular expressions. The first
> match then leads to a description - what we call a content label.
> Use cases and test data at [2], schema description at [3].
> And so to my question - do you see any wider value in Piggy Bank (or
> other SW helper applications) working with the kind of rule set ideas
> we're now using in our own use case? Let me expand a little further.
> The content label testing tool I've hacked together on our site [4]
> visits a target URL and looks for RDF data, then narrows in to look
> specifically for ICRA labels (my plan is to expand this in the near
> future but I'm in concept-proving mode still). There's a small chunk of
> rdf on my personal site at There
> are links to this same file in both the homepage and a dummy page set up at
> Links [5] and [6] below take the label
> tester off to those 2 pages respectively, it grabs the RDF instance and
> then works out which ICRA label applies to the URL in question -
> needless to say you get a different result for each URL.
> This is due to the simple rules encoded in the RDF instance - any URL on
> a given list of hosts gets "label 1", but if the URL contains "589" it
> gets label 2. Piggy Bank knows nothing about these rules of course so it
> shows all the RDF classes (in my terms, both possible labels) and the
> rule set itself.
> If Piggy Bank were to gain a deep and meaningful understanding of the
> rule set [3] (i.e. had some code added to support the functionality!) it
> would demonstrate the enormous potential of all this to the internet
> safety community. Yes, the labels might be used for filtering but they
> can equally be used to show through the kind of visualisation
> exemplified by Piggy Bank that a site is a good resource for homework,
> contains medical information that can be trusted and so on.
> Enough for one e-mail. I'm naturally keen to know what you think.
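
The first-match rule evaluation Phil describes can be sketched roughly as
follows. This is an illustration only, not ICRA's implementation: the host
names and label names are placeholders, Python's `re` stands in for Perl5
regular expressions (the two differ in some details), and the sketch
assumes the "589" rule applies to any URL rather than only to URLs on the
listed hosts.

```python
import re
from typing import Optional

# Placeholder hosts; the real list lives in the RDF instance.
HOSTS = ("example.org", "www.example.org")

# Ordered rules as (compiled pattern, label). The first match wins, so
# the more specific "589" rule must come before the host-wide rule.
RULES = [
    (re.compile(r"589"), "label 2"),
    (
        re.compile(
            r"^https?://(?:%s)(?:[:/]|$)" % "|".join(map(re.escape, HOSTS))
        ),
        "label 1",
    ),
]


def label_for(url: str) -> Optional[str]:
    """Return the label of the first rule whose pattern matches the URL."""
    for pattern, label in RULES:
        if pattern.search(url):
            return label
    return None  # no rule matched, so no label applies
```

Swapping the order of the two rules would make every URL on the listed
hosts resolve to "label 1", which is why rule order is part of the rule
set's meaning.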


   Thanks much for your interest in Piggy-Bank.

   As you might know, the SIMILE project is run with open development
   practices: write a patch (or hire somebody to do it) and we stick it in!

   See, I don't have kids, so I don't naturally feel the need to filter
   content, nor to work on technology that does it for me. It's not
   something I would work on just for fun (unlike some other aspects of

   But I do understand that some people find this valuable and I would not
   be against having such functionality in Piggy-Bank.

   So, submit a patch, we merge it and voila', the functionality you want
   is in the next Piggy-Bank release. Way easier than waiting for Microsoft
   or Mozilla to implement it ;-)

   Stefano Mazzocchi
   Research Scientist Digital Libraries Research Group
   Massachusetts Institute of Technology location: E25-131C
   77 Massachusetts Ave telephone: +1 (617) 253-1096
   Cambridge, MA 02139-4307 email: stefanom at mit . edu
Received on Mon Apr 11 2005 - 20:37:50 EDT