Re: [RT] rethinking the need for inferencing from Stefano Mazzocchi on 2006-03-07 (stdin)

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Tue, 07 Mar 2006 14:00:46 -0800

Richard Cyganiak wrote:
>> What I personally don't like about rule engines is that they tend to
>> start simple and focused and later grow into monstrous
>> Turing-complete languages, or they do a magnificent job at that 80/20
>> rule; too bad that what you need is *always* 81%, and adding that 1% to
>> the 80% is enough work to make you hate it. (XSLT feels like that to
>> me almost all the time.)
>>
>> So, before we go down the path of defining which architecture we want
>> for this RDF transformation pipeline, I think it's important to define
>> what operations we want to be able to achieve.
>
> I tried something similar to a pipeline architecture in a recent (tiny)
> project. I called the pipeline stages "mini inferencers". Not very
> memorable! Here are some of the mini inferencers I implemented or
> considered implementing. Most of them should be useful as RDF pipeline
> stages.

Awesome!

> Node replacement: Replace URI a with URI b in all statements. (Or
> rather, have a static mapping from input nodes to replacement nodes.)
> That's enough to emulate subClassOf, subPropertyOf, and class/property
> sameAs if you know the involved vocabularies at configuration time.

Yep, thought about this one, but it has a rather big drawback: you
lose one of the sets of identifiers, and for us that is too big a cost.
But I can see how it could be useful for more inward-looking datasets.
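
A minimal sketch of such a node replacement stage, assuming triples are plain (subject, predicate, object) tuples of strings. The function name `replace_nodes` and the `ex:`/`foaf:` names in the sample data are made up for illustration and are not from the project discussed here. Literals are not distinguished from URIs in this sketch, so a literal that happens to equal a mapped URI would be replaced too.

```python
def replace_nodes(triples, mapping):
    """Yield each triple with any node found in `mapping` swapped for its replacement."""
    for s, p, o in triples:
        # Nodes without a mapping entry pass through unchanged.
        yield (mapping.get(s, s), mapping.get(p, p), mapping.get(o, o))


triples = [
    ("ex:alice", "ex:fullName", "Alice"),
    ("ex:alice", "rdf:type", "ex:Employee"),
]
# Static mapping, known at configuration time (hypothetical vocabulary).
mapping = {"ex:fullName": "foaf:name", "ex:Employee": "foaf:Person"}

replaced = list(replace_nodes(triples, mapping))
```

Note that the output no longer mentions `ex:fullName` or `ex:Employee` at all, which is exactly the drawback described above: one set of identifiers is lost.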

> URI generation: Replace blank nodes with a URI generated by some
> function (e.g. prefix the blank node ID or a URL-encoded property with a
> URI prefix)

Good one! I normally do that at RDF generation time, but you are totally
right, this could be done later on.
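
As a rough sketch of doing this after generation time: the stage below assumes triples are tuples of strings and that blank nodes are written as `_:id`, which is an assumption of this example rather than anything stated in the thread. The function name and the prefix are hypothetical.

```python
def name_blank_nodes(triples, prefix):
    """Replace blank nodes (strings starting with '_:') by prefix + blank node ID."""
    def rename(node):
        if node.startswith("_:"):
            return prefix + node[2:]
        return node

    # Predicates are never blank nodes in RDF, so only subject and object are renamed.
    return [(rename(s), p, rename(o)) for s, p, o in triples]


triples = [
    ("_:b1", "foaf:name", "Alice"),
    ("ex:doc", "dc:creator", "_:b1"),
]
named = name_blank_nodes(triples, "http://example.org/node/")
```

Because the same blank node ID always maps to the same URI, references between statements stay intact.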

> Label generation: If resource a doesn't have an rdfs:label, generate one
> by picking from a list of its property values, or default to the local
> part of the URI. A poor man's Fresnel, kind of.

I see.
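
One way this label generation could look, again assuming triples are tuples of strings. The names `add_missing_labels` and `local_part`, and the list of candidate properties, are invented for this sketch.

```python
import re

RDFS_LABEL = "rdfs:label"


def local_part(uri):
    """Return the text after the last '/', '#' or ':' in a URI or prefixed name."""
    return re.split(r"[/#:]", uri.rstrip("/#"))[-1]


def add_missing_labels(triples, candidate_properties):
    """Append an rdfs:label for every subject that does not already have one."""
    triples = list(triples)
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}

    # Remember the first value seen for each (subject, property) pair.
    values = {}
    for s, p, o in triples:
        values.setdefault(s, {}).setdefault(p, o)

    out = list(triples)
    for s, props in values.items():
        if s in labelled:
            continue
        # Pick the first candidate property that has a value, else the local part.
        label = next(
            (props[p] for p in candidate_properties if p in props),
            local_part(s),
        )
        out.append((s, RDFS_LABEL, label))
    return out


triples = [
    ("http://example.org/people/alice", "foaf:name", "Alice Smith"),
    ("http://example.org/people/bob", "foaf:age", "42"),
]
with_labels = add_missing_labels(triples, ["dc:title", "foaf:name"])
```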

> Range/domain inferencing: Generate rdf:type statements for all subjects
> of property foo or all objects of property bar. Again, the properties
> and classes can be hardcoded at configuration time.

This is part of forward-chaining RDF/S inferencing, but yes, the rules
could be split into different components.
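
To make the "split into components" idea concrete, here is a sketch of a stage that applies only the domain and range rules, with the property-to-class tables hardcoded as configuration. Triples are assumed to be tuples of strings, the `ex:` names are hypothetical, and the sketch assumes that properties listed in `ranges` have resources rather than literals as objects.

```python
RDF_TYPE = "rdf:type"


def infer_types(triples, domains, ranges):
    """Add rdf:type statements implied by configured domain and range tables.

    domains: property -> class for the subject of that property
    ranges:  property -> class for the object of that property
    """
    triples = list(triples)
    inferred = set()
    for s, p, o in triples:
        if p in domains:
            inferred.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    # Do not repeat statements that were already present in the input.
    return triples + sorted(inferred - set(triples))


triples = [("ex:alice", "ex:worksFor", "ex:acme")]
typed = infer_types(
    triples,
    domains={"ex:worksFor": "ex:Employee"},
    ranges={"ex:worksFor": "ex:Company"},
)
```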

> Projection: Drop all statements (or passthrough only statements) that
> match some criteria, e.g. the predicate is from the foo: namespace.
> Useful for removing internal administrative data before publishing.

I would call this filtering rather than projection.
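
A small sketch of both variants (drop matching statements, or pass through only matching statements), assuming triples are tuples of strings and that "namespace" simply means a string prefix of the predicate. The `admin:` namespace is hypothetical.

```python
def drop_namespace(triples, namespace):
    """Drop every statement whose predicate starts with `namespace`."""
    return [t for t in triples if not t[1].startswith(namespace)]


def keep_namespace(triples, namespace):
    """Pass through only statements whose predicate starts with `namespace`."""
    return [t for t in triples if t[1].startswith(namespace)]


triples = [
    ("ex:doc", "dc:title", "Annual report"),
    ("ex:doc", "admin:lastEditedBy", "ex:alice"),
]
public = drop_namespace(triples, "admin:")
internal = keep_namespace(triples, "admin:")
```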

> Property attacher: For all resources r of type t, run some custom Java
> code and attach the result to r with property p. Seems to be a common
> pattern. Can generate things like labels, foaf:mbox_sha1sums,
> rdfs:seeAlsos.

Right, you can also expect to call some web service or external database
to augment/normalize the data.
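
The pattern can be sketched as a stage that takes the custom code as a callback. Python is used here instead of the Java mentioned above purely to keep the example short. Triples are assumed to be tuples of strings, and `attach_property` and `mbox_sha1sum` are names invented for this sketch. The callback computes a foaf:mbox_sha1sum as the SHA1 hex digest of the full mailto: URI. A callback that calls a web service or external database would plug in the same way.

```python
import hashlib

RDF_TYPE = "rdf:type"


def attach_property(triples, rdf_type, new_property, compute):
    """For each resource of type `rdf_type`, attach compute(resource, triples)."""
    triples = list(triples)
    out = list(triples)
    for s, p, o in triples:
        if p == RDF_TYPE and o == rdf_type:
            value = compute(s, triples)
            if value is not None:  # the callback may decline to produce a value
                out.append((s, new_property, value))
    return out


def mbox_sha1sum(resource, triples):
    """Return the SHA1 hex digest of the resource's first foaf:mbox, if any."""
    for s, p, o in triples:
        if s == resource and p == "foaf:mbox":
            return hashlib.sha1(o.encode("utf-8")).hexdigest()
    return None


triples = [
    ("ex:alice", "rdf:type", "foaf:Person"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "rdf:type", "foaf:Person"),
]
augmented = attach_property(triples, "foaf:Person", "foaf:mbox_sha1sum", mbox_sha1sum)
```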

> Property value picker: For resources that have multiple values for
> property p, pick one value and drop the others, according to some
> function. Example function: "If most values are textually similar, drop
> the outliers. Then prefer longer values that contain both upper and
> lower case letters." This would keep "The Beatles" over "beatles" and
> "Four English Whackos" in a music dataset. Not sure if this would work
> in the real world :)

:-)
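
For what it's worth, one possible reading of that example function can be sketched in a few lines. Everything specific here is an assumption of the sketch, not something from the original description: "textually similar" is taken to mean a case-insensitive `difflib.SequenceMatcher` ratio of at least 0.6, and an "outlier" is a value that is similar to fewer than half of the other values. Triples are tuples of strings.

```python
from difflib import SequenceMatcher


def similar(a, b, threshold=0.6):
    """Case-insensitive textual similarity test (threshold is an arbitrary choice)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def pick_value(values):
    """Drop outliers, then prefer mixed-case values, then longer ones."""
    values = list(values)
    if len(values) < 2:
        return values[0] if values else None
    # Keep values that are similar to at least half of the other values.
    needed = (len(values) - 1) / 2
    kept = [
        v for i, v in enumerate(values)
        if sum(similar(v, w) for j, w in enumerate(values) if j != i) >= needed
    ]
    candidates = kept or values  # if everything is an outlier, keep all values
    return max(
        candidates,
        key=lambda v: (any(c.isupper() for c in v) and any(c.islower() for c in v), len(v)),
    )


def pick_property_value(triples, prop, choose=pick_value):
    """Keep only the chosen value of `prop` for each subject; other triples pass through."""
    triples = list(triples)
    by_subject = {}
    for s, p, o in triples:
        if p == prop:
            by_subject.setdefault(s, []).append(o)
    winners = {s: choose(vals) for s, vals in by_subject.items()}
    return [(s, p, o) for s, p, o in triples if p != prop or winners[s] == o]


names = ["The Beatles", "beatles", "Four English Whackos"]
best = pick_value(names)
```

With these assumptions the sample list from the description yields "The Beatles". Whether the thresholds hold up on real data is an open question, as noted above.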

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------

Received on Tue Mar 07 2006 - 21:59:06 EST
