Re: [RT] rethinking the need for inferencing

From: Richard Cyganiak <richard_at_cyganiak.de>
Date: Tue, 7 Mar 2006 11:58:13 +0100

> What I personally don't like about rule engines is that they tend
> to start simple and focused and later grow into monstrous
> Turing-complete languages, or they do a magnificent job at that
> 80/20 rule; too bad that what you need is *always* 81%, and adding
> that 1% to the 80% is enough work to make you hate it. (XSLT feels
> like that to me almost all the time.)
>
> So, before we go down the path of defining which architecture we
> want for this RDF transformation pipeline, I think it's important
> to define what operations we want to be able to achieve.

I tried something similar to a pipeline architecture in a recent
(tiny) project. I called the pipeline stages "mini inferencers". Not
very memorable! Here are some of the mini inferencers I implemented
or considered implementing. Most of them should be useful as RDF
pipeline stages.

Node replacement: Replace URI a with URI b in all statements. (Or
rather, have a static mapping from input nodes to replacement nodes.)
That's enough to emulate subClassOf, subPropertyOf, and class/
property sameAs if you know the involved vocabularies at
configuration time.
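
A rough Python sketch of such a stage, to make the idea concrete. It assumes triples are plain (subject, predicate, object) tuples of strings; the function name, the mapping and the ex: vocabulary are made up for illustration and do not come from any real library.

```python
def replace_nodes(triples, mapping):
    """Rewrite every node found in `mapping`; leave all other nodes untouched."""
    def swap(node):
        return mapping.get(node, node)
    return {(swap(s), swap(p), swap(o)) for s, p, o in triples}

# Hypothetical static mapping, fixed at configuration time.
MAPPING = {"ex:Musician": "foaf:Person", "ex:fullName": "foaf:name"}

graph = {
    ("ex:john", "rdf:type", "ex:Musician"),
    ("ex:john", "ex:fullName", "John Lennon"),
}
result = replace_nodes(graph, MAPPING)
```

Because the mapping is applied to all three positions, the same stage covers both class and property replacement.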

URI generation: Replace blank nodes with a URI generated by some
function (e.g. prefix the blank node ID or a URL-encoded property
value with a URI prefix).
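
A minimal sketch of the "prefix the blank node ID" variant, assuming triples are (subject, predicate, object) string tuples and that blank nodes are written as strings starting with "_:". The function name and the example.org prefix are hypothetical.

```python
from urllib.parse import quote

def name_blank_nodes(triples, prefix):
    """Replace each blank node "_:id" with prefix + URL-encoded id."""
    def name(node):
        if node.startswith("_:"):
            return prefix + quote(node[2:], safe="")
        return node
    # Predicates are never blank nodes, so only subject and object are rewritten.
    return {(name(s), p, name(o)) for s, p, o in triples}

graph = {
    ("_:b1", "foaf:name", "Joe Smith"),
    ("ex:a", "foaf:knows", "_:b1"),
}
result = name_blank_nodes(graph, "http://example.org/gen/")
```

A real stage would need a proper node type to tell blank nodes from literals that happen to start with "_:"; the string check here is only good enough for a sketch.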

Label generation: If resource a doesn't have an rdfs:label, generate
one by picking from a list of its property values, or default to the
local part of the URI. A poor man's Fresnel, kind of.
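
Roughly, in Python, under the assumption that triples are (subject, predicate, object) string tuples. The candidate property list, the helper names and the way the local part is cut out of the URI are all illustrative choices.

```python
import re

RDFS_LABEL = "rdfs:label"

def local_part(uri):
    """Return whatever follows the last '#', '/' or ':' in the URI."""
    return re.split(r"[#/:]", uri.rstrip("/#"))[-1]

def add_labels(triples, candidates):
    """Give every unlabelled subject an rdfs:label.

    The label is the first value found among the `candidates` properties
    (in order); if none has a value, the local part of the URI is used.
    """
    triples = set(triples)
    subjects = {s for s, _, _ in triples}
    labelled = {s for s, p, _ in triples if p == RDFS_LABEL}
    out = set(triples)
    for s in subjects - labelled:
        label = None
        for prop in candidates:
            values = sorted(o for s2, p, o in triples if s2 == s and p == prop)
            if values:
                label = values[0]
                break
        out.add((s, RDFS_LABEL, label if label is not None else local_part(s)))
    return out

graph = {
    ("http://example.org/people/joe", "foaf:name", "Joe Smith"),
    ("http://example.org/people/ann", "ex:age", "42"),
}
result = add_labels(graph, ["dc:title", "foaf:name"])
```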

Range/domain inferencing: Generate rdf:type statements for all
subjects of property foo or all objects of property bar. Again, the
properties and classes can be hardcoded at configuration time.
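
As a sketch, assuming (subject, predicate, object) string tuples and two hand-written dictionaries standing in for the hardcoded configuration; all names here are hypothetical.

```python
RDF_TYPE = "rdf:type"

def infer_types(triples, domains, ranges):
    """Add rdf:type statements from per-property domain and range tables.

    `domains` maps a property to the class of its subjects,
    `ranges` maps a property to the class of its objects.
    """
    out = set(triples)
    for s, p, o in triples:
        if p in domains:
            out.add((s, RDF_TYPE, domains[p]))
        if p in ranges:
            out.add((o, RDF_TYPE, ranges[p]))
    return out

graph = {("ex:joe", "foaf:knows", "ex:ann")}
result = infer_types(
    graph,
    domains={"foaf:knows": "foaf:Person"},
    ranges={"foaf:knows": "foaf:Person"},
)
```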

Projection: Drop all statements (or pass through only statements) that
match some criteria, e.g. the predicate is from the foo: namespace.
Useful for removing internal administrative data before publishing.
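
A small sketch of this filter, assuming (subject, predicate, object) string tuples with prefixed names. The `keep` flag switches between the "drop" and "pass through only" modes; the admin: namespace is invented for the example.

```python
def project(triples, matches, keep=False):
    """Drop matching statements, or with keep=True pass through only those."""
    return {t for t in triples if matches(t) == keep}

def in_namespace(prefix):
    """Build a criterion: the predicate starts with the given prefix."""
    return lambda t: t[1].startswith(prefix)

graph = {
    ("ex:joe", "foaf:name", "Joe Smith"),
    ("ex:joe", "admin:lastEditedBy", "ex:ann"),
}
public = project(graph, in_namespace("admin:"))
internal = project(graph, in_namespace("admin:"), keep=True)
```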

Property attacher: For all resources r of type t, run some custom
Java code and attach the result to r with property p. Seems to be a
common pattern. Can generate things like labels, foaf:mbox_sha1sums,
rdfs:seeAlsos.
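
The pattern can be sketched like this, in Python rather than Java to keep it short. It assumes (subject, predicate, object) string tuples; `attach_property` and the callback signature are made up for illustration. The example callback computes a foaf:mbox_sha1sum as the SHA1 hex digest of the full mailto: URI.

```python
import hashlib

def attach_property(triples, rdf_class, prop, compute):
    """For every resource of type `rdf_class`, attach compute(r, triples) via `prop`.

    `compute` may return None to signal that nothing should be attached.
    """
    triples = set(triples)
    out = set(triples)
    for s, p, o in triples:
        if p == "rdf:type" and o == rdf_class:
            value = compute(s, triples)
            if value is not None:
                out.add((s, prop, value))
    return out

def mbox_sha1sum(resource, triples):
    """SHA1 of the resource's foaf:mbox URI, or None if it has no mailbox."""
    for s, p, o in triples:
        if s == resource and p == "foaf:mbox":
            return hashlib.sha1(o.encode("utf-8")).hexdigest()
    return None

graph = {
    ("ex:joe", "rdf:type", "foaf:Person"),
    ("ex:joe", "foaf:mbox", "mailto:joe@example.org"),
    ("ex:ann", "rdf:type", "foaf:Person"),
}
result = attach_property(graph, "foaf:Person", "foaf:mbox_sha1sum", mbox_sha1sum)
```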

Property value picker: For resources that have multiple values for
property p, pick one value and drop the others, according to some
function. Example function: "If most values are textually similar,
drop the outliers. Then prefer longer values that contain both upper
and lower case letters." This would keep "The Beatles" over "beatles"
and "Four English Whackos" in a music dataset. Not sure if this would
work in the real world :)
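
One way the example function could be sketched, with several assumptions: "textually similar" is taken to mean a case-insensitive difflib ratio of at least 0.6, "most" means a strict majority, and the similar group is built around the single value with the most similar neighbours. None of these choices is more than a guess at something workable.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.6):
    """Case-insensitive similarity test; the 0.6 threshold is an arbitrary choice."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def pick_value(values):
    """Drop outliers if a majority of values agree, then prefer mixed case and length."""
    values = sorted(set(values))
    # Largest group of values that are similar to one "centre" value.
    cluster = max(([w for w in values if similar(v, w)] for v in values), key=len)
    if 2 * len(cluster) > len(values):
        values = cluster
    def score(v):
        mixed = any(c.isupper() for c in v) and any(c.islower() for c in v)
        return (mixed, len(v))
    return max(values, key=score)

def pick_property_value(triples, prop):
    """Keep exactly one value of `prop` per subject; other statements pass through."""
    triples = set(triples)
    by_subject = {}
    for s, p, o in triples:
        if p == prop:
            by_subject.setdefault(s, []).append(o)
    out = {t for t in triples if t[1] != prop}
    for s, values in by_subject.items():
        out.add((s, prop, pick_value(values)))
    return out

names = ["The Beatles", "beatles", "Four English Whackos"]
best = pick_value(names)
```

With the three names from the example, "Four English Whackos" falls outside the majority group and "The Beatles" wins on mixed case.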

Richard


>
> smooshing
> =========
>
> smooshing is a graph operation that takes, for example, this graph
> as input
>
> A -(a)-> [foo]
> B -(b)-> [bar]
> B -(sameAs)-> A
>
> and returns you
>
> A -(a)-> [foo]
> A -(b)-> [bar]
> B -(sameAs)-> A
> B -(b)-> [bar]**
>
> where the ** statement can be present or not depending on whether
> we want the smoosher to be preserving (no statements are
> removed) or pruning (only one node ends up having the properties
> attached to it).
>
> The smoosher can also act on property equivalences:
>
> A -(a)-> [foo]
> B -(b)-> [bar]
> b -(sameAs)-> a
>
> returns
>
> A -(a)-> [foo]
> B -(a)-> [bar]
> b -(sameAs)-> a
> B -(b)-> [bar]**
>
>
> subclassing
> ===========
>
> subclassing is a graph operation that takes, for example, this
> graph as input
>
> A -(a)-> [foo]
> B -(b)-> [bar]
> B -(subClassOf)-> A
>
> and returns
>
> A -(a)-> [foo]
> B -(b)-> [bar]
> B -(a)-> [foo]
> B -(subClassOf)-> A
>
> this also works for properties (even if the result is different):
>
> A -(a)-> [foo]
> B -(b)-> [bar]
> b -(subClassOf)-> a
>
> returns
>
> A -(a)-> [foo]
> B -(b)-> [bar]
> B -(a)-> [bar]
> b -(subClassOf)-> a
>
> NOTE: the real operations are more complicated than this if the
> equivalences involve classes and not just instances.
>
>
> extracting
> ==========
>
>
> given a graph, a property, a regular expression and a graph
> pattern, returns a graph that has the literal associated with the
> property "extracted" into subgraphs that are appended to the
> existing graph.
>
> This is useful to turn something like
>
> A -(name)-> "Smith, Joe 1932-1978"
>
> into
>
> A -(lastname)-> "Smith"
> A -(firstname)-> "Joe"
> A -(dates)-> "1932-1978"
>
>
> distancing
> ==========
>
> given a graph, a property, a string distance function f, a threshold
> t and a second property p, generate a graph that links nodes with p
> if their values for the first property are closer than the threshold.
> For example
>
> A -(name)-> "Stefano Mazzocchi"
> B -(name)-> "Stefano Mazzochi"
>
> f is 'edit distance'
> t is 2
> p is 'sameAs'
>
> returns
>
> A -(name)-> "Stefano Mazzocchi"
> B -(name)-> "Stefano Mazzochi"
> A -(sameAs)-> B
>
> Another version of a distancing operator is to use a node distance
> function and a node type instead of a string distance function and
> a property.
>
>
> projecting
> ==========
>
> projecting is a graph operation that takes a graph, a node type and
> a property as input and returns a graph as a result. For example,
> projecting this graph
>
> A1 -(type)-> Z
> A2 -(type)-> Z
> A1 -(a)-> B
> A2 -(a)-> B
>
> on type Z with property -(1)-> returns
>
> A1 -(type)-> Z
> A2 -(type)-> Z
> A1 -(1)-> A2
> A2 -(1)-> A1
>
> The utility of this operation is to perform the same type of graph
> operation that Amazon does with books: "the people who bought this
> CD also bought this other one".
>
> NOTE: there are many variations of this projection operator and I'm
> currently not sure I understand which ones make the most sense.
>
>
> Boy, this was longer than I expected. Well, hope it makes some sense.
>
> Please, pile up your operators.
>
> --
> Stefano Mazzocchi
> Research Scientist Digital Libraries Research Group
> Massachusetts Institute of Technology location: E25-131C
> 77 Massachusetts Ave telephone: +1 (617) 253-1096
> Cambridge, MA 02139-4307 email: stefanom at mit . edu
> -------------------------------------------------------------------
>
>
Received on Tue Mar 07 2006 - 10:56:34 EST
