Re: [RT] rethinking the need for inferencing

From: Rickard Öberg <rickard.oberg_at_senselogic.se>
Date: Sat, 11 Mar 2006 08:32:38 +0100

Stefano Mazzocchi wrote:
> The only things we needed off of RDF/S and OWL are what has been named
> "OWL tiny": basically sub-typing and equivalences. And no, sets that are
> members of themselves are not the kind of problem I'm likely to encounter.

When I look at our own models, I can see two things that we use over and
over again: inverse and transitive properties. Our object model
(CMS-type content model) is inherently hierarchical, and it is very
common to say something about a node and have that be inherited down the
tree. If I can declare a property like "isDescendentOf" to be transitive,
it will save me lots of headaches, as I would otherwise have to insert
those relationships explicitly. On the other hand, storing the results
of such transitive inferences may literally explode the database in
size. If I, for example, say something about the root node and that is
inherited by all nodes, then that is a LOT of data to be added. And
consequently, if I change or remove that data, there's a LOT of inferred
data to be updated. It might be better to infer such relationships at
runtime, to avoid having to keep the inferred data in sync.
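
To make the trade-off concrete, here is a minimal Python sketch. It
assumes each node stores only its direct parent; the `parent` dict, the
node names and the function names are made up for illustration and are
not our actual model. `is_descendent_of` infers the relationship at
query time, while `materialize` shows what storing every inferred pair
would look like.

```python
# Hypothetical tree: child -> direct parent (only explicit facts are stored).
parent = {
    "section": "root",
    "page": "section",
    "paragraph": "page",
}

def ancestors(node):
    """Yield every ancestor of node by walking parent links at query time."""
    while node in parent:
        node = parent[node]
        yield node

def is_descendent_of(node, candidate):
    """Runtime inference: true if candidate is reachable via parent links."""
    return candidate in ancestors(node)

def materialize():
    """Alternative: compute every inferred (node, ancestor) pair up front."""
    return {(n, a) for n in parent for a in ancestors(n)}
```

Even in this tiny chain, the three stored parent links turn into six
materialized pairs; a chain with n parent links gives n(n+1)/2 of them,
all of which would have to be revisited when a link changes.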

> here is where it gets tricky: if vra:creator is equivalent to
> dc:creator, do you remove the original property and attach a new one, or
> add a new one and leave the existing one there? There is no way to tell
> an OWL inferencer what to do with that.
>
> Even more, if urn:a is equivalent to urn:b, do we copy all properties of
> urn:a to urn:b and vice versa, or do we take urn:a as the master, move
> all the properties of urn:b to urn:a and leave urn:b attached to urn:a
> with the equivalence? Again, two strategies and no way to differentiate
> them.

Sounds like a serious case of schizophrenia... yes, the problem of
identity is tricky :-)
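
For what it's worth, the two strategies for urn:a and urn:b can be
sketched in a few lines of Python. This is only an illustration under
my own assumptions: triples are plain (subject, property, value)
tuples, and "equivalentTo" is a placeholder property name, not a real
OWL term.

```python
# Triples as (subject, property, value); all names here are illustrative.
triples = {
    ("urn:a", "title", "Foo"),
    ("urn:b", "creator", "Bar"),
}

def copy_both_ways(triples, a, b):
    """Strategy 1: both nodes end up with the union of the properties."""
    out = set(triples)
    for s, p, o in triples:
        if s == a:
            out.add((b, p, o))
        elif s == b:
            out.add((a, p, o))
    return out

def merge_into_master(triples, master, alias):
    """Strategy 2: move the alias's properties to the master and keep
    only a link from the alias back to the master."""
    out = {(master if s == alias else s, p, o) for s, p, o in triples}
    out.add((alias, "equivalentTo", master))  # placeholder property name
    return out
```

The first strategy doubles the data for the two nodes, while the second
keeps one copy but requires every reader to follow the link.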

> Fresnel is, in fact, the first of our attempts into #3. You can, in
> fact, see Fresnel as a "spanning tree" extractor (which is the simplest
> thing that could allow us to hook an RDF pipeline into an XML pipeline,
> therefore maximizing reusability of existing software). In theory, if
> you have an XML pipeline already available, a 'spanning tree' extractor
> is all you need to perform all sorts of serialization operations.

I'm involved in a project with the IRS using RDF where part #3 will be
to construct OpenOffice documents (XML) from an RDF database, and then
generate PDFs from that. It will be interesting to see how that works
out.

> extracting
> ==========
>
> given a graph, a property, a regular expression and a graph pattern,
> returns a graph that has the literal associated with the property
> "extracted" into subgraphs that are appended to the existing graph.
>
> This is useful to turn something like
>
> A -(name)-> "Smith, Joe 1932-1978"
>
> into
>
> A -(lastname)-> "Smith"
> A -(firstname)-> "Joe"
> A -(dates)-> "1932-1978"

In particular, dates often seem to need reformatting. If some raw
content has been extracted from a source that used a human-readable
format, then reshaping it into some standard format would be necessary.
I'm not sure that is doable with regexps alone, though, but for many
cases it will probably work decently.
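
As a rough illustration of the regexp part, here is a small Python
sketch for the "Smith, Joe 1932-1978" example. The pattern, the
`extract` function and the tuple representation of triples are my own
assumptions, not part of the proposed operator; literals that don't fit
the pattern are simply left alone.

```python
import re

# Assumed pattern for literals shaped like "Smith, Joe 1932-1978".
NAME_PATTERN = re.compile(
    r"(?P<lastname>[^,]+),\s*(?P<firstname>.+?)\s+(?P<dates>\d{4}-\d{4})$"
)

def extract(subject, literal, pattern=NAME_PATTERN):
    """Return (subject, property, value) triples, one per named group."""
    match = pattern.match(literal)
    if match is None:
        return []  # literal does not fit the pattern; add nothing
    return [(subject, prop, value)
            for prop, value in match.groupdict().items()]

new_triples = extract("A", "Smith, Joe 1932-1978")
```

The named groups double as property names, so one pattern describes
both what to match and which subgraph to build from it.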

> Boy, this was longer than I expected. Well, hope it makes some sense.
>
> Please, pile up your operators.

It does make sense, although I'm too new to this field to have a strong
opinion either way.

What is important in general, for me, is to achieve scalability. If some
approach is nice and pretty, but doesn't scale, it doesn't work. If what
you describe above is simpler and less expressive, but scales, then that
works. Pipelines are nice too, for the reasons you mention, and we are
rather used to the concept anyway since it is so common in other fields.
If what you describe does not include handling things like transitivity,
I would like to know how you would suggest such cases are handled
instead, if at all.

/Rickard
Received on Sat Mar 11 2006 - 07:32:44 EST