RE: examples of linking bibliographic RDF to articles

From: Matthew Cockerill <>
Date: Fri, 15 Jul 2005 13:41:08 +0100

> -----Original Message-----
> From: MacKenzie Smith [mailto:kenzie_at_MIT.EDU]
> Sent: 14 July 2005 23:15
> To:;
> Subject: RE: examples of linking bibliographic RDF to articles
> >>You should use whichever schema you prefer, and PB should
> >>cope with it.
> >
> >Not quite clear how this can be - if I for the sake of argument,
> >invent my own ontology, and identify it as:
> >
> >
> >then how could piggy bank know which bibliographic concepts (e.g. the
> >concept of 'references', as in 'this article references this list of
> >other references, or even the concept of author) relate to the same
> >bibliographic concepts in other ontologies such as PRISM? Someone
> >somewhere needs to map relationships between a set of 'supported'
> >ontologies, no?
> Piggy Bank isn't supporting inferencing now, it just exposes
> the facets in
> your RDF for browsing.
> But someday... and then you could supply your RDF schema and any
> relationships you care to express (e.g. OWL equivalency statements)
> along with the actual data encoded in the schema.

Given that I think we agree an important goal has to be to give end users something genuinely useful, that doesn't require too much technical expertise to comprehend, I guess there's a need to decide which is most achievable:

(a) getting websites, and scraper writers, to agree on which ontologies to use for important types of content, such as bibliographic content, and then focusing on tools that will work with those specific ontologies
(b) encouraging a free-for-all in terms of which ontologies to use, and then, for any given domain (e.g. personal information, bibliographic info, or locational info), setting up a framework to allow the creation of "glue ontologies" that describe the mappings between the most popular ontologies within that domain. (I'm making the terminology up here, as my understanding of the state of the art is pretty limited.)

(c) some balanced, pragmatic combination of (a) and (b).
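To make option (b) a bit more concrete: at its simplest, a "glue ontology" is just a set of equivalence statements between predicates of different vocabularies (which is roughly what OWL equivalency statements express). Here's a minimal, purely illustrative sketch in Python; the `bmc` URIs are made up, `dc:creator` is the real Dublin Core term, and the "shared" cites URI is hypothetical:

```python
# Minimal sketch of a "glue ontology": a mapping between predicates of
# different bibliographic vocabularies, applied to normalize RDF-style
# (subject, predicate, object) triples before browsing or inference.
# The example.org URIs are invented for illustration; only the Dublin
# Core creator URI is a real term.

GLUE = {
    # publisher's own predicate            ->  shared/"canonical" predicate
    "http://example.org/bmc/hasAuthor":  "http://purl.org/dc/elements/1.1/creator",
    "http://example.org/bmc/references": "http://example.org/shared/cites",
}

def normalize(triples, glue=GLUE):
    """Rewrite predicates according to the glue mapping; predicates
    with no mapping pass through untouched."""
    return [(s, glue.get(p, p), o) for (s, p, o) in triples]
```

A real system would of course read these equivalences from RDF/OWL supplied alongside the data, rather than hard-coding a dict, but the mapping step itself is this simple.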

When you say "Piggy Bank isn't supporting inferencing now", but that one day it might - is it structured such that there are appropriate hooks to allow it to be extended in that way? I.e., in the same way that screen-scraper modules can be slotted in, can inference/mapping modules be dropped in? Is sufficient scaffolding in place to make a project to do bibliographic processing/inference feasible?

> >The use case is that Google Scholar should be able to
> harvest from our
> >site (and that of other publishers) a full picture of both
> >(a) what articles we and others host
> >(b) which articles cite which other articles
> >
> >[Currently Google Scholar's attempts to scrape this information from
> >raw HTML and PDFs is very error prone]
> So you're saying our ideal ontology should support both
> primary metadata
> about the article and citation metadata from the article.

Well, it's a classic case of a graph relationship - the article is an article, and each of the items on its reference list is also an article (to a first approximation), so the same ontology applies in both cases. The only additional thing you need is a way to describe the 'cites' relationship, which PRISM, I understand, does give you.

More accurately, I guess, articles, books, book chapters, websites, etc. are all 'citable items', which share some inherited characteristics but also have some properties unique to a subtype. The ontoweb ontology seems to do a reasonable job of expressing that. And in more practical terms, EndNote's reference database also implements this, implicitly, in its data model.
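The 'citable items' idea, and the graph relationship above, might be sketched (purely illustratively - all field names here are my own invention, not any particular ontology's) as a small class hierarchy: shared properties on the base type, subtype-specific ones below, and 'cites' pointing back at the base type so references form a graph:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CitableItem:
    # Properties shared by anything that can appear on a reference list
    title: str
    authors: List[str]
    year: int
    # 'cites' points back at CitableItem, so citations form a graph
    cites: List["CitableItem"] = field(default_factory=list)

@dataclass
class Article(CitableItem):
    journal: str = ""
    volume: str = ""

@dataclass
class BookChapter(CitableItem):
    book_title: str = ""
    pages: str = ""

# An article citing another article: both ends of the 'cites' edge are
# described by the same (base) type.
cited = Article(title="Earlier work", authors=["B."], year=2003, journal="J")
citing = Article(title="New result", authors=["A."], year=2005, journal="J",
                 cites=[cited])
```

This is just the EndNote-style data model restated: one record type per 'reference type', all sharing a common core of fields.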
> >The ability to export all or some of the bibliographic references in
> >your 'bank' into a standard format (e.g. the EndNote import format)
> >would be nifty.
> Good one

Seems like this could be a reasonable minimal test case to evaluate the (a),(b),(c) architectural choices I mentioned above.
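On that test case: EndNote can import RIS-tagged records, so a minimal exporter is just a serializer from whatever internal record representation to tagged lines. A rough sketch (the dict keys are an assumed internal representation, not any standard):

```python
def to_ris(records):
    """Serialize a list of record dicts (assumed keys: type, authors,
    title, year, journal) to RIS, a tagged bibliographic format that
    EndNote can import. Each tag line is 'XX  - value'; 'ER  - ' ends
    a record."""
    lines = []
    for r in records:
        lines.append("TY  - " + r.get("type", "JOUR"))
        for author in r.get("authors", []):
            lines.append("AU  - " + author)   # one AU line per author
        lines.append("TI  - " + r["title"])
        if "year" in r:
            lines.append("PY  - " + str(r["year"]))
        if "journal" in r:
            lines.append("JO  - " + r["journal"])
        lines.append("ER  - ")
        lines.append("")
    return "\n".join(lines)
```

The interesting part of the test case isn't this serializer, of course - it's whether records harvested under mismatched ontologies can be normalized well enough to feed it.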

Received on Fri Jul 15 2005 - 12:38:43 EDT

This archive was generated by hypermail 2.3.0 : Thu Aug 09 2012 - 16:39:18 EDT