RE: examples of linking bibliographic RDF to articles

From: Matthew Cockerill <>
Date: Wed, 13 Jul 2005 19:35:02 +0100

MacKenzie suggested that I send a copy of the email below to this list,
since there may be others who would like to contribute to this
discussion.

[Hope the threading of the discussion is reasonably clear...]

Matt Cockerill
Director of Operations
BioMed Central (

Subject: RE: examples of linking bibliographic RDF to articles
Date: 13 July 2005 16:33:54 BST
To: kenzie_at_MIT.EDU

> -----Original Message-----
> From: MacKenzie Smith [mailto:kenzie_at_MIT.EDU]
> Sent: 12 July 2005 22:48
> To: Matthew Cockerill
> Cc:
> Subject: Re: examples of linking bibliographic RDF to articles
> Hi Matthew,
> I'm not sure that you know that I'm one of the SIMILE PIs,
> which is part of
> the reason I joined the Science Commons working group (i.e.
> similar motives
> to your own). So I'd be happy to answer any of your questions
> about Piggy
> Bank, or refer you to the right person on the team.


I wasn't sure precisely what your role on SIMILE was - that's good to
know.

>>> I think the key to adoption/uptake is to provide something that's
>>> genuinely useful as soon as possible though, and so rather
> than the more
>>> ambitious 'representation of biological knowledge' area, it
> seems to me
>>> that it would be great if Piggy Bank could act, amongst
> other things, as
>>> a bibliographic management tool on steroids, since all
> scientists pretty
>>> much depend on tools for managing references
> That's certainly one of my hopes for it, and one that I'm
> promoting here at
> MIT for our own researchers.
>>> I'm wondering if by simply adding a <link rel= > tag on
> our article
>>> pages to some appropriate RDF describing the bibliographic
> details of the
>>> article concerned, we could
>>> (a) bypass the need for a piggy bank screenscraper
>>> (b) as a side-benefit, make the job of services like
> Google Scholar
>>> vastly easier, by preventing them from needing to do screenscraping.
> Certainly. See for the
> instructions.
>>> So, my question is:
>>> Do you have any advice (or could you provide examples on
> the piggy bank
>>> website) for how best to express bibliographic
> information/metadata in
>>> RDF linked from a web page?
>>> What do you think about the idea of using PRISM:
> You should use whichever schema you prefer, and PB should
> cope with it.

Not quite clear how this can be - if I, for the sake of argument,
invent my own ontology, and identify it as:

then how could piggy bank know which bibliographic concepts (e.g. the
concept of 'references', as in 'this article references this list of
other references', or even the concept of author) relate to the same
bibliographic concepts in other ontologies such as PRISM? Someone
somewhere needs to map relationships between a set of 'supported'
ontologies, no?
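
To make the mapping question concrete, here is a minimal sketch of the kind of lookup table that someone would have to maintain. It is not how Piggy Bank works, as far as I know. The `MY` namespace and its property names are invented for illustration, and the Dublin Core and PRISM URIs are written as I understand them and should be checked against the specs.

```python
# Hypothetical custom ontology; the namespace and property names are made up.
MY = "http://example.org/my-biblio#"
# Namespace URIs as I understand them; verify against the published specs.
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
PRISM = "http://prismstandard.org/namespaces/1.2/basic/"

# Hand-maintained mapping from custom predicates to 'supported' ones.
PREDICATE_MAP = {
    MY + "writtenBy": DC + "creator",
    MY + "heading": DC + "title",
    MY + "journalName": PRISM + "publicationName",
    MY + "cites": DCTERMS + "references",
}


def normalise(triples, mapping=PREDICATE_MAP):
    """Rewrite known predicates; pass unknown predicates through unchanged."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]
```

Any predicate missing from the table stays as it was, which is exactly the case where a tool could not relate it to the same concept elsewhere.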

> PRISM would be fine.
> One thing to watch out for is that all the facets you want to
> be browsable
> must be represented as URIs rather than direct values (see
> for the
> longer version of what I just said).

Seems reasonable.
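
As I read the advice, a facet such as author would need to be a resource with its own URI rather than a plain string. A rough sketch of promoting literal values to URIs follows. The base URI and the slug rule are assumptions made up for this example, not anything Piggy Bank prescribes.

```python
from urllib.parse import quote


def literal_to_uri(value, base="http://example.org/author/"):
    """Mint a URI for a literal value (hypothetical base and slug rule)."""
    slug = value.strip().lower().replace(" ", "-")
    return base + quote(slug, safe="-")


def promote_facet(triples, predicate):
    """Replace literal objects of `predicate` with minted URIs."""
    return [
        (s, p, literal_to_uri(o) if p == predicate else o)
        for s, p, o in triples
    ]
```

Two articles that share an author string would then point at the same URI, which is what lets them be grouped when browsing.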

David pointed me to the citeseer overlay here:

  This article
   links to this RDF

This seems to point to this ontology:
and other than that, does not use URIs, as such.

So is that in fact a good example? Or should it be using other URIs to
be more specific?

From a quick browse of OntoWeb, I guess what is being used is a subset
of the Article concept, is that right?

>>> If we wanted to describe both the bibliographic metadata
> of the article
>>> itself, and of each article in its reference list, do you
> have any
>>> ideas/advice/examples on how best to express that?
> Do you want them to be clearly distinguishable? Related in
> some way? I
> don't think we have a best practice for that yet, but it
> would probably be
> of interest to other content providers if we (mutually) came
> up with some
> model for that.

Certainly they need to be distinguishable.

The use case is that Google Scholar should be able to harvest from our
site (and that of other publishers) a full picture of both
(a) what articles we and others host
(b) which articles cite which other articles

[Currently Google Scholar's attempts to scrape this information from
raw HTML and PDFs are very error-prone]
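
One possible way to keep the article and its cited articles distinguishable is to give each its own subject URI and link them with a 'references' property. The sketch below uses plain tuples as triples and assumes Dublin Core terms for the properties. That choice is mine for illustration, since no best practice has been agreed here, and the article URIs in the usage note are invented.

```python
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


def citation_triples(article_uri, title, cited):
    """Build triples for an article and the articles it cites.

    `cited` is a list of (uri, title) pairs, one per reference-list entry.
    """
    triples = [(article_uri, DC + "title", title)]
    for ref_uri, ref_title in cited:
        # The link says which article cites which other article.
        triples.append((article_uri, DCTERMS + "references", ref_uri))
        # The cited article's metadata hangs off its own URI.
        triples.append((ref_uri, DC + "title", ref_title))
    return triples
```

For example, `citation_triples("http://example.org/art/1", "Hosted article", [("http://example.org/art/2", "Cited article")])` yields one title triple per article plus one link between them, so a harvester can tell (a) from (b).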

A couple of related notes, while I'm mentioning Google Scholar.
Anurag would prefer it if basic article RDF metadata could be embedded
in a comment tag, like creative commons license info.
(To avoid doubling up the number of URLs to be retrieved by the Google

See this discussion on the relative pros and cons of the <link> tag and
the comment approach. The comment approach is clearly an ugly hack, but
it is being used widely for CC licenses, if nothing else. Can/should/does
piggy bank support RDF in comments?
Is there a better way to include RDF in a composite XHTML document, not
by reference? (But no doubt that will break all old browsers.)
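
For comparison, a consumer could look for both forms in one pass over the page. This is only a sketch using Python's standard html.parser. It matches the <link> tag on type="application/rdf+xml" alone, which is an assumption on my part about what a harvester would key on, and it treats any comment containing "<rdf:RDF" as embedded RDF.

```python
from html.parser import HTMLParser


class RdfFinder(HTMLParser):
    """Collect RDF that is linked by <link> or embedded in comments."""

    def __init__(self):
        super().__init__()
        self.linked = []    # href values of RDF <link> tags
        self.embedded = []  # raw RDF/XML text found inside comments

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Assumption: the MIME type alone identifies an RDF link.
        if (tag == "link"
                and attributes.get("type") == "application/rdf+xml"
                and attributes.get("href")):
            self.linked.append(attributes["href"])

    def handle_comment(self, data):
        if "<rdf:RDF" in data:
            self.embedded.append(data.strip())


def find_rdf(html):
    """Return (linked_hrefs, embedded_rdf_strings) for an HTML page."""
    finder = RdfFinder()
    finder.feed(html)
    finder.close()
    return finder.linked, finder.embedded
```

The linked form costs a second request per article, while the comment form is found in the page already fetched, which is the trade-off behind the preference described above.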

Lastly, another note from Anurag: please don't try to scrape Google
Scholar, as their anti-scraping systems will cut you off.
My perception is that Google are rather paranoid that people scraping
GS could build systems that would conflict with organisations such
as Crossref, which they depend on to make GS work.

On a more positive front, my sense is that if Piggy Bank as a
bibliographic reference manager starts to take off, GS would definitely
consider exposing appropriate RDF itself for individual use (i.e. the
PB model) - just not for mass harvesting.

>>> Also, would it be possible/sensible to add interface
> functionality into
>>> piggy bank (or alongside piggy bank) that could make
> intelligent use of
>>> bibliographic information stored in piggy bank in some useful way?
> What sort of intelligent use did you have in mind? We are certainly
> thinking about new functionality, such as browsing by
> timeline and adding
> data cleanup/mapping functions, but your suggestions for what
> would make
> the tool more valuable to scientists for bibliographic data
> management
> would be very welcome.

I will have a think about that, but for a start, the ability to export
all or some of the bibliographic references in your 'bank' into a
standard format (e.g. the EndNote import format) would be nifty.
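
As a rough idea of what such an export could look like, here is a sketch that writes one reference in the tagged style that EndNote can import (%A for author, %T for title, and so on). The dictionary keys are my own invention for the example and say nothing about how piggy bank actually stores items.

```python
def to_endnote(ref):
    """Format one reference dict as an EndNote-style tagged record."""
    lines = ["%0 Journal Article"]
    # One %A line per author.
    for author in ref.get("authors", []):
        lines.append("%A " + author)
    for tag, key in (("%T", "title"), ("%J", "journal"),
                     ("%D", "year"), ("%V", "volume"), ("%P", "pages")):
        if ref.get(key):
            lines.append(f"{tag} {ref[key]}")
    return "\n".join(lines) + "\n"


def export_bank(refs):
    """Join records with a blank line between them."""
    return "\n".join(to_endnote(r) for r in refs)
```

Fields that are missing from a reference are simply left out of its record.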

If you look at you can get an idea of just how
sophisticated web based reference management can get - ultimately
there are all sorts of things that can be done.

It might also be interesting to see if some sort of social
recommendation service might emerge from the use of piggy bank.
The fact that you choose to piggy bank an article is a much more
meaningful indication of interest than simply viewing the web page.
As such, like Amazon purchases, bibliographic citations, or web links,
it provides good raw material for mining for information about what is
most interesting, and what things are related to what other things.
Maybe something along those lines could be developed?
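
A very simple version of that mining would be to count how often two articles are saved by the same person. The sketch below assumes, purely for illustration, that each user's bank is available as a list of article identifiers. How such data would actually be shared is an open question.

```python
from collections import Counter
from itertools import combinations


def co_saved(banks):
    """Count, for each pair of articles, how many users saved both."""
    counts = Counter()
    for articles in banks.values():
        # Sorting makes each pair appear in one canonical order.
        for a, b in combinations(sorted(set(articles)), 2):
            counts[(a, b)] += 1
    return counts


def related_to(article, counts, top=3):
    """Return the articles most often saved alongside `article`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == article:
            scores[b] += n
        elif b == article:
            scores[a] += n
    return [other for other, _ in scores.most_common(top)]
```

With real usage data the raw counts would need normalising, since very popular articles would otherwise appear related to everything.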

> Hope this helps,
> MacKenzie
> MacKenzie Smith
> Associate Director for Technology
> MIT Libraries
> Building E25-131d
> 77 Massachusetts Avenue
> Cambridge, MA 02139
> (617)253-8184
Received on Wed Jul 13 2005 - 18:32:08 EDT
