RE: examples of linking bibliographic RDF to articles

From: MacKenzie Smith <kenzie_at_MIT.EDU>
Date: Thu, 14 Jul 2005 18:15:28 -0400

Hi Matt, sorry for the slow response -- busy week.

>>>>Do you have any advice (or could you provide examples on the piggy bank
>>>>website) for how best to express bibliographic information/metadata in
>>>>RDF linked from a web page?
>>>>What do you think about the idea of using PRISM:
>>You should use whichever schema you prefer, and PB should
>>cope with it.
>Not quite clear how this can be - if I for the sake of argument,
>invent my own ontology, and identify it as:
>then how could piggy bank know which bibliographic concepts (e.g. the
>concept of 'references', as in 'this article references this list of
>other references', or even the concept of author) relate to the same
>bibliographic concepts in other ontologies such as PRISM? Someone
>somewhere needs to map relationships between a set of 'supported'
>ontologies, no?

Piggy Bank isn't supporting inferencing now, it just exposes the facets in
your RDF for browsing.
But someday... and then you could supply your RDF schema and any
relationships you care to express (e.g. OWL equivalency statements)
along with the actual data encoded in the schema.
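
A minimal sketch of what supplying equivalency statements alongside the data could enable, assuming triples are plain (subject, predicate, object) tuples. The `example.org` property URI is hypothetical, and this is not how Piggy Bank works today (it does no inferencing); it only illustrates folding equivalent properties onto one canonical name:

```python
OWL_EQUIVALENT_PROPERTY = "http://www.w3.org/2002/07/owl#equivalentProperty"

# Hypothetical home-grown property, mapped to Dublin Core's creator.
MY_AUTHOR = "http://example.org/myonto#author"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"


def build_canonical(schema_triples):
    """Group properties linked by owl:equivalentProperty.

    Returns a dict mapping each property to one representative of its
    group (the lexicographically smallest URI in the group).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in schema_triples:
        if p == OWL_EQUIVALENT_PROPERTY:
            rs, ro = find(s), find(o)
            if rs != ro:
                parent[max(rs, ro)] = min(rs, ro)
    return {x: find(x) for x in list(parent)}


def normalize(data_triples, canonical):
    """Rewrite predicates so equivalent properties share one name."""
    return [(s, canonical.get(p, p), o) for s, p, o in data_triples]
```

With the mapping in place, data published under either vocabulary ends up under the same facet after `normalize`.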

We do have some plans to build tools that people can use to do that mapping
across ontologies and put them into browsers like Longwell
and PB, but right now it's all done by hand just to demonstrate the

>>One thing to watch out for is that all the facets you want to
>>be browsable must be represented as URIs rather than direct values (see
>> for the
>>longer version of what I just said).
>Seems reasonable.
>David pointed me to the citeseer overlay here:
> This article
> links to this RDF
>This seems to point to this ontology:
>and other than that, does not use URIs, as such.
>So is that in fact a good example? Or should it be using other URIs to
>be more specific?

I'm not familiar with the ontoweb ontology, but what you say sounds right.
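
To illustrate the point above about facets needing URIs rather than direct values, here is a small sketch. The `URI` wrapper, the function name, and the `example.org` identifiers are assumptions made for illustration, not Piggy Bank's actual data model:

```python
from collections import defaultdict
from typing import NamedTuple


class URI(NamedTuple):
    """Marks a value as a resource identifier rather than a literal."""
    value: str


def browsable_facets(triples):
    """Collect, per predicate, the objects that are URIs.

    Plain-string (literal) objects are skipped, mirroring the rule that
    only URI-valued properties can be browsed as facets.
    """
    facets = defaultdict(set)
    for _subject, predicate, obj in triples:
        if isinstance(obj, URI):
            facets[predicate].add(obj.value)
    return dict(facets)
```

Under this assumption an author given as `URI("http://example.org/people/smith")` would show up as a facet value, while the same author given as the plain string `"Smith"` would not.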

> From a quick browse of OntoWeb, I guess what is being used is a subset
>of the Article concept, is that right?

Again, not sure, but sounds likely

>>>> If we wanted to describe both the bibliographic metadata of the article
>>>>itself, and of each article in its reference list, do you have any
>>>>ideas/advice/examples on how best to express that?
>>Do you want them to be clearly distinguishable? Related in
>>some way? I don't think we have a best practice for that yet, but it
>>would probably be of interest to other content providers if we (mutually)
>>came up with some model for that.
>Certainly they need to be distinguishable.
>The use case is that Google Scholar should be able to harvest from our
>site (and that of other publishers) a full picture of both
>(a) what articles we and others host
>(b) which articles cite which other articles
>[Currently Google Scholar's attempts to scrape this information from
>raw HTML and PDFs are very error-prone]

So you're saying our ideal ontology should support both primary metadata
about the article and citation metadata from the article.
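
One possible shape for such a model, sketched under assumptions: every article, hosted or merely cited, is its own resource with its own URI, and a citation is just a link between two resources. The property choices (Dublin Core `title` and `creator`, DC Terms `references`) and the `Article` class are illustrative, not an agreed best practice:

```python
from dataclasses import dataclass, field

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


@dataclass
class Article:
    uri: str
    title: str
    authors: list = field(default_factory=list)
    references: list = field(default_factory=list)  # list of Article


def to_triples(article, seen=None):
    """Emit primary metadata for the article and for each cited article.

    Cited articles are described as separate subjects, so their metadata
    stays distinguishable from that of the citing article.
    """
    seen = set() if seen is None else seen
    if article.uri in seen:
        return []
    seen.add(article.uri)
    triples = [(article.uri, DC + "title", article.title)]
    triples += [(article.uri, DC + "creator", a) for a in article.authors]
    for cited in article.references:
        triples.append((article.uri, DCTERMS + "references", cited.uri))
        triples += to_triples(cited, seen)
    return triples


def citation_pairs(triples):
    """Return (citing, cited) URI pairs, i.e. use case (b) above."""
    return [(s, o) for s, p, o in triples if p == DCTERMS + "references"]
```

A harvester could then answer "what is hosted" from the set of subjects and "what cites what" from `citation_pairs`.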

>A couple of related notes, while I'm mentioning Google Scholar.
>Anurag would prefer it if basic article RDF metadata could be embedded
>in a comment tag, like creative commons license info.
>(To avoid doubling up the number of URLs to be retrieved by the Google
>See this discussion on the relative pros and cons of the <link> tag and
>the comment approach. The comment approach is clearly an ugly hack, but
>it is being used widely for CC license if nothing else. Can/should/does
>piggy bank support RDF in comments?
>Is there a better way to include RDF in a composite XHTML document, not
>by reference? (but no doubt that will break all old browsers)

Yeah yeah, I've had that argument with Anurag too, in the DSpace context.
CC metadata is very small, the RDF we're harvesting is potentially very
large (e.g. a long result set's worth of metadata a la citeseer). I'm not
sure sticking that in a comment is a good idea, but maybe others will chime in.
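
For reference, the comment approach discussed above amounts to something like the following on the consuming side. This is only a sketch using the standard library; the function names are made up for illustration, and nothing here implies that Piggy Bank or any harvester actually does this:

```python
import re
import xml.etree.ElementTree as ET

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)
RDF_RE = re.compile(r"<rdf:RDF\b.*?</rdf:RDF>", re.DOTALL)


def rdf_blocks_in_comments(html):
    """Return the raw RDF/XML strings found inside HTML comments."""
    blocks = []
    for comment in COMMENT_RE.findall(html):
        blocks.extend(RDF_RE.findall(comment))
    return blocks


def rdf_roots(html):
    """Parse each embedded block; raises ParseError if one is malformed."""
    return [ET.fromstring(block) for block in rdf_blocks_in_comments(html)]
```

The size concern still applies: every page view carries the full metadata payload, whether or not the client wants it.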

>Lastly, another note from Anurag is that he says please don't try to
>scrape Google Scholar, as their anti-scraping systems will cut you off.
>My perception is that Google are rather paranoid that people scraping
>GS could build systems that would conflict with the organisations such
>as Crossref, who they depend on to make GS work.

Interesting. But if they want GS to be of high value to scholars then it
needs to work with tools like EndNote and Piggy Bank,
whether they like it or not.

>On a more positive front, my sense is that if Piggy Bank as a
>bibliographic reference manager starts to take off, GS would definitely
>consider exposing appropriate RDF itself for individual use (i.e. the
>PB model) - just not for mass harvesting.

Good. Maybe I'll ask Anurag if they would do a pilot project to try
exposing their metadata RDF along the lines that you described.

>>>>Also, would it be possible/sensible to add interface functionality into
>>>>piggy bank (or alongside piggy bank) that could make intelligent use of
>>>>bibliographic information stored in piggy bank in some useful way?
>>What sort of intelligent use did you have in mind? We are certainly
>>thinking about new functionality, such as browsing by
>>timeline and adding
>>data cleanup/mapping functions, but your suggestions for what
>>would make
>>the tool more valuable to scientists for bibliographic data
>>would be very welcome.
>I will have a think about that, but for a start, the ability to export
>all or some of the bibliographic references in your 'bank' into a
>standard format (e.g. the EndNote import format) would be nifty.

Good one
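
A rough sketch of such an export, assuming each saved reference is a simple dict. The field names (`authors`, `title`, `journal`, `year`, `url`) are hypothetical, and the `%`-tagged lines follow the Refer-style tagged format that EndNote can import; check the tags against your EndNote version before relying on them:

```python
FIELD_TAGS = (
    ("title", "%T"),
    ("journal", "%J"),
    ("year", "%D"),
    ("url", "%U"),
)


def to_endnote(record):
    """Render one reference as a tagged record, one field per line."""
    lines = ["%0 " + record.get("type", "Journal Article")]
    for author in record.get("authors", []):
        lines.append("%A " + author)  # one %A line per author
    for key, tag in FIELD_TAGS:
        if record.get(key):
            lines.append(f"{tag} {record[key]}")
    return "\n".join(lines)


def export_bank(records):
    """Join records with a blank line between them."""
    return "\n\n".join(to_endnote(r) for r in records) + "\n"
```

Exporting "all or some" of the bank is then a matter of filtering the list passed to `export_bank`.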

>If you look at you can get an idea of just how
>sophisticated web based reference management can get - ultimately
>there's all sorts of things that can be done.

Yes, we subscribe to this at MIT. I'm familiar with its functionality.

>It might also be interesting to see if some sort of social
>recommendation service might emerge from the use of piggy bank.
>The fact that you choose to piggy bank an article is a much more
>meaningful indication of interest than simply viewing the web page.
>As such, like Amazon purchases, bibliographic citations, or web links,
>it provides good raw material for mining for information about what is
>most interesting, and what things are related to what other things.
>Maybe something along those lines could be developed?

I think that was the basic idea behind the Semantic Bank, but you're right
that it needs more functionality
to entice people to actually publish their data to it. We were thinking
that for smaller communities
(e.g. an academic department) that would be pretty easy to do.
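
As a toy illustration of the mining idea quoted above, assuming each user's published bank is just a list of article identifiers: count how often two articles are saved by the same person, then rank neighbours by that count. The function names and data shape are assumptions, not a description of the Semantic Bank:

```python
from collections import Counter
from itertools import combinations


def co_saved_counts(banks):
    """Count, for each pair of articles, how many users saved both.

    `banks` maps a user id to the list of article ids that user saved.
    """
    counts = Counter()
    for saved in banks.values():
        for a, b in combinations(sorted(set(saved)), 2):
            counts[(a, b)] += 1
    return counts


def related_to(article, counts, top=3):
    """Articles most often saved together with `article`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == article:
            scores[b] += n
        elif b == article:
            scores[a] += n
    return [item for item, _ in scores.most_common(top)]
```

Even in a small community such as a single department, a handful of overlapping banks is enough for this kind of count to start saying something.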

Received on Thu Jul 14 2005 - 22:12:45 EDT
