RE: [announcement] DSpace Scraper - Reloaded

From: Matthew Cockerill <>
Date: Wed, 10 Aug 2005 19:06:18 +0100

In the real world of bibliographic systems, tools are in wide use that can and do map data between different bibliographic formats, such as:

PubMed XML format
Crossref piped format
EndNote export format
BibTeX format
BioMed Central native XML format
OpenURL format
RSS/Dublin Core format


The mappings that achieve this are not perfect. But the results that are achieved by having these mappings are real, and are profoundly important.
I think there's a danger that issues like this get neglected as "provably unanswerable", when in fact the utility of the partial solutions that can be achieved is undervalued.
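To make the point concrete, here is a minimal Python sketch of such an imperfect mapping. It assumes a hypothetical internal record layout: the field names (`authors`, `title`, `mesh_terms`, etc.), the `ENDNOTE_TAGS` table and the `to_endnote` function are illustrative, not taken from any real system. It writes a small subset of EndNote's tagged import format (`%0`, `%A`, `%T`, `%J`, `%D`, `%V`, `%P`, as I understand that format) and, rather than refusing to export anything that doesn't fit, reports which fields had no target.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from an illustrative internal schema to a subset of
# EndNote tagged-import tags. Authors are handled separately (one %A per name).
ENDNOTE_TAGS: Dict[str, str] = {
    "title": "%T",
    "journal": "%J",
    "year": "%D",
    "volume": "%V",
    "pages": "%P",
}


def to_endnote(record: Dict[str, Any]) -> Tuple[str, List[str]]:
    """Map a record to EndNote tagged text; also return fields with no target tag."""
    lines = ["%0 Journal Article"]  # reference type; assumed constant here
    for author in record.get("authors", []):
        lines.append(f"%A {author}")
    dropped: List[str] = []
    for field, value in record.items():
        if field == "authors":
            continue
        tag = ENDNOTE_TAGS.get(field)
        if tag is None:
            # No slot for this field in the mapping: lose it, but say so.
            dropped.append(field)
        else:
            lines.append(f"{tag} {value}")
    return "\n".join(lines), dropped


# Usage with an invented example record.
record = {
    "authors": ["Doe, Jane", "Roe, Richard"],
    "title": "An example article",
    "journal": "Example Journal",
    "year": 2005,
    "volume": 6,
    "pages": "1-10",
    "mesh_terms": ["Databases, Bibliographic"],  # nothing to map this to
}
text, dropped = to_endnote(record)
print(text)
print("Not exported:", dropped)
```

The export is lossy (the subject terms are dropped), but it is explicit about what it loses, and the result is still something a user can load straight into their reference manager.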

In fact, weirdly, I remember spending weeks having exactly this debate in 1995, when I wanted to add EndNote format export to our Evaluated MEDLINE system (at the time, before PubMed, it was the first free web-based MEDLINE). The lead developer on the project was strongly opposed to adding EndNote export because it was messy: the data didn't fit quite right, and it clashed with the elegance of the rest of the project. But when we finally introduced the EndNote download option, it was by far the most frequently used and praised aspect of the service.


> -----Original Message-----
> From: Stefano Mazzocchi []
> Sent: 10 August 2005 18:37
> To:
> Subject: Re: [announcement] DSpace Scraper - Reloaded
> Matthew Cockerill wrote:
> > This in fact ties back to my original question, when joining the list,
> > about the best ontology(ies) to use for bibliographic data.
> For the record, my experience with open and decentralized systems
> strongly indicates that this question has no answer.
> I believe that the solution (well, if not a solution, a step forward) for
> data interoperability at a global scale is a mix of explicit ontological
> mappings and transformation/adaptation rules.
> People already disagree on the use of even basic Dublin Core fields...
> which shows pretty clearly (and against the original intention of the
> DC working group) that 'semantic linking by field collision' (even when
> namespaced, and thus globally unique) is a myth. Just as words in natural
> languages need context to be fully understood and disambiguated, the
> same will be true for metadata the more complex, organic and
> decentralized the system becomes.
> Old-school librarians (and a large population of both the XML and RDF
> world) hate this because it's against all they believe in and struggle
> to achieve: order and 'reductio ad unum'.
> See my blog post on 'data first' systems for more.
> --
> Stefano Mazzocchi
> Research Scientist Digital Libraries Research Group
> Massachusetts Institute of Technology location: E25-131C
> 77 Massachusetts Ave telephone: +1 (617) 253-1096
> Cambridge, MA 02139-4307 email: stefanom at mit . edu
> -------------------------------------------------------------------

Received on Wed Aug 10 2005 - 18:03:04 EDT
