RE: examples of linking bibliographic RDF to articles

From: Matthew Cockerill <>
Date: Fri, 15 Jul 2005 17:15:52 +0100

In terms of central registries:

It depends what you mean by central.

Coordinated registries which manage a global namespace exist and function well for, to give just a few examples:

* product bar codes (UPC)
* books (ISBN)
* serials (ISSN)
* scholarly articles (DOI)
* protein structures (PDB IDs)
* internet hostnames (DNS)

The same could apply to scholarly article author identifiers.

I'm not convinced that privacy issues make this impossible. The information conveyed by the authorship URI would be simply this: "I, John Smith, author of article A, am one and the same John Smith who is the author of article B, but am not the same John Smith as the author of article C."
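
As a rough illustration of how little such a URI would need to reveal, here is a minimal Python sketch. The example.org author URIs and the article labels are hypothetical, made up for this example, and not identifiers from any real registry.

```python
from collections import defaultdict

# Hypothetical authorship statements: article label -> author URI.
# The URIs carry no personal data; they only say "same" or "different".
authorship = {
    "article-A": "http://example.org/author/1001",
    "article-B": "http://example.org/author/1001",
    "article-C": "http://example.org/author/2002",
}


def same_author(article_1, article_2):
    """True if both articles are attributed to the same author URI."""
    return authorship[article_1] == authorship[article_2]


def articles_by_author(statements):
    """Group article labels under the author URI they are attributed to."""
    grouped = defaultdict(list)
    for article, uri in statements.items():
        grouped[uri].append(article)
    return dict(grouped)


print(same_author("article-A", "article-B"))
print(same_author("article-A", "article-C"))
print(articles_by_author(authorship))
```

All three articles could carry the printed name "John Smith"; the only extra fact the URI adds is which of them belong together.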

Publication of scientific articles is not anonymous - it consciously and inherently involves a loss of privacy on the part of the author.
An article may reveal that the author has done research on animals, which may make that author a target for animal rights campaigners, and yet the article will still identify the author and the institution they work at.

Relative to that, it seems that the privacy issues involved in establishing which of the John Smiths are the same person are very minor.


> And if you think that a worldwide centralized URI registry would be
> enough to solve the problem, rethink your strategy because it won't
> happen: networked systems avoid centralization like the plague, because
> it's inherently more vulnerable.

> -----Original Message-----
> From: Stefano Mazzocchi []
> Sent: 15 July 2005 17:04
> To:
> Subject: Re: examples of linking bibliographic RDF to articles
> Matthew Cockerill wrote:
> > Alf,
> >
> > Actually, long term I think scientific authors do need to be identified by URIs.
> > I think ultimately the semantic web, and the increasing sophistication of bibliographic tools built on it, will drive this.
> >
> > The need for author-specific IDs has been raised several times in the scientific literature.
> > e.g.
> > Nature 411, 237 (17 May 2001) | doi: 10.1038/35077304
> > Sorting out the Smiths
> >
> > [But there was a more recent one too.]
> >
> > The ambiguity in author names has pretty serious consequences - e.g. it is disproportionately difficult for editors of our journals to identify whether someone is a suitable reviewer for a paper if they have a common name (Chinese names being a particular challenge), since, when searching bibliographic databases, it can be very difficult to identify which people are genuinely the same person.
> >
> > It might be felt that it's difficult to see how we can get from the current situation to a situation where author names on papers always include an ID (ideally a URI), but as the major indexing services (PubMed, ISI, and Google Scholar, say) increasingly base themselves on electronic data supplied by the publisher, it's easily conceivable that as well as (or perhaps better, instead of) requiring an email address from authors, journals could require authors to supply their author URI, obtained from an international open registry analogous to Crossref.
> >
> > Obviously, not all journals would do this immediately, but it's conceivable that a bibliographic service like PubMed or Google Scholar could generate its own best estimate of the set of distinct authors represented within the corpus of data that it indexes, using statistical text analysis techniques, and could have its own namespace of author URIs, which would map onto the official author-registered URIs.
> >
> > Authors would then start to have a strong incentive to register their real author URI, and to correct any mismappings that exist in the attempts at author disambiguation that were generated by the bibliographic databases.
> >
> > So although it's a pretty thorny problem, I do think that the elements of an achievable solution may be starting to fall into place.
>
> Using hashed email addresses to create URIs instead of hashed "paper +
> author name" helps a lot because the number of URIs we will have to
> deal with is reduced by orders of magnitude, but the problem is far
> from being solved in that case.
>
> Truly unique IDs inherently exhibit privacy issues. I could use my MIT
> ID number, or my social security # or my Italian fiscal code and these
> will clearly identify me, but I don't want you to know them! (Sure, I
> could hash them, but do I trust SHA-1 or MD5 enough?)
>
> Even with email I might like you to know it, but not the spammers!
>
> Use and abuse have a thin line that separates them and once the data
> is out there you have no more control over where it goes.
>
> So, if you combine privacy concerns with the need for differentiation
> and uniqueness, you can expect that people will have *many* different
> URIs that identify them, just like you have several different email
> addresses that all reach you, in one way or another, and that you
> might use to identify yourself differently depending on the context
> (this allows you, for example, to trace the percolation of your
> information through a system... just like people use hotmail or gmail
> accounts for registering to web sites they don't trust to keep their
> email secret).
>
> My point is: without the ability to draw equivalences between URIs (or
> state their difference), no system will work.
>
> And if you think that a worldwide centralized URI registry would be
> enough to solve the problem, rethink your strategy because it won't
> happen: networked systems avoid centralization like the plague, because
> it's inherently more vulnerable.
> --
> Stefano Mazzocchi
> Research Scientist Digital Libraries Research Group
> Massachusetts Institute of Technology location: E25-131C
> 77 Massachusetts Ave telephone: +1 (617) 253-1096
> Cambridge, MA 02139-4307 email: stefanom at mit . edu
> -------------------------------------------------------------------

Received on Fri Jul 15 2005 - 16:13:01 EDT
