Re: examples of linking bibliographic RDF to articles from Stefano Mazzocchi on 2005-07-15 (stdin)

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Fri, 15 Jul 2005 12:04:23 -0400

Matthew Cockerill wrote:
> Alf,
>
> Actually, long term I think scientific authors do need to be identified by URIs.
> I think ultimately the semantic web, and the increasing sophistication of bibliographic tools built on it, will drive this.
>
> The need for author specific IDs has been raised several times in the scientific literature.
> e.g. http://www.nature.com/nature/journal/v411/n6835/full/411237b0.html
> Nature 411, 237 (17 May 2001) | doi: 10.1038/35077304
> Sorting out the Smiths
>
> [But there was a more recent one too.]
>
> The ambiguity in author names has pretty serious consequences - e.g. it is disproportionately difficult for editors of our journals to identify whether someone is a suitable reviewer for a paper if they have a common name (Chinese names being a particular challenge), since when searching bibliographic databases it can be very difficult to identify which records genuinely refer to the same person.
>
> It might be felt that it's difficult to see how we can get from the current situation to a situation where author names on papers always include an ID (ideally a URI), but as the major indexing services (PubMed, ISI, and Google Scholar, say) increasingly base themselves on electronic data supplied by the publisher, it's easily conceivable that as well as (or perhaps better, instead of) requiring an email address from authors, journals could require authors to supply their author URI, obtained from an international open registry analogous to Crossref.
>
> Obviously, not all journals would do this immediately, but it's conceivable that a bibliographic service like PubMed or Google Scholar could generate its own best estimate of the set of distinct authors represented within the corpus of data that it indexes, using statistical text analysis techniques, and could have its own namespace of author URIs, which would map onto the official author-registered URIs.
>
> Authors would then start to have a strong incentive to register their real author URI, and to correct any mismappings that exist in the attempts at author disambiguation that were generated by the bibliographic databases.
>
> So although it's a pretty thorny problem, I do think that the elements of an achievable solution may be starting to fall into place.

Using hashed email addresses to create URIs instead of hashed "paper +
author name" helps a lot because the number of URIs we will have to deal
with is reduced by orders of magnitude, but the problem is far from
being solved in that case.
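
As a rough illustration of the hashed-email idea, here is a minimal
sketch. It hashes the "mailto:" form of the address with SHA-1, in the
style of FOAF's mbox_sha1sum property. The base URI, the function names,
and the choice to lowercase and trim the address before hashing are all
assumptions made for this example, not something proposed in the thread.

```python
import hashlib


def mbox_sha1sum(email: str) -> str:
    """Return the SHA-1 hex digest of the mailto: URI for an address.

    Lowercasing and trimming are assumptions of this sketch, so that
    trivially different spellings of one address hash to the same value.
    """
    mailto = "mailto:" + email.strip().lower()
    return hashlib.sha1(mailto.encode("utf-8")).hexdigest()


def author_uri(email: str, base: str = "http://example.org/author/") -> str:
    """Build a hypothetical author URI from a hashed email address."""
    return base + mbox_sha1sum(email)
```

The same address always yields the same URI, however many papers the
author has, which is where the reduction in the number of URIs comes
from. Two addresses belonging to one person still yield two URIs.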

Truly unique IDs inherently raise privacy issues. I could use my MIT
ID number, or my social security number, or my Italian fiscal code, and
these would clearly identify me, but I don't want you to know them!
(Sure, I could hash them, but do I trust SHA-1 or MD5 enough?)

Even with email I might like you to know it, but not the spammers!

There is a thin line between use and abuse, and once the data is out
there you have no control over where it goes.

So, if you combine privacy concerns with the need for differentiation
and uniqueness, you can expect that people will have *many* different
URIs that identify them, just like you have several different email
addresses that all reach you, in one way or another, and that you might
use to identify yourself differently depending on the context. (This
allows you, for example, to trace the percolation of your information
through a system, just like people use hotmail or gmail accounts to
register on web sites they don't trust to keep their email secret.)
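
One way to picture per-context identifiers is a keyed hash: the same
person derives a different URI for each context from a secret only they
hold. This is a sketch under assumptions, not a scheme from the thread:
HMAC-SHA256, the base URI, and the names context_uri, secret, and
context are all chosen here purely for illustration.

```python
import hashlib
import hmac


def context_uri(secret: bytes, email: str, context: str,
                base: str = "http://example.org/id/") -> str:
    """Derive a hypothetical per-context URI for one person.

    Without the secret, two URIs for different contexts cannot be
    linked to each other or back to the email address by recomputing
    the hash.
    """
    message = (context + "\n" + "mailto:" + email.strip().lower()).encode("utf-8")
    digest = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return base + digest
```

If a URI handed to one site later shows up somewhere else, the owner
can recompute it and tell which context leaked it.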

My point is: without the ability to draw equivalences between URIs (or
state their difference), no system will work.
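
To make "drawing equivalences" concrete, here is a minimal sketch of a
store that records "these two URIs are the same person" and "these two
are different people" statements and answers queries over them. It uses
a plain union-find structure; the class and method names are
hypothetical, and a real system would more likely express the same
statements in RDF (for example with owl:sameAs and
owl:differentFrom).

```python
class UriEquivalences:
    """Track which URIs are stated to identify the same or different people."""

    def __init__(self):
        self._parent = {}       # union-find forest: uri -> parent uri
        self._different = []    # pairs of URIs stated to be different

    def _find(self, uri):
        """Return the representative URI of the group containing uri."""
        self._parent.setdefault(uri, uri)
        root = uri
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every visited node straight at the root.
        while uri != root:
            nxt = self._parent[uri]
            self._parent[uri] = root
            uri = nxt
        return root

    def same(self, a, b):
        """True if a and b are known (directly or transitively) to be equivalent."""
        return self._find(a) == self._find(b)

    def known_different(self, a, b):
        """True if some stated difference separates the groups of a and b."""
        groups = {self._find(a), self._find(b)}
        return any({self._find(x), self._find(y)} == groups
                   for x, y in self._different)

    def state_same(self, a, b):
        """Record that a and b identify the same person."""
        if self.known_different(a, b):
            raise ValueError("conflicts with a stated difference")
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_a] = root_b

    def state_different(self, a, b):
        """Record that a and b identify different people."""
        if self.same(a, b):
            raise ValueError("conflicts with a stated equivalence")
        self._different.append((a, b))
```

Equivalence is transitive here, so stating A = B and B = C is enough for
the store to answer that A and C are the same, and a stated difference
blocks any later attempt to merge the two groups.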

And if you think that a worldwide centralized URI registry would be
enough to solve the problem, rethink your strategy because it won't
happen: networked systems avoid centralization like the plague, because
a central point is inherently more vulnerable.

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------

Received on Fri Jul 15 2005 - 16:01:36 EDT
