Re: structured bibliographic info for BioMed Central articles now available as RDF

From: Stefano Mazzocchi <stefanom_at_mit.edu>
Date: Thu, 15 Sep 2005 21:26:55 -0400

Erik Hatcher wrote:

> I considered the embedded RDF approach myself, but it would increase the
> size of the page perhaps dramatically (I want to encourage our archives
> to push as much metadata out as possible - the more the merrier!). It
> would be a burden on the typical HTML-only browsing of the archive to
> download more HTML with no benefit.

This is a very good point. The embedded approach has another problem: it
doesn't scale. The more metadata we are able to attach to a particular
page, the more bloated the page becomes and the more of that metadata it
ends up duplicating.

I mean, look at RSS/Atom: nobody ever suggested that it would be better
to embed RSS in a comment inside a web page just because that would
save an HTTP request.

Moreover, it could well be that parts of the page change while the
metadata that the page refers to doesn't. In this case, crawlers will
have to re-fetch the embedded RDF data even if it hasn't changed,
because they have no way to ask for that part alone (and have the server
return a 'content is fresh' response, i.e. an HTTP 304 Not Modified,
which is extremely fast to generate and consumes very little bandwidth,
as blogs all over the world show).
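
To make the 'content is fresh' case concrete, here is a minimal sketch of
how a server could answer a conditional request for the separate RDF
document. The function names (make_etag, respond) and the choice of a
hash-based ETag are my own assumptions for illustration, not how any
particular archive does it:

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Hypothetical validator: a quoted, truncated hash of the RDF bytes.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET of the RDF document.

    `if_none_match` is the value of the client's If-None-Match header,
    or None when the client has no cached copy.
    """
    etag = make_etag(body)
    if if_none_match == etag:
        # The cached copy is still fresh: send headers only, no body.
        return 304, {"ETag": etag}, b""
    # First fetch, or the metadata changed: send the full document.
    return 200, {"ETag": etag}, body
```

A crawler would keep the ETag from its first fetch and send it back on
every later visit. As long as the RDF is unchanged it gets the empty 304,
no matter how often the HTML page around it has been edited.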

I'm sorry, but the "two links = double the load" argument is just bogus:
separating the parts allows independent caching, which is going to
reduce the load over the long run.

As for the "it's easier for Google" argument: they already do it for
blogs[1], so I don't see why they can't do it for more general metadata.

[1] http://blogsearch.google.com/

-- 
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA  02139-4307              email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Fri Sep 16 2005 - 01:22:28 EDT
