Re: structured bibliographic info for BioMed Central articles now available as RDF

From: Erik Hatcher <esh6h_at_virginia.edu>
Date: Thu, 15 Sep 2005 13:50:04 -0400

On Sep 15, 2005, at 5:17 AM, Matthew Cockerill wrote:
>>> Linking to the content would be much more flexible and
>>> easier to process.
>>
>> Generally, yup.
>
> I agree, but there are still quite a few use cases (for example,
> search engine harvesting) where embedding is desirable.

This is a great thread! Thank you to all posters on it.

I'm building a system that is designed to crawl a federation of sites
(nineteenth-century literature - http://www.nines.org), harvest RDF
metadata, and distill it into a faceted browsing + full-text search
interface. So far I've been prototyping various pieces of it, and am
starting to flesh it out into a deployed and usable system.

Here's the current architecture: I'm crawling the archives with
Nutch. The sites themselves will have the <link> to RDF/XML files,
just as Piggy Bank uses. A custom process follows up after the
crawls, builds a Lucene index for each archive crawled, and merges
them into a single index. This merged index allows for faceted
browsing on a specific subset of the RDF metadata, as well as
full-text search of the text from the HTML page that the RDF link
was on.
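
For concreteness, here's a rough sketch of what that post-crawl
indexing and merging step might look like against the Lucene 1.4-era
API. The field names, archive names, and index paths are just
placeholders for illustration, not the actual NINES schema:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ArchiveIndexer {

    // Index one harvested object: a couple of RDF-derived facet
    // fields plus the full text of the HTML page the RDF link was on.
    static void indexObject(IndexWriter writer, String archive,
                            String genre, String title, String pageText)
            throws IOException {
        Document doc = new Document();
        doc.add(Field.Keyword("archive", archive)); // facet: untokenized, stored
        doc.add(Field.Keyword("genre", genre));     // facet: untokenized, stored
        doc.add(Field.Text("title", title));        // tokenized and stored
        doc.add(Field.UnStored("text", pageText));  // full text: tokenized, not stored
        writer.addDocument(doc);
    }

    // Build one per-archive index with a single sample object.
    static void buildArchiveIndex(String path, String archive)
            throws IOException {
        IndexWriter writer =
            new IndexWriter(path, new StandardAnalyzer(), true);
        indexObject(writer, archive, "poetry", "Sample Title",
                    "...full text of the page the RDF was linked from...");
        writer.optimize();
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        // One index per crawled archive (names and paths are placeholders)...
        buildArchiveIndex("index/archiveA", "archiveA");
        buildArchiveIndex("index/archiveB", "archiveB");

        // ...then merge them into the single index used for faceted
        // browsing and full-text search.
        IndexWriter merged =
            new IndexWriter("index/merged", new StandardAnalyzer(), true);
        merged.addIndexes(new Directory[] {
            FSDirectory.getDirectory("index/archiveA", false),
            FSDirectory.getDirectory("index/archiveB", false)
        });
        merged.optimize();
        merged.close();
    }
}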

Another part of the application will be Piggy Bank-like, allowing
users to collect "objects", tag them, and browse their collection.

I considered the embedded RDF approach myself, but it could increase
the size of each page dramatically (I want to encourage our archives
to push out as much metadata as possible - the more the merrier!).
Forcing typical HTML-only visitors to an archive to download all
that extra markup, with no benefit to them, would be a burden.
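
And just to make the linked alternative concrete: a harvester that
already has the crawled HTML in hand can follow the RDF
autodiscovery <link> with very little code. Here's a rough,
hypothetical sketch (the regex and URL are placeholders; a real
harvester would use a proper HTML parser, or the crawler's own
parse):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RdfLinkFollower {

    // Matches e.g. <link rel="alternate" type="application/rdf+xml"
    // href="item.rdf"/>. Assumes type appears before href; a real
    // implementation should parse the HTML properly.
    private static final Pattern RDF_LINK = Pattern.compile(
        "<link[^>]+type=\"application/rdf\\+xml\"[^>]+href=\"([^\"]+)\"",
        Pattern.CASE_INSENSITIVE);

    static String fetch(URL url) throws IOException {
        BufferedReader in =
            new BufferedReader(new InputStreamReader(url.openStream()));
        StringBuffer sb = new StringBuffer();
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append('\n');
        }
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder page URL; in practice this comes from the crawl.
        URL page = new URL("http://www.example.org/archive/item123.html");
        String html = fetch(page);

        Matcher m = RDF_LINK.matcher(html);
        if (m.find()) {
            // Resolve a possibly relative href against the page URL,
            // then fetch the RDF/XML it points to.
            String rdf = fetch(new URL(page, m.group(1)));
            System.out.println(rdf);
        }
    }
}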

Again, thanks for a very timely thread touching on my current work -
very helpful!

     Erik
