On Sep 15, 2005, at 5:17 AM, Matthew Cockerill wrote:
>>> Linking to the content would be much more flexible and
>>> easier to process.
>>
>> Generally, yup.
>>
>
> I agree, but there are still quite a few use cases (for example,  
> search engine harvesting) where embedding is desirable.
This is a great thread!   Thank you to all posters on it.
I'm building a system designed to crawl a federation of sites
(nineteenth-century literature, http://www.nines.org), harvest RDF
metadata, and distill it into a faceted browsing and full-text search
interface. So far I've been prototyping various pieces of it, and I am
starting to flesh it out into a deployed, usable system.
Here's the current architecture: I'm crawling the archives with
Nutch. The sites themselves will have a <link> to RDF/XML files,
the same mechanism Piggy Bank uses. A custom process follows up after
the crawls to build a Lucene index for each archive crawled and merge
them into a single index. This index allows for faceted browsing on
a specific subset of the RDF metadata, as well as full-text search
of the text from the HTML page that the RDF link was on.
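
To make the <link> discovery step concrete, here is a rough sketch in Python (not the Nutch/Lucene code itself) of how a post-crawl process might pull RDF/XML URLs out of a fetched page. It assumes the archives advertise their RDF with type="application/rdf+xml" on the <link> element; the names RdfLinkFinder and find_rdf_links and the example.org URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfLinkFinder(HTMLParser):
    """Collect href values of <link> elements that advertise RDF/XML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.rdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # Assumption: type="application/rdf+xml" is what marks the RDF file.
        media_type = (attr.get("type") or "").lower()
        if media_type == "application/rdf+xml" and attr.get("href"):
            # Resolve relative hrefs against the URL of the crawled page.
            self.rdf_urls.append(urljoin(self.base_url, attr["href"]))


def find_rdf_links(html, base_url):
    """Return absolute URLs of RDF/XML files linked from an HTML page."""
    finder = RdfLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.rdf_urls


if __name__ == "__main__":
    page = (
        '<html><head>'
        '<link rel="alternate" type="application/rdf+xml" href="meta/poem.rdf">'
        '</head><body>Full text of the poem...</body></html>'
    )
    print(find_rdf_links(page, "http://archive.example.org/works/poem.html"))
```

Each URL found this way could then be fetched and parsed, with the selected RDF properties stored as facet fields alongside the page text in that archive's index.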
Another part of my application will be Piggy Bank-like, allowing
users to collect "objects", tag them, and browse their collection.
I considered the embedded RDF approach myself, but it could increase
the size of each page, perhaps dramatically (I want to encourage our
archives to push as much metadata out as possible - the more the
merrier!). Typical HTML-only visitors browsing an archive would have
to download the extra markup with no benefit to them.
Again, thanks for a very timely thread touching on my current work -  
very helpful!
     Erik
Received on Thu Sep 15 2005 - 23:37:05 EDT