Re: Longwell blog view

From: Erik Hatcher <>
Date: Mon, 8 Aug 2005 14:05:52 -0400

On Aug 8, 2005, at 1:48 PM, Stefano Mazzocchi wrote:

> Danny Ayers wrote:
>> Ok, thanks. I keep running into the store sync issue, but this is the
>> first time non-RDF data (the Lucene index) has been involved. Hmm,
>> right, and it doesn't exactly line up against SPARQL regex.
> Yup.
> I think Erik (Hatcher) also ran into this problem while working on
> synchronizing Kowari and Lucene indexes. One architectural vision
> would be to add full-text search capabilities to the triple store and
> avoid the synchronization issue altogether.

I'll just clarify where I'm at with my current project (one of them,
anyway). The goal is to crawl 19th-century literature sites and harvest
the HTML pages as well as the RDF hidden behind them as part of http://.
At this point I'm _just prototyping_: I have been toying with Nutch to
crawl sites and have written a basic plugin that handles RDF by loading
it into Kowari. I have not tinkered with Kowari's Lucene model
capabilities yet. I'm currently leaning towards using Nutch solely for
its crawling/fetching capabilities, and then writing custom code that
walks the web database it caches and indexes things in my own custom
way within Lucene. It remains to be seen whether the full picture of
the RDF is necessary for this application.
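To make the Nutch-then-Lucene plan a little more concrete, here is a minimal sketch of the "walk the cached pages and index them my own way" step. It is only an illustration under stated assumptions: `CachedPage` is a hypothetical stand-in for an entry in the crawler's web database (not a Nutch API), page bodies are assumed to be already stripped of markup, and a plain in-memory inverted index stands in for Lucene. RDF pages are simply set aside, since they would go into Kowari instead.

```java
import java.util.*;

// Hypothetical stand-in for one entry in a crawler's page cache (NOT a Nutch API).
// Assumes `body` has already had its HTML markup stripped.
record CachedPage(String url, String contentType, String body) {}

// Toy inverted index (term -> URLs containing it); Lucene does this for real.
class PageIndexer {
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final List<String> rdfUrls = new ArrayList<>();

    // Walk the cached pages once, routing RDF and HTML differently.
    void indexAll(Collection<CachedPage> cache) {
        for (CachedPage page : cache) {
            if (page.contentType().startsWith("application/rdf+xml")) {
                // Would be handed to the triple store (e.g. Kowari) instead.
                rdfUrls.add(page.url());
            } else if (page.contentType().startsWith("text/html")) {
                for (String term : tokenize(page.body())) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(page.url());
                }
            }
        }
    }

    // Case-insensitive single-term lookup.
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    List<String> rdfUrls() {
        return rdfUrls;
    }

    // Lower-case and split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```

Separating the walk over the cache from the crawl itself is what lets the indexing scheme change without re-fetching anything.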

There is certainly an interesting "mismatch" between structured and
full-text indexing, and I've yet to wrap my head around how to combine
the two really effectively. But I haven't tried to synchronize Kowari
and Lucene (yet, anyway). Kowari has a nice full-text model that could
be used to query across both structure and full text; I'll soon try
leveraging it to see if it fits what I'm doing.
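A rough sketch of what "querying across structure and full text" could look like when the word index lives inside the triple store itself, so that adding or removing a statement updates both at once and there is nothing to synchronize. This is a hypothetical toy (`TextAwareStore` and `Triple` are made-up names, and objects are treated as plain strings); it is not how Kowari's full-text model is implemented.

```java
import java.util.*;

// A statement; object values are plain strings in this sketch.
record Triple(String subject, String predicate, String object) {}

// Hypothetical in-memory store that keeps a word index over object values
// inside the store, so the statements and the text index cannot drift apart.
class TextAwareStore {
    private final Set<Triple> triples = new LinkedHashSet<>();
    // word -> triples whose object contains that word
    private final Map<String, Set<Triple>> wordIndex = new HashMap<>();

    // Adding a statement updates the triples and the text index together.
    synchronized void add(Triple t) {
        if (!triples.add(t)) return;
        for (String w : words(t.object())) {
            wordIndex.computeIfAbsent(w, k -> new LinkedHashSet<>()).add(t);
        }
    }

    // Removing a statement also removes it from every word's posting set.
    synchronized void remove(Triple t) {
        if (!triples.remove(t)) return;
        for (String w : words(t.object())) {
            Set<Triple> postings = wordIndex.get(w);
            if (postings != null) {
                postings.remove(t);
                if (postings.isEmpty()) wordIndex.remove(w);
            }
        }
    }

    // Structure + full text in one query:
    // subjects having `predicate` whose value contains `word`.
    synchronized Set<String> subjectsMatching(String predicate, String word) {
        Set<String> out = new TreeSet<>();
        for (Triple t : wordIndex.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of())) {
            if (t.predicate().equals(predicate)) out.add(t.subject());
        }
        return out;
    }

    private static Set<String> words(String text) {
        Set<String> ws = new HashSet<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) ws.add(w);
        }
        return ws;
    }
}
```

The design choice that matters is that the index update happens inside the same synchronized method as the statement update; there is no second store that can lag behind.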

My other project (Collex) is to consume one "resource" at a time,
adding it to a collection with its full metadata, and to allow the
collection to be navigated in various ways, very similar to
Longwell+flickr.
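Longwell-style navigation largely comes down to faceted browsing: filter the collection by the facet values selected so far, then count the remaining values of each facet to build the next menu. A minimal sketch, assuming each item is a hypothetical `Item` with multi-valued string metadata; the names are illustrative and not taken from Collex or Longwell.

```java
import java.util.*;

// Hypothetical collection item: an id plus multi-valued metadata fields.
record Item(String id, Map<String, List<String>> metadata) {}

class FacetedCollection {
    private final List<Item> items = new ArrayList<>();

    void add(Item item) {
        items.add(item);
    }

    // Items matching every selected (field -> value) constraint.
    List<Item> filter(Map<String, String> selected) {
        List<Item> out = new ArrayList<>();
        for (Item it : items) {
            boolean ok = true;
            for (Map.Entry<String, String> e : selected.entrySet()) {
                List<String> values = it.metadata().getOrDefault(e.getKey(), List.of());
                if (!values.contains(e.getValue())) {
                    ok = false;
                    break;
                }
            }
            if (ok) out.add(it);
        }
        return out;
    }

    // Count of items per value of one facet, within the current selection;
    // this is what a browse menu would display next to each value.
    SortedMap<String, Integer> facetCounts(String field, Map<String, String> selected) {
        SortedMap<String, Integer> counts = new TreeMap<>();
        for (Item it : filter(selected)) {
            // De-duplicate so an item counts at most once per value.
            for (String v : new HashSet<>(it.metadata().getOrDefault(field, List.of()))) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

For example, selecting genre "poetry" and then asking for author counts yields only the authors of the poetry items, which is the drill-down behaviour a faceted browser exposes.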

Again, all my work has been very trivial local prototypes. In the
near future we'll be exposing things to the web to be played around
with, and I'll be sure to announce it here :)

> Another one, more general and probably less intrusive for the
> triple stores, is to have the ability to register listeners with the
> triple store itself and get some code called back whenever some
> event takes place (all major content repositories work
> this way).
> The best way would be to have an API such as:
> void registerHook(Query query, Listener listener);
> where a given listener is called by the triple store every time a
> statement is added/changed/removed that belongs to the
> result space of the given query.
> With that, we could do things like:
> 1) update the Lucene indexes
> 2) invalidate caches at the webapp level
> 3) trigger synchronization between different stores
> 4) send notification events
> ...
> JSR 170 (the Java Content Repository API) has a well-established
> mechanism for this, and it's extremely useful.

I concur that this would be a slick hook to have!
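A minimal sketch of the proposed `registerHook(Query query, Listener listener)` idea, assuming a "query" can be reduced to a single triple pattern where `null` matches anything. `ObservableStore`, `Statement`, `Query`, and `Listener` are hypothetical types written for illustration; they are not Kowari's API or the JSR 170 observation API. A "change" is modelled as a removal followed by an addition.

```java
import java.util.*;

// Hypothetical types; NOT Kowari's API or the JSR 170 observation API.
record Statement(String subject, String predicate, String object) {}

// A query reduced to a single triple pattern; null means "match anything".
record Query(String subject, String predicate, String object) {
    boolean matches(Statement s) {
        return (subject == null || subject.equals(s.subject()))
            && (predicate == null || predicate.equals(s.predicate()))
            && (object == null || object.equals(s.object()));
    }
}

interface Listener {
    void statementAdded(Statement s);
    void statementRemoved(Statement s);
}

class ObservableStore {
    private final Set<Statement> statements = new HashSet<>();
    private final Map<Query, List<Listener>> hooks = new LinkedHashMap<>();

    void registerHook(Query query, Listener listener) {
        hooks.computeIfAbsent(query, q -> new ArrayList<>()).add(listener);
    }

    // Listeners fire only when the store actually changes.
    void add(Statement s) {
        if (statements.add(s)) fire(s, true);
    }

    void remove(Statement s) {
        if (statements.remove(s)) fire(s, false);
    }

    // A "change" is modelled as remove-then-add, so listeners see both events.
    void replace(Statement oldS, Statement newS) {
        remove(oldS);
        add(newS);
    }

    // Call every listener whose query pattern covers the statement.
    private void fire(Statement s, boolean added) {
        for (Map.Entry<Query, List<Listener>> e : hooks.entrySet()) {
            if (!e.getKey().matches(s)) continue;
            for (Listener l : e.getValue()) {
                if (added) l.statementAdded(s);
                else l.statementRemoved(s);
            }
        }
    }
}
```

A listener registered this way could, for example, re-index the affected literal in Lucene or invalidate a webapp-level cache, the first two uses in Stefano's list. A real store would match against full query result spaces rather than one pattern, which is considerably harder to evaluate incrementally.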

Received on Mon Aug 08 2005 - 18:02:22 EDT
