Re: Explanation of facet frequency bug

From: David R. Karger <>
Date: Thu, 21 Oct 2004 01:24:49 -0400

Here are some preliminary thoughts.

A set of 1000 facets is a corpus. How do you search a corpus? You
need useful descriptions of each element of the corpus: textual
descriptions and metadata. Textual description is obvious---ideally,
when we create an ontology, someone should write a document describing
each predicate in the ontology. Of course, since we are in the
metadata business, it's nice to think about metadata for the
predicates. There's obvious stuff---e.g., the RDFS info about the
predicates, like the domain and range of the predicate. But it's
interesting to think about what other kinds of predicates can be
descriptive for an end user. To think about this I'd like to take a
look at the list of predicates. Is it available somewhere?

Given the metadata, there are various tools for searching it. I'd like
to see Vineet's metadata-based fuzzy browser applied to this problem,
for example.
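
As a minimal sketch of the idea (not anything that exists in longwell),
each facet could be stored as a small record holding its textual
description plus RDFS-style domain and range, and the records searched
with an ordinary inverted index and TF-IDF ranking. The facet names,
descriptions, and function names below are all hypothetical, made up
for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical facet records: the ids, descriptions, and domain/range
# values are invented for this sketch, not taken from a real ontology.
FACETS = [
    {"id": "ex:creator",
     "description": "The person or organization that made the resource.",
     "domain": "Resource", "range": "Agent"},
    {"id": "ex:date",
     "description": "A date associated with the resource, such as creation.",
     "domain": "Resource", "range": "Date"},
    {"id": "ex:depicts",
     "description": "The subject shown in an image or artwork.",
     "domain": "Image", "range": "Thing"},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def facet_text(facet):
    """Treat the description and the metadata as one searchable document."""
    return " ".join(
        [facet["id"], facet["description"], facet["domain"], facet["range"]]
    )


def build_index(facets):
    """Build an inverted index: term -> {facet id: term frequency}."""
    index = defaultdict(dict)
    for facet in facets:
        for term, tf in Counter(tokenize(facet_text(facet))).items():
            index[term][facet["id"]] = tf
    return index


def search(index, n_facets, query):
    """Return facet ids ranked by a simple TF-IDF score, best first."""
    scores = Counter()
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term appears in no facet description
        idf = math.log(1 + n_facets / len(postings))
        for facet_id, tf in postings.items():
            scores[facet_id] += tf * idf
    return [facet_id for facet_id, _ in scores.most_common()]


index = build_index(FACETS)
print(search(index, len(FACETS), "image"))
```

The point is only that once the facets carry descriptions and metadata,
finding the right facet becomes a plain retrieval query; any stronger
retrieval tool could stand in for the toy ranking used here.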

   Date: Wed, 20 Oct 2004 11:21:46 -0400
   From: Stefano Mazzocchi <>

   David R. Karger wrote:

> It seems to me that if you have 1000 facets to deal with, then the
> problem of looking at or finding the right facets is an information
> retrieval problem in its own right. Rather than hardwiring special
> purpose tools for working with the facets, we should be representing
> the facets in a way that lets them be treated as a corpus to which we
> apply all the traditional information retrieval tools.


   I could hardly agree more with the fact that, as we stand, longwell will
   not be able to scale to a very large corpus with a very large number of
   facet instances.

   We already have a facet search box in longwell, but given how it's
   implemented (client side grepping) it is not going to scale either.

   Now: how do we do this? Figuring it out is one of the next deliverables
   of the project, along with the triple-store scalability. Being pretty
   much the UI designer of the team, I would love to hear your thoughts on
   how we can integrate a browsing and searching interface on a
   not-so-rich UI framework like a standard browser.

   Stefano Mazzocchi
   Research Scientist                      Digital Libraries Research Group
   Massachusetts Institute of Technology   location: E25-131C
   77 Massachusetts Ave                    telephone: +1 (617) 253-1096
   Cambridge, MA 02139-4307                email: stefanom at mit . edu
Received on Thu Oct 21 2004 - 05:24:48 EDT
