Re: Explanation of facet frequency bug
Here are some preliminary thoughts.
A set of 1000 facets is a corpus. How do you search a corpus? You
need useful descriptions of each element of the corpus. Textual
descriptions and metadata. Textual description is obvious---ideally,
when we create an ontology, someone should write a document describing
each predicate in the ontology. Of course, since we are in the
metadata business, it's nice to think about metadata for the
predicates. There's obvious stuff---e.g., the RDFS info about the
predicates, like the domain and range of the predicate. But it's
interesting to think about what other kinds of predicates can be
descriptive for an end user. To think about this I'd like to take a
look at the list of predicates. Is it available somewhere?
Given the metadata, there are various tools for searching it. I'd like
to see Vineet's metadata-based fuzzy browser applied to this problem,
for example.
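As a rough illustration of treating the facets as a corpus, here is a minimal sketch that folds each predicate's textual description and its RDFS domain/range into one small document, builds an inverted index over those documents, and ranks facets against a free-text query with a simple TF-IDF score. The facet names, descriptions, and the `build_index` and `search` helpers are hypothetical and made up for this example; they are not taken from the actual ontology or from any existing tool.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical facet records: the names, descriptions, and RDFS-style
# domain/range values below are invented for illustration only.
FACETS = {
    "dc:creator": {
        "description": "The person or organization that made the resource",
        "domain": "Resource",
        "range": "Agent",
    },
    "dc:date": {
        "description": "The date the resource was created or published",
        "domain": "Resource",
        "range": "Date",
    },
    "vra:material": {
        "description": "The physical material an artwork is made of",
        "domain": "Artwork",
        "range": "Literal",
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(facets):
    """Map each term to {facet_id: count}, using name, description, and metadata."""
    index = defaultdict(dict)
    for facet_id, record in facets.items():
        text = " ".join([facet_id, *record.values()])
        for term, count in Counter(tokenize(text)).items():
            index[term][facet_id] = count
    return index


def search(index, n_docs, query):
    """Return facet ids ranked by a simple TF-IDF score (ties broken by name)."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + n_docs / len(postings))
        for facet_id, tf in postings.items():
            scores[facet_id] += tf * idf
    return sorted(scores, key=lambda f: (-scores[f], f))
```

With these made-up records, `search(build_index(FACETS), len(FACETS), "artwork material")` ranks only `vra:material`, because both the description and the domain mention those terms. Richer per-predicate metadata would simply become more text or more fields in each record.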
Date: Wed, 20 Oct 2004 11:21:46 -0400
From: Stefano Mazzocchi <stefanom_at_mit.edu>
David R. Karger wrote:
> It seems to me that if you have 1000 facets to deal with, then the
> problem of looking at or finding the right facets is an information
> retrieval problem in its own right. Rather than hardwiring special
> purpose tools for working with the facets, we should be representing
> the facets in a way that lets them be treated as a corpus to which we
> apply all the traditional information retrieval tools.
David,
I could hardly agree more that, as it stands, longwell will not be
able to scale to a very large corpus with a very large number of
facet instances.
We already have a facet search box in longwell, but given how it's
implemented (client-side grepping), it is not going to scale either.
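To make the scaling concern concrete: a grep-style filter rescans every facet label on each keystroke, whereas an index built once on the server can answer each query with a binary search. The sketch below is only one possible alternative, assuming the search box needs prefix matching rather than arbitrary substring matching; the `FacetPrefixIndex` class and the sample labels are hypothetical and do not describe how longwell is implemented.

```python
import bisect


class FacetPrefixIndex:
    """Sorted, case-folded facet labels that support fast prefix lookup."""

    def __init__(self, labels):
        # Sort once up front; each entry pairs the folded key with the
        # original label so results keep their display form.
        self._entries = sorted((label.casefold(), label) for label in labels)
        self._keys = [key for key, _ in self._entries]

    def lookup(self, prefix, limit=10):
        """Return up to `limit` labels whose folded form starts with `prefix`."""
        key = prefix.casefold()
        # Binary search for the first key >= prefix: O(log n) per query
        # instead of a linear scan over every label.
        start = bisect.bisect_left(self._keys, key)
        matches = []
        for folded, label in self._entries[start:start + limit]:
            if not folded.startswith(key):
                break
            matches.append(label)
        return matches


# Hypothetical labels, for illustration only.
index = FacetPrefixIndex(["Creator", "Created", "Date", "Material", "creation place"])
suggestions = index.lookup("crea")
```

Because the matching labels sit next to each other in sorted order, the server only ever returns a bounded page of suggestions to the browser, which suits a not-so-rich client.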
Now, how do we do this? Figuring it out is one of the next deliverables
of the project, along with triple-store scalability. Being pretty much
the UI designer of the team, I would love to hear your thoughts on how
we can integrate a browsing and searching interface on a not-so-rich
UI framework like a standard browser.
--
Stefano Mazzocchi
Research Scientist Digital Libraries Research Group
Massachusetts Institute of Technology location: E25-131C
77 Massachusetts Ave telephone: +1 (617) 253-1096
Cambridge, MA 02139-4307 email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Thu Oct 21 2004 - 05:24:48 EDT