Faceted Browser

A 'faceted browser' is a software tool that uses faceted navigation to help users browse information.

Faceted navigation, pioneered by the Flamenco project, lets users find items along more than one dimension at once and see breakdowns and projections of the items along different axes, which helps them gain insight into the data they are exploring.

The most useful faceted navigation is 'context dependent': the available facets, facet values and their counts are computed from the set of results the user is currently browsing.
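To make 'context dependent' concrete, here is a minimal sketch over a hypothetical in-memory catalog (the `ITEMS` data and the function names are illustrative assumptions, not taken from any engine). Selections are combined with AND semantics, and counts are taken only over the items that match the current selection, so the same facet shows different counts in different contexts.

```python
from collections import Counter

# Hypothetical toy catalog: each item maps facet names to a single value.
ITEMS = [
    {"format": "book", "language": "en", "year": "2001"},
    {"format": "book", "language": "fr", "year": "2001"},
    {"format": "dvd",  "language": "en", "year": "2005"},
    {"format": "book", "language": "en", "year": "2005"},
]

def result_set(items, selection):
    """Items matching every (facet, value) pair in the selection (AND semantics)."""
    return [it for it in items
            if all(it.get(f) == v for f, v in selection.items())]

def facet_counts(items, selection):
    """Per-facet value counts, computed over the current result set only."""
    counts = {}
    for item in result_set(items, selection):
        for facet, value in item.items():
            if facet in selection:
                continue  # an already-selected facet is not offered again
            counts.setdefault(facet, Counter())[value] += 1
    return counts

# With no selection, 'language' shows en: 3, fr: 1; after selecting
# format=book it shows en: 2, fr: 1, because only three items remain.
print(facet_counts(ITEMS, {}))
print(facet_counts(ITEMS, {"format": "book"}))
```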

Unfortunately, context-dependent faceted navigation is also the most computationally intensive kind: the number of possible contexts in the browsing space grows exponentially with the number of facets and facet values (every combination of selected values is a distinct context), and the cost of computing the navigation for each context grows with the number of items.
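A quick way to see the growth is to count contexts under two simple selection models (both are illustrative assumptions): if each facet can be unset or set to one of its v values, there are (v + 1) choices per facet; if any subset of all facet values may be selected together, every value is independently on or off. These are upper bounds, since many combinations produce empty result sets and can be pruned.

```python
from math import prod

def contexts_single_select(values_per_facet):
    """Each facet is either unset or set to exactly one of its values."""
    return prod(v + 1 for v in values_per_facet)

def contexts_multi_select(values_per_facet):
    """Any subset of all facet values may be selected at the same time."""
    return 2 ** sum(values_per_facet)

facets = [10, 10, 10, 10, 10]  # hypothetical: 5 facets with 10 values each
print(contexts_single_select(facets))  # 161051
print(contexts_multi_select(facets))   # 1125899906842624
```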

To keep response times low, faceted browsing engines either cache or precalculate the faceted navigation for the most frequently used contexts, which turns a CPU-intensive problem into a memory-intensive one; the navigation for infrequently used contexts is computed at runtime by counting the facet values in the current result set.
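The cache-or-compute pattern can be sketched as a memory-bounded cache keyed by the navigation context, with runtime counting as the fallback on a miss. This sketch uses least-recently-used (LRU) eviction as a simple stand-in for a frequency-based policy; the class name, `capacity` parameter and toy data are hypothetical.

```python
from collections import Counter, OrderedDict

# Hypothetical toy catalog: each item maps facet names to a single value.
ITEMS = [
    {"format": "book", "language": "en", "year": "2001"},
    {"format": "book", "language": "fr", "year": "2001"},
    {"format": "dvd",  "language": "en", "year": "2005"},
    {"format": "book", "language": "en", "year": "2005"},
]

def count_facets(items, selection):
    """Runtime path: count facet values over the items matching the selection."""
    matching = [it for it in items
                if all(it.get(f) == v for f, v in selection.items())]
    return Counter((f, v) for it in matching for f, v in it.items()
                   if f not in selection)

class FacetCache:
    """Keeps facet counts for at most `capacity` contexts in memory (LRU eviction)."""

    def __init__(self, items, capacity):
        self.items = items
        self.capacity = capacity
        self._cache = OrderedDict()
        self.misses = 0

    def get(self, selection):
        key = frozenset(selection.items())  # order-independent context key
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
            return self._cache[key]
        self.misses += 1
        counts = count_facets(self.items, selection)  # computed at runtime
        self._cache[key] = counts
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict the least recently used context
        return counts
```

The trade-off described above shows up directly in `capacity`: a larger value keeps more contexts in RAM and reduces how often `count_facets` has to run.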

Where it is possible to precalculate the faceted navigation for every context, faceted browsing is as fast as retrieving results from an inverted index; the precalculated data can be stored on disk or in RAM, depending on the configuration and size.

Where precalculating every context is impossible, the hardest problem is deciding which contexts to precalculate and store.
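One possible way to frame that decision is as a knapsack problem: each candidate context has a benefit (how often it is requested times how expensive it is to compute) and a storage cost, and the memory budget is fixed. The sketch below uses the common greedy benefit-per-byte heuristic; it is an illustrative assumption, not a method attributed to any of the engines listed here, and the `Candidate` fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    context: frozenset   # selected (facet, value) pairs
    frequency: int       # how often the context appears in a query log
    compute_cost: float  # e.g. size of the result set that must be counted
    storage_bytes: int   # space for its precalculated counts (assumed > 0)

def choose_contexts(candidates, budget_bytes):
    """Greedy knapsack heuristic: prefer contexts that save the most
    runtime work per byte of storage, until the budget is used up."""
    ranked = sorted(candidates,
                    key=lambda c: c.frequency * c.compute_cost / c.storage_bytes,
                    reverse=True)
    chosen, used = [], 0
    for c in ranked:
        if used + c.storage_bytes <= budget_bytes:
            chosen.append(c.context)
            used += c.storage_bytes
    return chosen
```

Greedy selection is not optimal in general, but it is cheap to rerun as the query log changes.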

Examples

Longwell uses a RAM-only 'most frequently used' approach to cache the results of context calculations.

Apache Solr uses a 'cache prewarming' approach: it is up to the system built on top of Solr to decide which contexts to precalculate (Solr does not provide optimizations specific to faceted browsing).
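As a sketch of what "up to the system built on top of Solr" can mean, an application might build one faceting request per context it wants warm and issue them whenever a new searcher opens. The parameters used (`q`, `rows`, `fq`, `facet`, `facet.field`) are standard Solr query parameters, and Solr keeps the document sets for `fq` filters in its filterCache. The host, core name, field names and the list of contexts are hypothetical. Solr can also send such queries itself through a `QuerySenderListener` on `newSearcher` events in `solrconfig.xml`, but the choice of contexts still comes from the application.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/catalog/select"  # hypothetical core
FACET_FIELDS = ["format", "language", "year"]              # hypothetical fields

def prewarm_url(selection):
    """Build a Solr request that facets over FACET_FIELDS in one context.

    Each selected (field, value) pair becomes a filter query (fq)."""
    params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
    params += [("facet.field", f) for f in FACET_FIELDS]
    params += [("fq", f'{field}:"{value}"')
               for field, value in sorted(selection.items())]
    return SOLR_SELECT + "?" + urlencode(params)

# The application, not Solr, decides which contexts are worth warming.
contexts = [{}, {"format": "book"}, {"format": "book", "language": "en"}]
for ctx in contexts:
    print(prewarm_url(ctx))
```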

Endeca MDEX (the system that powers the NCSU Library Catalog) uses a patented approach that treats faceted navigation as a graph: each node is a navigation context and each edge a possible transition. Each context is assigned a unique identifier, which the user interface uses to retrieve the possible navigation transitions (the facets and values available in that context) with a simple index lookup, provided the index contains that context. The algorithm Endeca uses to decide which parts of the navigation graph are indexed and which are computed at runtime is not publicly known.
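The graph model itself can be illustrated independently of Endeca's unpublished implementation. In this hypothetical sketch, a context's identifier is a hash of its canonically sorted selections, a node's outgoing edges are the facet values still selectable (with the number of items each edge leads to), and a lookup falls back to runtime counting when the identifier is not in the index. Which contexts go into `INDEX` is an arbitrary choice here.

```python
import hashlib
from collections import Counter

# Hypothetical toy catalog: each item maps facet names to a single value.
ITEMS = [
    {"format": "book", "language": "en", "year": "2001"},
    {"format": "book", "language": "fr", "year": "2001"},
    {"format": "dvd",  "language": "en", "year": "2005"},
    {"format": "book", "language": "en", "year": "2005"},
]

def context_id(selection):
    """Stable identifier for a context: hash of its sorted selections,
    so the order in which the user clicked does not matter."""
    canonical = "&".join(f"{f}={v}" for f, v in sorted(selection.items()))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]

def transitions(items, selection):
    """Outgoing edges of a context node: each selectable (facet, value)
    with the number of items reached by following that edge."""
    matching = [it for it in items
                if all(it.get(f) == v for f, v in selection.items())]
    return dict(Counter((f, v) for it in matching for f, v in it.items()
                        if f not in selection))

# Precomputed index: context id -> outgoing transitions (only two nodes here).
INDEX = {context_id(s): transitions(ITEMS, s) for s in [{}, {"format": "book"}]}

def lookup(selection):
    """Serve from the index when the context was precomputed, else count now."""
    cid = context_id(selection)
    if cid in INDEX:
        return INDEX[cid]
    return transitions(ITEMS, selection)
```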

WorldCat uses a custom-written engine where all of the information and indices are stored in RAM. The first few tiers of navigation contexts are precalculated and the rest are calculated at runtime.
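Precalculating "the first few tiers" can be pictured as a breadth-first expansion of the navigation graph from the empty context, stopping at a fixed depth. This sketch assumes that a tier means the number of selections in a context and that only non-empty contexts are stored; `max_depth`, the function names and the data are hypothetical, and nothing here describes WorldCat's actual engine.

```python
from collections import Counter

# Hypothetical toy catalog: each item maps facet names to a single value.
ITEMS = [
    {"format": "book", "language": "en", "year": "2001"},
    {"format": "book", "language": "fr", "year": "2001"},
    {"format": "dvd",  "language": "en", "year": "2005"},
    {"format": "book", "language": "en", "year": "2005"},
]

def count_facets(items, selection):
    """selection is a frozenset of (facet, value) pairs."""
    selected = {f for f, _ in selection}
    matching = [it for it in items if all(it.get(f) == v for f, v in selection)]
    return Counter((f, v) for it in matching for f, v in it.items()
                   if f not in selected)

def precompute_tiers(items, max_depth):
    """Store counts for every non-empty context with at most max_depth
    selections, expanding the navigation graph one tier at a time."""
    table = {}
    frontier = {frozenset()}
    for depth in range(max_depth + 1):
        next_frontier = set()
        for ctx in frontier:
            counts = count_facets(items, ctx)
            table[ctx] = counts
            if depth < max_depth:
                # Only edges with a non-zero count lead to non-empty contexts.
                next_frontier.update(ctx | {edge} for edge in counts)
        frontier = next_frontier
    return table

def navigation_for(table, items, selection):
    """Precalculated tiers are a lookup; deeper contexts are counted at runtime."""
    key = frozenset(selection.items())
    if key in table:
        return table[key]
    return count_facets(items, key)
```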