Rationale
Several Semantic Web enthusiasts believe that modeling data in RDF is a sufficient condition for data interoperability and integration. We believe otherwise: RDF is neither a sufficient nor a necessary condition for interoperability or integration to take place. It does help, however, to normalize data into a powerful, flexible, generalized and easily mixable data model such as RDF, because doing so makes it possible to reuse tools across various domains.
While we agree that minimizing data integration efforts is a laudable goal and that sharing a common data model is a solid step toward simplifying those efforts, a common data model alone is far from enough. As our experience integrating real-life data from various independent digital libraries shows, data is rarely provided in a form that integrates satisfactorily with existing data without any transformation or adaptation.
Banach was born out of the need to transform RDF graphs in order to reduce their mismatch with other graphs and so facilitate interoperability and integration.
Implementation
Banach operators are implemented as Stackable SAILs. A SAIL (Storage and Inference Layer) is the RDF storage abstraction defined by the OpenRDF API and implemented by the Sesame triple store.
Stackable SAILs are like onion layers: they can wrap an existing SAIL and extend its functionality.
Unlike other transformation pipelines (such as UNIX pipes and XML pipelines) that operate either in push or pull mode, a stackable SAIL can modify the behavior at both data load time (push) and data query time (pull), providing a great deal of flexibility in the implementation strategy.
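The layering idea can be illustrated with a minimal sketch. It is written in Python rather than against the real OpenRDF SAIL API, and every name in it (`MemoryStore`, `StackableLayer`, `TrimObjects`, `HideProperty`) is hypothetical and chosen only for illustration. It shows one layer hooking the push side (data load) and another hooking the pull side (data query), both wrapped around the same base store.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class MemoryStore:
    """Innermost layer: a hypothetical stand-in for a base SAIL."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        # None acts as a wildcard for that position.
        for t in sorted(self.triples):
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t


class StackableLayer:
    """Wraps another store and, by default, delegates both push and pull."""

    def __init__(self, base) -> None:
        self.base = base

    def add(self, triple: Triple) -> None:
        self.base.add(triple)

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        return self.base.match(s, p, o)


class TrimObjects(StackableLayer):
    """Push-side hook: strips whitespace from object values as data is loaded."""

    def add(self, triple: Triple) -> None:
        s, p, o = triple
        self.base.add((s, p, o.strip()))


class HideProperty(StackableLayer):
    """Pull-side hook: filters one property out of every query result."""

    def __init__(self, base, hidden: str) -> None:
        super().__init__(base)
        self.hidden = hidden

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        return (t for t in self.base.match(s, p, o) if t[1] != self.hidden)


# Layers nest like an onion: HideProperty -> TrimObjects -> MemoryStore.
store = HideProperty(TrimObjects(MemoryStore()), hidden="ex:secret")
store.add(("ex:a", "ex:name", "  Alice "))
store.add(("ex:a", "ex:secret", "x"))
print(list(store.match()))
```

The outer layer never needs to know what the inner layers do, which is what allows operators to be combined freely; here the hidden triple is still present in the innermost store and is only filtered on the way out.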
Operators that work exclusively on transforming the stream of events during load are called "load time" (or sometimes "forward chaining") operators. Their advantage is that they can afford computationally intensive operations, because load-time performance is normally far less critical than query-time performance.
Operators that work exclusively on transforming a query or the results of a query are called "query time" (or sometimes "backward chaining") operators. The advantage of these operators is that the data is stored exactly as it was loaded and changes in the operator's execution can be observed immediately without requiring the data to be reloaded or retransformed.
Operators that work using a mixed strategy of both load time and query time are called hybrid operators.
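To make the trade-off between the two pure strategies concrete, here is a small sketch that implements the same transformation both ways. It is an illustration only, not Banach code: the classes are hypothetical, and treating `foaf:knows` as a symmetric property is an assumption made purely for the example.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]
KNOWS = "foaf:knows"  # assumed symmetric, for illustration only


class MemoryStore:
    """Hypothetical base store holding triples in a set."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self.triples.add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        for t in sorted(self.triples):
            if all(q is None or q == v for q, v in zip((s, p, o), t)):
                yield t


class LoadTimeSymmetric:
    """Load-time operator: materializes the inverse triple when data is pushed."""

    def __init__(self, base: MemoryStore) -> None:
        self.base = base

    def add(self, triple: Triple) -> None:
        s, p, o = triple
        self.base.add(triple)
        if p == KNOWS:
            self.base.add((o, p, s))  # extra stored data, paid for once

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        return self.base.match(s, p, o)


class QueryTimeSymmetric:
    """Query-time operator: stores data untouched, derives inverses when pulled."""

    def __init__(self, base: MemoryStore) -> None:
        self.base = base

    def add(self, triple: Triple) -> None:
        self.base.add(triple)

    def match(self, s=None, p=None, o=None) -> Iterator[Triple]:
        results = set(self.base.match(s, p, o))
        if p is None or p == KNOWS:
            # Ask the swapped question and flip each answer back.
            for s2, p2, o2 in self.base.match(o, KNOWS, s):
                results.add((o2, p2, s2))
        return iter(sorted(results))


load_time = LoadTimeSymmetric(MemoryStore())
query_time = QueryTimeSymmetric(MemoryStore())
for layer in (load_time, query_time):
    layer.add(("ex:alice", KNOWS, "ex:bob"))

print(list(load_time.match("ex:bob", KNOWS)), len(load_time.base.triples))
print(list(query_time.match("ex:bob", KNOWS)), len(query_time.base.triples))
```

Both layers answer the query identically, but the load-time variant stores two triples while the query-time variant stores only the one that was loaded and does the extra work on every query.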
List of Implemented Operators
- Smoosher - reacts to equivalences between URIs and 'smooshes' the graph accordingly. This is a key operator during dataset integration. It is implemented with a hybrid strategy: preprocessing the graph at load time reduces query-time execution costs, while avoiding the data replication that a pure load-time strategy would require.
- Distiller - distills a subgraph out of a graph based on indirect properties. This operator can be used, for example, to derive a graph of types or to derive a social network from an email exchange graph. The result is written to another named graph, so the original graph is left untouched.
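The two operators above can be sketched roughly as follows. This is one possible reading of the descriptions, not Banach's actual implementation: the `Smoosher` class, the `distill` function, the use of `owl:sameAs` as the equivalence property, the union-find index, and the "lexicographically smaller URI is canonical" rule are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
SAME_AS = "owl:sameAs"  # assumed equivalence property


class Smoosher:
    """Hybrid sketch: index equivalences at load time, merge at query time."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()  # stored as loaded, never duplicated
        self.parent: Dict[str, str] = {}   # union-find forest over node values

    def _find(self, uri: str) -> str:
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def add(self, triple: Triple) -> None:
        s, p, o = triple
        if p == SAME_AS:
            # Load-time work: record the equivalence; the smaller URI is canonical.
            a, b = sorted((self._find(s), self._find(o)))
            self.parent[b] = a
        else:
            self.triples.add(triple)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> List[Triple]:
        # Query-time work: compare and report canonical URIs only.
        pattern = (None if s is None else self._find(s), p,
                   None if o is None else self._find(o))
        out = set()
        for ts, tp, to in self.triples:
            t = (self._find(ts), tp, self._find(to))
            if all(q is None or q == v for q, v in zip(pattern, t)):
                out.add(t)
        return sorted(out)


def distill(graphs: Dict[str, Set[Triple]], source: str, target: str,
            via_a: str, via_b: str, derived: str) -> None:
    """Link A to B in `target` whenever some node X has X via_a A and X via_b B."""
    by_node = defaultdict(lambda: (set(), set()))
    for s, p, o in graphs[source]:
        if p == via_a:
            by_node[s][0].add(o)
        if p == via_b:
            by_node[s][1].add(o)
    out = graphs.setdefault(target, set())  # the source graph is only read
    for firsts, seconds in by_node.values():
        out.update((a, derived, b) for a in firsts for b in seconds if a != b)


smoosher = Smoosher()
smoosher.add(("ex:sm", "ex:name", "Stefano"))
smoosher.add(("other:stefano", "ex:worksOn", "ex:banach"))
smoosher.add(("ex:sm", SAME_AS, "other:stefano"))
print(smoosher.match(s="other:stefano"))

graphs = {"mail": {("ex:msg1", "ex:from", "ex:alice"),
                   ("ex:msg1", "ex:to", "ex:bob"),
                   ("ex:msg1", "ex:to", "ex:carol")}}
distill(graphs, "mail", "social", "ex:from", "ex:to", "ex:corresponded")
print(sorted(graphs["social"]))
```

In the Smoosher sketch, a query for either URI returns the statements made about both, yet only the two original triples are stored; the equivalence index is the only load-time addition. In the Distiller sketch, the derived links land in a separate `social` graph and the `mail` graph is never modified.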
Ideas for Other Operators
We have been collecting various ideas on the rewiring scenarios page.
Why the name Banach?
Banach (pronounced "bah-nack") is named after the Polish mathematician Stefan Banach for his contribution to functional analysis.
Contributing
Banach is open source software built around the spirit of open participation and collaboration. There are several ways you can help:
- Blog about Banach
- Subscribe to our mailing lists to show your interest and give us feedback
- Report problems and ask for new features through our issue tracking system (but take a look at our todo list first)
- Send us patches or fixes to the code
Licensing & Legal Issues
Banach is open source software and is licensed under the BSD license.
Note, however, that this software ships with libraries that are not released under the same license. We interpret their licensing terms to be compatible with ours, and we redistribute them unmodified. For more information on the licensing terms of the libraries Banach depends on, please refer to the source code.
Credits
This software is maintained by the SIMILE project and in particular:
- Stefano Mazzocchi (original author)
The design of this software has been heavily influenced by the excellent design of the Stackable SAIL API included in the OpenRDF Sesame triple store, to whose developers we are most grateful.
| Attribute values | |
|---|---|
| Glossary definition | Banach is a collection of operators that work on RDF graphs to infer, extend, emerge or otherwise transform a graph into another. You can think of it as a transformation pipeline for RDF with a collection of implemented commands. |


