Charon

What is this?

Charon is a framework for building RDFizing proxies that are intended to wrap around existing web sites, screen-scrape their HTML output and provide an RDF representation of that data.

Of course, this is not magic: each 'class' of web site requires some tuning and configuration, but Charon is designed to let you focus only on the semantic extraction part and forget about the nitty-gritty details of HTML scraping, HTTP proxying and URL rewriting.
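
To make the idea of "screen-scrape the HTML, hand back RDF" concrete, here is a minimal sketch in Python. It is not Charon code (Charon itself runs on Cocoon) and it leaves out the proxying and URL rewriting entirely. The MetaScraper class, the rdfize function and the mapping from meta tag names to Dublin Core properties are all hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser

# Dublin Core element namespace, used here only as an example vocabulary.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from <meta name="..."> values to RDF predicates.
# Each 'class' of web site would need its own mapping like this one.
META_TO_PREDICATE = {
    "author": DC + "creator",
    "description": DC + "description",
}


class MetaScraper(HTMLParser):
    """Collects the <title> text and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def escape(text):
    """Escape a string for use as an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rdfize(page_url, html):
    """Return N-Triples statements describing the scraped page."""
    scraper = MetaScraper()
    scraper.feed(html)
    triples = []
    title = scraper.title.strip()
    if title:
        triples.append(f'<{page_url}> <{DC}title> "{escape(title)}" .')
    for name, predicate in META_TO_PREDICATE.items():
        if name in scraper.meta:
            value = escape(scraper.meta[name])
            triples.append(f'<{page_url}> <{predicate}> "{value}" .')
    return "\n".join(triples)
```

Calling rdfize with a page URL and its HTML source returns one N-Triples statement per extracted field. The per-site tuning mentioned above corresponds roughly to the mapping table: that is the semantic extraction part, while everything else is plumbing.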

What can I do with this?

You can 'RDFize' an existing web site without changing anything on the original web site. Mostly, though, Charon wants to be a tool that enables a smooth transition from a semantically poor web site to a semantically richer one: by providing more data to the RDF-consuming clients out there, it hopes to create a need for the original web sites to start thinking about publishing RDF directly.

The ultimate goal of Charon is therefore to become unneeded :-)

Why was it built?

Because even when a web site is powered by software that we can patch (for example, DSpace), it might take a while for current installations to upgrade to a version of the software that includes our patch. Since we now have tools that can make use of RDF (and we want real, juicy data, not just fake examples!), we had to do something, and we thought that an RDFizing proxy was the easiest way to go, both technically and politically.

We also think that web sites might want to run their proxies themselves, because sometimes it's a lot easier to screen-scrape a site and do the semantic extraction than to modify the original web site.

Requirements

Charon is a Cocoon-powered web application and therefore requires Apache Cocoon, version 2.2 or above. (No, it's not a mistake: at the time of writing, Cocoon 2.2 has not been released yet, but it contains so many nice features in terms of ease of installation that I really couldn't live without it.)

Where do I download it?

You can obtain Charon in two different ways:

  1. download a prepackaged distribution
  2. download the files directly from the code repository

If you want to download the files from the repository (for example, to get the latest and greatest development snapshot), you need to have a Subversion client installed. Then just type

svn co http://simile.mit.edu/repository/charon/trunk/ charon

at the command line, and the latest Charon distribution will appear in the "charon" directory.

Licensing and legal issues

Charon is open source software, licensed under the BSD license found in the LICENSE.txt file in the root of the distribution.

Note, however, that this software ships with libraries that are not released under the same license. We redistribute them untouched, and each of them is licensed according to the terms of the license files located in the ./legal subdirectory of the distribution.

Credits

This software was created by the SIMILE project and originally written by: