EMail RDFizer

These are Python scripts that convert email to RDF (using the RDF/XML syntax).

How do I use them?

"mbox2rdf" is intended for those who store their email in mbox files (or have a way to convert their email to mbox). Run it against a folder that contains mbox files and it will generate a separate RDF/XML file for each message in a 'data' folder that you pass on the command line. Run it with the -h switch to learn more.
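To illustrate the folder-of-mbox-files idea, here is a minimal sketch that uses Python's standard mailbox module. It is not the actual mbox2rdf code: the function names iter_mbox_messages and output_path, and the output file naming scheme, are assumptions made up for this example.

```python
import mailbox
import os


def iter_mbox_messages(folder):
    """Yield (mbox_name, index, message) for each message in each file of folder.

    Assumption: every regular file in the folder is an mbox file.
    """
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        # create=False: never create an empty mbox if the path is missing
        box = mailbox.mbox(path, create=False)
        try:
            for index, message in enumerate(box):
                yield name, index, message
        finally:
            box.close()


def output_path(data_dir, mbox_name, index):
    """Build a per-message output file name (hypothetical naming scheme)."""
    return os.path.join(data_dir, "%s-%05d.rdf" % (mbox_name, index))
```

Each yielded message could then be serialized and written to the path returned by output_path, giving one file per message in the data folder.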

"email2rdf" is intended for those who want to process their email directly at ingestion time; it reads the email from STDIN and writes the RDF to STDOUT.
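The STDIN-to-STDOUT flow can be sketched roughly as follows. This is only an illustration, not the real email2rdf script: the http://example.org/email# namespace, the property names, the choice of headers, and the use of a "mid:" URI built from the Message-ID header are all assumptions made for this example and do not reflect the project's actual ontology.

```python
import sys
from email import message_from_string
from xml.sax.saxutils import escape, quoteattr

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/email#"  # hypothetical namespace, not the real ontology

HEADERS = ("From", "To", "Subject", "Date")  # arbitrary selection for the sketch


def message_to_rdfxml(msg):
    """Serialize a few headers of an email message as RDF/XML text."""
    # Simplification: the Message-ID is used as-is, without percent-encoding,
    # and a missing Message-ID yields the bare subject "mid:".
    message_id = (msg.get("Message-ID") or "").strip().strip("<>")
    lines = ['<rdf:RDF xmlns:rdf="%s" xmlns:ex="%s">' % (RDF_NS, EX_NS)]
    lines.append("  <ex:Message rdf:about=%s>" % quoteattr("mid:" + message_id))
    for header in HEADERS:
        value = msg.get(header)
        if value is not None:
            tag = "ex:" + header.lower()
            # escape() protects &, < and > in header values
            lines.append("    <%s>%s</%s>" % (tag, escape(str(value)), tag))
    lines.append("  </ex:Message>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines) + "\n"


def main(infile=sys.stdin, outfile=sys.stdout):
    """Read one message from infile and write its RDF/XML to outfile."""
    outfile.write(message_to_rdfxml(message_from_string(infile.read())))
```

Calling main() with no arguments wires the conversion to STDIN and STDOUT, which is the shape needed to plug such a script into a mail delivery pipeline.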

How do I get the source code?

You need a Subversion client. Type svn in your shell and see if you have one already installed. If not, go to the Subversion web site and get one.

Then type

 svn co http://simile.mit.edu/repository/RDFizers/email2rdf/

The source code will be downloaded to the ./email2rdf directory.

How do I run it?

You need a Python interpreter installed on your machine. Try typing "python --version" in your shell. If the command is not found, go to the Python home page or install Python from a package manager for your operating system.

Why don't you use ontology X instead of your own?

Because my ontology is better than yours! [sticking tongue out] No, seriously, I just came up with a very simple one. If you have suggestions, please contact us.

Your Python code stinks!

I know! Again, if you have suggestions/patches, they are always welcome!

History

My interest (obsession?) in mining email to understand knowledge and community structure comes from my work on Apache Agora (a virtual community visualization tool that has since turned into Welkin, using RDF as the data interface), which was the result of a long thinking process on how to distill the juice and make the best out of a huge community of smart but lazy asses.

There is more information in email than on the web, yet the signal-to-noise ratio is, if possible, even lower. Mining things like the reply-to topology, or extracting tags from the de-facto syntaxes that have crystallized over the years, might yield very useful information at a very low metadata-tax cost.

Credits

This software was created and is maintained by the Simile Project and in particular: