RDFizers

The RDFizer project is a directory of tools for converting various data formats into RDF. In addition, we provide a home for some of these tools.

What can I do with these?

You can have a computer generate the RDF representation of your data for you, instead of you doing it by hand.

Why were they built?

Writing RDF by hand can be very time-consuming and error-prone. Meanwhile, the Semantic Web suffers from a chicken-and-egg problem: no killer app will be written without enough data, and no data will be exposed without a killer app that uses it.

This is one of our attempts to break that catch-22: identify existing datasets of potential interest and write tools that can capture at least part of their data structures.

What do you mean by "converting into RDF"?

Just as XML is a meta-language (a language for describing languages) rather than a language, RDF is a very general model (in fact, even more general than XML), so "converting to RDF" doesn't mean much on its own.

Each of the RDFizers tries to be as specific as possible in identifying the semantics associated with the data that is being converted. For data formats that are already highly structured (like EXIF information in digital pictures or MIME headers in Email messages or method references in Java bytecodes), this is possible without human intervention (the RDFizer just has to decide what ontology to use to express that information in RDF).

But for data formats where structure exists but the "semantics" associated with relationships are implicit and cannot be explicitly decoded from the data that we are converting (for example, while converting a relational database or some valid XML data), user intervention (directly or via dataset-specific configurations) will be required to fill in the gaps.
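The "fill in the gaps" step can be pictured with a minimal sketch. It assumes a database row is available as a dictionary and that the user supplies a mapping from column names to predicate URIs. The column names, the `MAPPING` table, the `row_to_ntriples` helper and the example.org URIs are all hypothetical and not taken from any of the tools listed here; the FOAF property URIs are real, but choosing them for these columns is exactly the kind of decision the user has to make.

```python
# Hypothetical column-to-predicate mapping supplied by the user.
# The FOAF URIs exist; applying them to these columns is an assumption.
MAPPING = {
    "full_name": "http://xmlns.com/foaf/0.1/name",
    "homepage": "http://xmlns.com/foaf/0.1/homepage",
}

# Columns whose values should be emitted as URIs instead of literals.
URI_COLUMNS = {"homepage"}


def escape_literal(value):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(subject_uri, row, mapping=MAPPING):
    """Return one N-Triples line per mapped, non-empty column of `row`."""
    lines = []
    for column, predicate in mapping.items():
        value = row.get(column)
        if not value:
            continue  # columns without a value are skipped
        if column in URI_COLUMNS:
            obj = "<%s>" % value
        else:
            obj = '"%s"' % escape_literal(value)
        lines.append("<%s> <%s> %s ." % (subject_uri, predicate, obj))
    return lines


if __name__ == "__main__":
    row = {"id": "42", "full_name": "Ada Lovelace",
           "homepage": "http://example.org/ada"}
    for line in row_to_ntriples("http://example.org/person/42", row):
        print(line)
```

The `id` column has no entry in the mapping, so it produces no triple: the structure is there, but nothing in the data says what it means until someone states it.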

It is well outside the scope of the SIMILE project to work on natural language processing techniques or other heuristics for RDFizing data. We also believe that a lot of data out there can be RDFized without any heuristics: it is just a matter of enabling users to annotate existing structure in a more semantic way, ideally reusing existing ontologies instead of creating new ad hoc ones.

Ok, I'm interested, what RDFizers are there?

Here is the list of the RDFizers that we have built so far:

  • JPEG -> RDF - scans a folder for JPEG files, parses the EXIF and IPTC metadata found in those files and dumps an RDF/N3 representation of it into a file.
  • MARC/MODS -> RDF - transforms MARC records from Z39.2 format into MODS and then from MODS to an RDF representation of MODS.
  • OAI-PMH -> RDF - harvests an OAI-PMH repository and transforms the captured metadata into an RDF representation through pluggable XSLT stylesheets.
  • OCW -> RDF - harvests metadata from the MIT OpenCourseWare web site and transforms it into an RDF representation of IEEE LOM.
  • EMail -> RDF - transforms email mbox files into RDF/XML.
  • BibTeX -> RDF - transforms BibTeX files into RDF/XML.
  • POM -> RDF - transforms Maven POM (Project Object Model) files into RDF/N3.
  • DEB -> RDF - extracts the metadata from a Debian software package and generates an RDF representation.
  • CRW -> RDF - extracts the metadata from a Canon RAW image file and generates an RDF representation.
  • Flat -> RDF - converts classic Unix text database files, like /etc/passwd, into RDF/N3.
  • Weather -> RDF - emits a single RDF/N3 sentence with the weather report for a given zip code or city in the US.
  • Java -> RDF - scans Java bytecode for method calls and creates a description of the dependencies between classes and the package/archive encoded in RDF/N3.
  • Javadoc -> RDF - a doclet that makes javadoc output metadata about your code (structure of the classes, methods, comments, etc.) encoded in RDF/N3.
  • Jira -> RDF - transforms Atlassian Jira's events about bug reports and issue tracking into RDF/N3.
  • Subversion -> RDF - A pair of scripts; one can be used in a post-commit subversion hook to generate RDF/N3 with each commit, the other on a working copy.
  • Random -> RDF - generates synthetic random graphs encoded in RDF/N3.
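To give a feel for what a conversion such as the Flat -> RDF entry above involves, here is a minimal sketch that turns one passwd-style line into an N3 statement. It is not the code or the output format of the actual tool: the `pw:` prefix, the example.org URIs and the `passwd_line_to_n3` helper are assumptions made for illustration only.

```python
from urllib.parse import quote

# Field names of a passwd entry, in file order.
PASSWD_FIELDS = ["name", "password", "uid", "gid", "gecos", "home", "shell"]

# Hypothetical namespace declaration to place once at the top of the output.
PREFIX = "@prefix pw: <http://example.org/passwd#> ."


def n3_string(value):
    """Quote a value as an N3 string literal."""
    return '"%s"' % value.replace("\\", "\\\\").replace('"', '\\"')


def passwd_line_to_n3(line):
    """Turn one passwd line into an N3 statement; None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    parts = line.split(":")
    if len(parts) != len(PASSWD_FIELDS):
        raise ValueError("expected 7 colon-separated fields: %r" % line)
    record = dict(zip(PASSWD_FIELDS, parts))
    # The password field is left out on purpose; empty fields are skipped.
    pairs = [
        "pw:%s %s" % (field, n3_string(record[field]))
        for field in PASSWD_FIELDS
        if field != "password" and record[field] != ""
    ]
    subject = "<http://example.org/user/%s>" % quote(record["name"], safe="")
    return "%s %s ." % (subject, " ;\n    ".join(pairs))


if __name__ == "__main__":
    print(PREFIX)
    print(passwd_line_to_n3("root:x:0:0:root:/root:/bin/bash"))
```

Here the file format already fixes what each field means, so the only open decision is which vocabulary to use for the predicates.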

What RDFizers exist out there besides SIMILE's?

Here is a list of RDFizers that other people wrote outside the SIMILE project:

  • From W3C/MIT labs:
    • Address Book
    • Calendar
    • Email
    • GPS
      • Garmin -> RDF - converts GPS data from Garmin receivers into RDF.
    • Pictures
      • EXIF -> RDF - converts the EXIF picture metadata into RDF.
    • Software Dependencies
      • Fink -> RDF - converts the software dependencies between Fink packages into RDF.
  • From Aduna and DFKI:
    • Aperture - Java framework that converts Excel, Word, PDF, JPG, OpenOffice, Quattro and other files to RDF (mainly plain-text extraction). Includes crawlers for websites, IMAP email, file systems, Flickr, del.icio.us, Mozilla Thunderbird address books, vCalendar files and more. Active project, used by system integrators.
  • From Monrai Technologies:
    • Natural Language -> RDF - Monrai Cypher: software that converts natural language statements and phrases into RDF triples and SPARQL queries
  • From OpenLink Software:
    • WWW Content -> RDF - Virtuoso Sponger: converts a wide variety of non-RDF web data sources into RDF, including (X)HTML web pages, microformats, web services (e.g. Google API, Flickr), and binary file types (e.g. MS Office, PDF).
  • From Freie Universität Berlin's labs:
    • D2RQ - treats non-RDF relational databases as virtual RDF graphs.
    • D2RMAP - "database to RDF" mapping language and processor.
  • From University of Maryland - Mindswap's labs:
    • XLS -> RDF - converts Microsoft Excel spreadsheets into RDF.
    • CSV -> RDF - converts "comma-separated values" files into RDF.
  • From the UMBC ebiquity lab:
    • RDF123 - converts spreadsheets to RDF.
  • From the Rhizomik initiative:
    • XSD -> OWL - maps XML Schema constructs to OWL ones, trying to capture the implicit semantics of the XSD.
    • XML -> RDF - transforms XML metadata to RDF, taking into account the XML Schema to OWL mappings previously extracted with the RDFizer above.
    • MPEG-7/CS -> OWL - transforms an MPEG-7 Classification Scheme into an OWL ontology.
  • From Peter Jeszenszky:
  • From Wolf Siberski:
  • From Dave Beckett:
    • Flickr -> RDF - Flickcurl: C library for the Flickr Web Service API that exports data in RDF.


If you know of any other tool that does RDF transformation that is not listed here, feel free to add it yourself.

See also