Things I did last week:
1) started thinking seriously about how to make longwell scale to a size
where it becomes useful
2) collected the entire MIT library catalog in MARC21 (1.2 million
records)
3) started to write an RDFizer to transform all that data into RDF,
doing MARC21 -> MARCXML -> MODS/XML -> MODS/RDF/XML (the first 3 stages
are done).
4) started working on a MODS/XML -> MODS/RDF/XML XSLT transformer
5) started to work on how to scale Gadget using some sort of disk
index (to help achieve #5)
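
To make the last stage (MODS/XML -> MODS/RDF/XML) concrete, here is a
minimal sketch in Python rather than XSLT. It is not the stylesheet
mentioned above. It assumes a single <mods> record as input, and the
predicate namespace EX is a placeholder invented for illustration, not
an agreed MODS/RDF vocabulary. Only the MODS namespace and the
titleInfo/title and name/namePart elements come from MODS itself.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Hypothetical predicate namespace, used only for this illustration.
EX = "http://example.org/mods-rdf#"


def mods_to_triples(mods_xml: str, subject: str):
    """Return (subject, predicate, literal) triples for titles and names
    found in a single MODS record."""
    root = ET.fromstring(mods_xml)
    ns = {"m": MODS_NS}
    triples = []
    # Each <titleInfo><title> becomes one title triple.
    for title in root.findall("m:titleInfo/m:title", ns):
        if title.text and title.text.strip():
            triples.append((subject, EX + "title", title.text.strip()))
    # Each <name><namePart> becomes one name triple.
    for part in root.findall("m:name/m:namePart", ns):
        if part.text and part.text.strip():
            triples.append((subject, EX + "name", part.text.strip()))
    return triples


SAMPLE = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Title</title></titleInfo>
  <name><namePart>Doe, Jane</namePart></name>
</mods>"""

if __name__ == "__main__":
    for triple in mods_to_triples(SAMPLE, "urn:example:record:1"):
        print(triple)
```

A real transformer would have to cover far more of MODS (subjects,
origin info, identifiers, nested name parts), which is presumably where
the problems will show up.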
Things I plan to do this week (in this order due to dependencies):
a) finish converting the MIT MARC records to MODS (converted 300K so
far but ran into massive disk I/O slowdowns due to the large number of
files... need to rethink the disk storage strategy)
b) finish the work on Gadget so that I can generate the spectrum of
the MIT MODS dataset
c) finish a first draft of the MODStoRDF XSLT stylesheet and get a
sense of where the problems are.
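
One possible way to rethink the disk storage for item a) is to stop
putting every record file in a single directory and instead spread
them over nested subdirectories keyed by a hash of the record id. This
is only a sketch under assumptions: the one-file-per-record layout, the
".xml" suffix, and the function names shard_path and store_record are
all hypothetical and not taken from the actual converter.

```python
import hashlib
from pathlib import Path


def shard_path(root: Path, record_id: str, levels: int = 2, width: int = 2) -> Path:
    """Map a record id to root/<aa>/<bb>/<record_id>.xml, where aa and bb
    are successive hex chunks of the SHA-1 digest of the id."""
    digest = hashlib.sha1(record_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, f"{record_id}.xml")


def store_record(root: Path, record_id: str, xml_text: str) -> Path:
    """Write one record under its sharded path, creating directories as needed."""
    path = shard_path(root, record_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(xml_text, encoding="utf-8")
    return path
```

With two levels of two hex characters there are 65,536 leaf
directories, so 1.2 million records average roughly 18 files per
directory. Packing many records into a few large files would be another
option, and may suit sequential processing better.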
--
Stefano Mazzocchi
Research Scientist                 Digital Libraries Research Group
Massachusetts Institute of Technology            location: E25-131C
77 Massachusetts Ave                   telephone: +1 (617) 253-1096
Cambridge, MA 02139-4307               email: stefanom at mit . edu
-------------------------------------------------------------------
Received on Mon Jan 23 2006 - 17:06:33 EST