Re: Sesame C-Store Sail

From: Ian Wilson
Date: Tue, 14 Feb 2006 16:19:35 -0700

Stefano Mazzocchi said the following on 2/14/2006 8:42 AM:
> Ian Wilson wrote:
>> Hi,
>> I spoke with Eric Miller briefly a few weeks ago (w3c life sciences
>> sig) about how a C-Store Sail for Sesame would be interesting. I
>> stated that I had brought up C-Store in a previous message on the
>> Simile list, but no one seemed interested.
>> Eric said there was already a project being discussed at Simile to do
>> this
> ??
> not that I'm aware of. I think you misunderstood: we are supposed to
> test and hopefully help in the scalability of existing triple stores,
> but there is no plan to work on something like this.

:-) I guess not. I cc'd Eric - since I probably misunderstood
his comment.

>> , and to bring the topic up again since it probably got overlooked.
>> So, that's the purpose of this message. :-)
> I did in fact miss that. Can you tell us more?

I just mentioned the C-Store project in a previous message,
nothing much beyond that.

>> For those that have followed Google's BigTable work (Google Base),
>> C-Store is similar to this. Basically, storage of data by column
>> rather than by row.
> Interesting, any papers to point us at?
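
To make the row-versus-column distinction concrete, here is a
minimal sketch in Python. It is only an illustration and not how
C-Store is implemented; the sample triples, the prefixes, and the
function names (to_columns, objects_for_predicate) are all made up
for this example.

```python
# Hypothetical illustration: the same RDF-like statements stored
# row-wise and column-wise. Names and data are invented for this sketch.

# Row-oriented: each statement is kept together as one record.
rows = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]


def to_columns(rows):
    """Turn (subject, predicate, object) rows into one list per column."""
    subjects, predicates, objects = [], [], []
    for s, p, o in rows:
        subjects.append(s)
        predicates.append(p)
        objects.append(o)
    return {"subject": subjects, "predicate": predicates, "object": objects}


def objects_for_predicate(columns, predicate):
    """Scan only the predicate column, then read matching object positions."""
    return [
        columns["object"][i]
        for i, p in enumerate(columns["predicate"])
        if p == predicate
    ]


# Column-oriented: each column is stored contiguously on its own.
columns = to_columns(rows)
names = objects_for_predicate(columns, "foaf:name")
```

The point of the second layout is that a query touching one or two
columns does not have to read the others.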

C-Store had a paper at VLDB:
website here:

Not much has been released on BigTable (no papers that I am
aware of), but it is the system behind Google Base. Its data
model is similar to RDF. There was a nice video lecture on the
topic by Jeff Dean:

BigTable: A Distributed Structured Storage System

"BigTable is a system for storing and managing very large
amounts of structured data. Data is organized into tables with
rows and columns, but unlike a traditional database system,
the row/column space can be sparse. Row keys and values are
arbitrary strings, and the system allows each row/column cell
to store not just a single value but a set of values with
associated timestamps, simplifying analyses that examine how
values have changed over time. Data in a single table is
internally broken at arbitrary row boundaries to form
contiguous regions of data called tablets. These tablets are
distributed across a large pool of worker machines. The system
is designed to manage several petabytes of data distributed
across thousands of machines, with very high update and read
request rates coming from thousands of simultaneous clients.

In this talk, I'll discuss the basic design of BigTable and
its implementation, provide some performance measurements, and
outline some current applications of the system. I'll also
touch on our future goals and directions for the system."
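
As a rough way to picture the data model described in that
abstract, here is a small in-memory sketch in Python. It is a toy
under my own assumptions and not BigTable's API: the class name
SparseTable, its methods, and the fixed-size split in tablets() are
all hypothetical, and I assume rows are kept in sorted key order so
that a tablet is a contiguous range of row keys.

```python
import time


class SparseTable:
    """Toy sparse table: each cell holds a list of timestamped values."""

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), oldest first.
        # Absent rows or columns cost nothing, which is what makes it sparse.
        self._rows = {}

    def put(self, row, column, value, timestamp=None):
        """Add a new version to a cell instead of overwriting it."""
        ts = time.time() if timestamp is None else timestamp
        versions = self._rows.setdefault(row, {}).setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0])

    def get(self, row, column):
        """Return the most recent value, or None if the cell is empty."""
        versions = self._rows.get(row, {}).get(column)
        return versions[-1][1] if versions else None

    def history(self, row, column):
        """Return every (timestamp, value) pair stored for a cell."""
        return list(self._rows.get(row, {}).get(column, []))

    def tablets(self, rows_per_tablet):
        """Split the sorted row keys into contiguous ranges (assumed scheme)."""
        keys = sorted(self._rows)
        return [
            keys[i:i + rows_per_tablet]
            for i in range(0, len(keys), rows_per_tablet)
        ]


table = SparseTable()
table.put("ex:alice", "foaf:name", "Alice", timestamp=1)
table.put("ex:alice", "foaf:name", "Alice W.", timestamp=2)
table.put("ex:bob", "foaf:knows", "ex:alice", timestamp=1)
table.put("ex:carol", "foaf:name", "Carol", timestamp=1)
```

With a row per subject and a column per predicate, this lines up
fairly naturally with RDF statements, which is the similarity I had
in mind.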

>> I'd be curious to know if you plan to implement this.
> Me too :-)


Received on Tue Feb 14 2006 - 23:18:38 EST
