Do you know Riak? A decentralized, internet-scale database

October 13, 2010

Fast, Reliable, and Scalable

Many people probably know Cassandra, HBase, CouchDB, and MongoDB. Far fewer know Riak, a relatively new NoSQL database.

Riak caught our attention because Test Pilot, a Mozilla Labs project, uses it. What can Riak do for Mozilla Labs? These were the requirements (Source):

1. Expected minimum users: 1 million. Design to accommodate 10 million by the end of the year and have a plan for scaling out to tens of millions. (This is the 1x 10x 100x rule of estimation of which I am a fan)
2. Expected amount of data stored per experiment: 1.2 TB
3. Expected peak traffic: approximately 75 GB per hour for two 8 hour periods following the conclusion of an experiment window. This two day period will result in collection of approximately 90% of the total data.
4. Remain highly available under load
5. Provide necessary validation and security constraints to prevent bad data from polluting the experiment or damaging the application
6. Provide a flexible and easy-to-use way for data analysts to explore the data. While all of these guys are great with statistics and thinking about data, not all of them have a programming background, so higher-level APIs are a plus.
7. Do it fast.
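
As a rough sanity check on the figures in the list above, the arithmetic can be sketched in a few lines of Python. The decimal unit conversion (1 TB = 1000 GB) and the function names are assumptions made for illustration; they do not come from Mozilla's write-up.

```python
# Back-of-the-envelope check of the quoted traffic figures.
# Assumption: decimal units, i.e. 1 TB = 1000 GB and 1 GB = 1000 MB.

PEAK_GB_PER_HOUR = 75   # quoted approximate peak rate
PEAK_HOURS = 2 * 8      # two 8-hour periods
TOTAL_TB = 1.2          # quoted data stored per experiment
PEAK_SHARE = 0.90       # share of the data collected during the peak


def peak_volume_gb(rate_gb_per_hour, hours):
    """Data volume received if the peak rate is sustained for `hours`."""
    return rate_gb_per_hour * hours


def implied_rate_gb_per_hour(total_tb, share, hours):
    """Hourly rate needed to collect `share` of `total_tb` within `hours`."""
    return total_tb * 1000 * share / hours


def average_mb_per_second(rate_gb_per_hour):
    """Convert an hourly rate in GB to a per-second rate in MB."""
    return rate_gb_per_hour * 1000 / 3600


if __name__ == "__main__":
    print("Peak volume (GB):", peak_volume_gb(PEAK_GB_PER_HOUR, PEAK_HOURS))
    print("Implied rate (GB/h):",
          implied_rate_gb_per_hour(TOTAL_TB, PEAK_SHARE, PEAK_HOURS))
    print("Sustained write rate (MB/s):",
          round(average_mb_per_second(PEAK_GB_PER_HOUR), 1))
```

Under these assumptions, 75 GB per hour over 16 hours comes to 1200 GB, in line with the 1.2 TB quoted per experiment, and corresponds to a sustained write rate of roughly 21 MB per second.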

After comparing HBase, Cassandra, and Riak very carefully, they chose Riak. That’s the simple answer. There is also another open source application, Luwak, dedicated to reading and writing large blocks of data to Riak.

