Amazon Takes Another Pass at NoSQL with DynamoDB

Amazon’s Dynamo paper (PDF) is the paper that launched a thousand NoSQL databases, if you’ll pardon a twisted metaphor and a wee bit of exaggeration. The paper inspired, at least in part, Apache Cassandra, Voldemort, Riak and other projects. Now Amazon is making its own take on Dynamo, melded with SimpleDB, available to Amazon Web Services (AWS) customers.

Amazon CTO Werner Vogels wrote about the new service this morning on his blog, saying that Amazon DynamoDB is “the result of 15 years of learning in the areas of large scale non-relational databases and cloud services.”

History

It’s no secret that Amazon handles massive traffic, and quite gracefully for the most part. A few AWS outages notwithstanding, Amazon.com itself has had very few visible outages. It wasn’t always so, and Vogels says that the holiday season outages in 2004 could be traced back to “scaling commercial technologies beyond their boundaries.”

So Amazon started developing “a collection of storage and database technologies to address the demanding scalability and reliability requirements of the Amazon.com ecommerce platform.” Part of that was Dynamo, “a highly reliable, ultra-scalable key/value database.”

But Dynamo saw heavy uptake only in Amazon’s core services. Vogels says that when he talked to other service owners within Amazon, they were dissatisfied with the complexity of Dynamo and with having to run it themselves.

“While Dynamo gave them a system that met their reliability, performance, and scalability needs, it did nothing to reduce the operational complexity of running large database systems. Since they were responsible for running their own Dynamo installations, they had to become experts on the various components running in multiple data centers. Also, they needed to make complex tradeoff decisions between consistency, performance, and reliability. This operational complexity was a barrier that kept them from adopting Dynamo.”

Amazon also started working on SimpleDB for AWS, and found that many of its developers liked SimpleDB. “Ultimately, developers wanted a service,” says Vogels.

But SimpleDB had its limitations as well:

The 10GB limit on datasets meant that developers had to build workarounds if they wanted to use SimpleDB with more than 10GB of data.
SimpleDB’s approach to indexing meant that its performance was not predictable.
The “eventual consistency” approach meant that there was a “consistency window” that could last up to 1 second before the database was fully updated.
Pricing was complex and based on Machine Hours.

So Amazon’s DynamoDB is meant to address the limitations of Dynamo and SimpleDB.

Details on DynamoDB

According to Vogels, “DynamoDB is based on the principles of Dynamo, a progenitor of NoSQL, and brings the power of the cloud to the NoSQL database world. It offers customers high-availability, reliability, and incremental scalability, with no limits on dataset size or request throughput for a given table.”

The current release of DynamoDB offers two types of keys for primary index querying: Simple Hash Keys and Composite Hash Key / Range keys. The simple key uses the Distributed Hash Table abstraction described in the original Dynamo paper. The composite key allows developers to use a primary key that combines a hash key with a range key, so that items sharing a hash key can be queried over a range of values. Vogels’ examples: “all orders from Werner in the past 24 hours, all log entries from server 16 with clients IP addresses on subnet 192.168.1.0.”
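To make the difference between the two key types concrete, here is a minimal in-memory sketch in Python. It is not DynamoDB's API or implementation: the class names HashKeyTable and HashRangeTable, the method names, and the sample data are all hypothetical, and a real service would spread the data across many machines. It only illustrates the idea that a simple hash key supports exact lookups, while a composite key also supports range queries within one hash key.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class HashKeyTable:
    """Simple hash key (hypothetical sketch): one key maps to one item."""

    def __init__(self):
        self._items = {}

    def put(self, hash_key, item):
        self._items[hash_key] = item

    def get(self, hash_key):
        return self._items.get(hash_key)


class HashRangeTable:
    """Composite key (hypothetical sketch): items are grouped by hash key
    and kept sorted by range key so a range can be read in one pass."""

    def __init__(self):
        # hash_key -> list of (range_key, item) pairs, sorted by range_key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [r for r, _ in rows]
        i = bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)  # same composite key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, low, high):
        """Return items under hash_key whose range key lies in [low, high]."""
        rows = self._partitions.get(hash_key, [])
        keys = [r for r, _ in rows]
        start, end = bisect_left(keys, low), bisect_right(keys, high)
        return [item for _, item in rows[start:end]]


# "All orders from Werner in the past 24 hours", with the range key as
# seconds since an arbitrary starting point (made-up sample data).
orders = HashRangeTable()
orders.put("werner", 1000, "book")
orders.put("werner", 5000, "kindle")
orders.put("werner", 90000, "blender")
orders.put("jeff", 2000, "rocket")
recent = orders.query("werner", 0, 86400)
print(recent)  # ['book', 'kindle']
```

The hash key picks the group of items, and the range key orders items within that group, which is why a query like Vogels’ “orders in the past 24 hours” needs both parts.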

DynamoDB will replicate data over 3 data centers at a minimum. It doesn’t require a fixed schema, and every data item can have any number of attributes. It’s meant to be fast: all data is stored on Solid State Drives, and it does not index all attributes. Vogels claims that “an application running in EC2 will typically see average service-side latencies in the single-digit millisecond range for a 1KB object.”

What’s more, latency is predictable, says Vogels. “Even as datasets grow, latencies remain stable due to the distributed nature of DynamoDB’s data placement and request routing algorithms.” Vogels says that DynamoDB will have provisioned throughput, so customers can “specify the request throughput capacity they require for a given table.” This is a variable setting, so customers can increase or decrease the capacity on demand using the management console or APIs.

Cost

What about pricing? One of the complaints about the SimpleDB pricing was complexity. I’m not sure that DynamoDB really solves that problem. Here’s how DynamoDB pricing breaks down:

Data storage is $1.00 per GB-month.
Data transfer is free for incoming data and between AWS services, and free up to 10TB per month. After that, pricing is $0.12 per GB up through 40TB, and it continues to drop through 350TB. If you need to transfer more than 524TB, contact Amazon for pricing.
Throughput capacity is charged at $0.01 per hour for every 10 units of write capacity, and $0.01 per hour for every 50 units of read capacity.

That’s where it gets tricky. Amazon’s description says “a unit of Write Capacity enables you to perform one write per second for items of up to 1KB in size. Similarly, a unit of Read Capacity enables you to perform one strongly consistent read per second (or two eventually consistent reads per second) of items of up to 1KB in size. Larger items will require more capacity.”
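A rough worked example may help untangle the throughput pricing. The Python sketch below uses only the prices and unit definitions quoted above; the function names are hypothetical, and two details are assumptions rather than anything Amazon states here: that item sizes round up to the next whole KB, and that capacity is billed in whole blocks of 10 write units or 50 read units.

```python
import math

# Prices from the article, in cents per hour.
CENTS_PER_10_WRITE_UNITS = 1
CENTS_PER_50_READ_UNITS = 1


def capacity_units(requests_per_second, item_size_kb, reads_per_unit=1):
    """Capacity units needed for a steady request rate.

    Assumption: an item larger than 1KB consumes one unit per started KB.
    Use reads_per_unit=2 for eventually consistent reads, which the quoted
    description says get two reads per second per unit.
    """
    kb_per_item = max(1, math.ceil(item_size_kb))
    return math.ceil(requests_per_second * kb_per_item / reads_per_unit)


def hourly_throughput_cost_cents(write_units, read_units):
    """Hourly throughput charge in cents.

    Assumption: billing is in whole blocks of 10 write / 50 read units.
    """
    write_blocks = math.ceil(write_units / 10)
    read_blocks = math.ceil(read_units / 50)
    return (write_blocks * CENTS_PER_10_WRITE_UNITS
            + read_blocks * CENTS_PER_50_READ_UNITS)


# Example workload: 100 writes/s and 500 strongly consistent reads/s of 1KB items.
writes = capacity_units(100, 1)
reads = capacity_units(500, 1)
print(hourly_throughput_cost_cents(writes, reads))  # 20 (cents per hour)
```

Under these assumptions the example workload costs 20 cents an hour for throughput, and switching the reads to eventually consistent would halve the read units to 250. Storage and data transfer charges come on top of that.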

Amazon also has a free tier that gives 100MB of free storage and 5 writes/second and 10 reads/second of throughput capacity.

The service is already being used by a number of high-profile services like IMDB, SmugMug, Elsevier, Tapjoy and Formspring. I’d be curious to hear what ReadWriteCloud readers think of DynamoDB, especially if you’re already using SimpleDB. Is this something you’re likely to switch to? If not, why not?