Netflix Director of Cloud and Systems Infrastructure Yury Izrailevsky explains how and why Netflix migrated some of its systems to NoSQL. “In the distributed world governed by Eric Brewer’s CAP theorem , high availability (a.k.a. better customer experience) usually trumps strong consistency,” he writes. ” There is little room for vertical scalability or single points of failure.”
Netflix uses three NoSQL tools: SimpleDB, HBase and Cassandra. “The reason why we use multiple NoSQL solutions is because each one is best suited for a specific set of use cases,” Izrailevsky writes. He writes that the learning curve has been steep and re-architecting the company’s systems has been difficult but “the scalability, availability and performance advantages of the NoSQL persistence model are evident and are paying for themselves already, and will be central to our long-term cloud strategy.”
SimpleDB
It’s nice to see some real-world use of SimpleDB. In Izrailevsky’s words:
Amazon SimpleDB was a natural choice for a number of our use cases as we moved into AWS cloud. SimpleDB is highly durable, with writes automatically replicated across availability zones within a region. It also features some really handy query and data format features beyond a simple key/value interface, such as multiple attributes per row key, batch operations, consistent reads, etc.
Netflix’s Siddharth “Sid” Anand published a white paper on the company’s use of SimpleDB, as well as a blog post on the subject.
HBase
Netflix uses HBase because it’s deeply integrated with Hadoop. Izrailevsky writes that the biggest advantage in using HBase is the ability to “combine real-time HBase queries with batch map-reduce Hadoop jobs, using HDFS as a shared storage platform.” He notes, however, that with HBase the company does have to sacrifise some availability for consistency.
Cassandra
Netflix uses Cassandra for its scalability and lack of single points of failure and for cross-regional deployments. ” In effect, a single global Cassandra cluster can simultaneously service applications and asynchronously replicate data across multiple geographic locations.”
Izrailevsky writes:
Unlike a distributed database solution using e.g. MySQL or even SimpleDB, Cassandra (like HBase) can scale horizontally and dynamically by adding more servers, without the need to re-shard – or reboot, for that matter. In fact, Cassandra seeks to avoid vertical scalability limits and bottlenecks of any sort: there are no dedicated name nodes (all cluster nodes can serve as such), no practical architectural limitations on data sizes, row/column counts, etc.
He also makes special mention of Cassandra’s data model, which offers flexible model representations beyond the typical key-value lookup model.
More
If you crave more information about Netflix and NoSQL, check out Netflix Cloud Architect Adrian Cockcroft’s series of blog posts on NoSQL and Anand ‘s blog.
You might also be interested in learning how Twitter uses NoSQL.
Also, this talk covers when you shouldn’t use a non-relational database.