NoSQL databases are well-known for their speed and scalability – useful traits when dealing with the size and complexity of big data and hyper-fast transaction requirements. But one thing they have lacked has been strong data consistency: the ability to ensure that an update to data in one part of the database is immediately propagated to all other parts of the database.
A startup database vendor launched this week is making claims that its database, FoundationDB, finally delivers on the promise of true data consistency for a NoSQL database, without a huge loss of speed or flexibility.
Understanding why this is such a big deal in the Big Data (or any) sector requires a little background on how NoSQL, or non-relational, databases work.
Solving The ACID Test
When talking about relational databases, like PostgreSQL, MariaDB, Oracle and the like, there’s one acronym that keeps coming up: ACID. ACID stands for Atomic, Consistent, Isolated and Durable – core guarantees that apply to every transaction in a relational database. Each transaction is atomic (it either completes in full or not at all), leaves the database in a consistent state, stays isolated from other transactions until it is finished, and is durable in the sense that, once committed, its data should never be lost.
The infrastructure of a relational database is well-suited to meet the ACID criteria for data: Data is held in tables connected by relational algebra, and transactions are performed in a way that is consistent with ACID principles.
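To make the atomic part concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for any relational database. The accounts table, the names in it and the transfer helper are assumptions made up for this illustration; they do not come from any vendor mentioned in this story.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is undone along with everything else.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


# Hypothetical sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

print(transfer(conn, "alice", "bob", 30), balances(conn))   # succeeds
print(transfer(conn, "alice", "bob", 500), balances(conn))  # fails, nothing changes
```

The second transfer debits one account, discovers the overdraft and aborts. Because both updates belong to one transaction, the debit is rolled back and the balances are left exactly as they were.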
But for non-relational databases, such as Bigtable, MongoDB or Dynamo, ACID has always been sacrificed for other qualities, like speed and scalability.
This tends to freak out some companies, stopping them from moving to NoSQL because they can’t give up ACID. Especially the “C,” because not having data consistency is a particularly terrifying prospect for companies dealing with financial transactions.
Yet non-relational databases are being used by firms like Amazon and Google every day, with great success. Amazon, in particular, needs to track millions of sales transactions on any given day – how does it get away with inconsistent data?
The short answer is, it has to. The trade-off would be a relational database that could never keep up with the speed and scaling necessary to make a company like Amazon work as it does now. Recall that non-relational databases are structured to sacrifice some aspect of ACID to gain something in return. In the case of Amazon, its non-relational DynamoDB database is willing to apply an “eventually consistent” approach to the data in order to gain speed and uptime for the system when a database server somewhere goes down (though Dynamo can also have strong consistency, an Amazon spokesperson informed us after this story went to press).
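What “eventually consistent” means in practice can be shown with a deliberately simplified toy model. This is not how Dynamo is implemented; the class, its method names and the replica count are all assumptions invented for this sketch. A write is acknowledged as soon as one replica has it, and the other replicas catch up later.

```python
class EventuallyConsistentStore:
    """Toy model: writes land on one replica and reach the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        # Acknowledge after updating replica 0 only; queue the rest.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica=0):
        # A read served by a lagging replica may return stale data.
        return self.replicas[replica].get(key)

    def propagate(self):
        # Background replication catching up, applied in write order.
        for i, key, value in self.pending:
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("stock", 5)
print(store.read("stock", replica=0))  # the replica that took the write
print(store.read("stock", replica=2))  # a replica that has not caught up yet
store.propagate()
print(store.read("stock", replica=2))  # consistent once replication has run
```

Writes stay fast and the system keeps answering even if a replica is slow or down, but until replication runs, two readers can see different values for the same key.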
Bringing Back Consistency
It’s not that having ACID compliance on a NoSQL database is impossible, explained David Rosenthal, one of FoundationDB’s co-founders. It’s just that most people think that applying ACID to NoSQL systems would come at a huge cost.
That’s certainly what Werner Vogels, CTO of Amazon, thought in a 2008 paper that described the company’s Dynamo database and its relationship to consistency:
Data inconsistency in large-scale reliable distributed systems has to be tolerated for two reasons: improving read and write performance under highly concurrent conditions; and handling partition cases where a majority model would render part of the system unavailable even though the nodes are up and running.
Translation: Requiring ACID on non-relational databases would make that database too slow and inflexible.
For the longest time, everyone using NoSQL systems was resigned to this eventual, or “weak,” consistency model. After all, they had money to make and data to analyze. Who cared if consistency was not at the top of the priority list?
It turns out, quite a few people, including FoundationDB’s founders: Rosenthal, Nick Lavezzo and Dave Scherer.
Inside FoundationDB
After a successful startup in Visual Sciences, whose technology is now part of Adobe as the Adobe Insight product, the trio went looking for their next project and settled on a gap to fill: the lack of ACID-capable non-relational databases.
“We weren’t satisfied with any of the data guarantees on non-relational systems,” Rosenthal explained, even as they understood that the needs of many potential clients would preclude relational systems like MySQL or Oracle because of performance limitations.
Non-relational systems seemed to wear their weak consistency model like a badge of honor, but in the secret origin story of FoundationDB, the team saw weak consistency as a bug, not a feature. “Not having transactional integrity is not a good thing,” Rosenthal emphasized.
They’re not the only ones. Google’s up-and-coming Spanner database, a second-generation distributed database that could ultimately replace the search engine company’s Bigtable systems, is being built on the premise that transactional integrity has to be a part of that database, too.
Side Effects Include…
Establishing consistency in transactions within a NoSQL database is newsworthy in itself, but the implications extend further.
FoundationDB uses a key-value-like storage engine at its core, surrounded by layers that provide whatever data model is needed, which in turn should make it much easier for developers to code their apps against FoundationDB. These layers, according to the founders, can’t be used on other key-value systems, because they would not work without consistent transactions.
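A rough sketch shows why a layer would depend on transactions. The code below is not FoundationDB's API. ToyKV, UserLayer and the key layout are hypothetical names invented for this illustration, and the store is just an in-memory dictionary. The layer keeps a user record and a by-city index entry in step, which only holds if both writes succeed or fail together.

```python
class ToyKV:
    """In-memory key-value store with all-or-nothing transactions."""

    def __init__(self):
        self.data = {}

    def transact(self, fn):
        staged = dict(self.data)  # work on a private copy
        result = fn(staged)       # an exception here discards the copy
        self.data = staged        # publish every write at once
        return result


class UserLayer:
    """Hypothetical data-model layer: user records plus a by-city index."""

    def __init__(self, kv):
        self.kv = kv

    def set_city(self, user, city):
        def txn(tr):
            old = tr.get(("user", user))
            if old is not None:
                del tr[("city", old, user)]  # drop the stale index entry
            tr[("user", user)] = city        # the record itself
            tr[("city", city, user)] = True  # the matching index entry

        self.kv.transact(txn)

    def users_in(self, city):
        # Scan index keys of the form ("city", <city>, <user>).
        return sorted(
            k[2] for k in self.kv.data if k[0] == "city" and k[1] == city
        )


layer = UserLayer(ToyKV())
layer.set_city("ann", "Austin")
layer.set_city("ann", "Boston")
print(layer.users_in("Austin"), layer.users_in("Boston"))
```

If the store could apply the record update but lose the index update, the index would point at users who have moved on, and every application using the layer would need its own repair logic. With transactions underneath, the layer can stay this simple.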
Also, since data is going to be consistent, applications won’t have to be built to “wait” for data to catch up within a given transaction – thus making apps less complex and easier to build.
The best news of all concerns the so-called performance penalty that many in the NoSQL world said would be incurred if ACID were applied to non-relational database systems. According to FoundationDB, performance is hampered by only 10%, which seems a very small price to pay for consistent transactions.
The FoundationDB database, which was launched into public beta on Monday, is available for download now.
Image courtesy of FoundationDB.