Whether the acronym “NoSQL” stands for “not only SQL,” as some database architects contend, or literally “no SQL,” up until this month it has been taken to imply “no Oracle.” One of the many hallmarks of Oracle’s SQL RDBMS technology, historically, has been consistency, the notion that every client perceives the same view of the data at any one time. Maintaining consistency, among other factors, incurs latency as database sizes scale with social media into the stratosphere.
NoSQL databases scale up, but typically at the expense of consistency, which is something you wouldn’t think Oracle would want to give up.
This morning, Oracle lifted the veil on its first branded NoSQL software product, whose existence was hinted at (albeit with blaring klaxons) three weeks ago at the company’s OpenWorld conference. As the company’s vice president for database development, Marie-Anne Neimat, told RWW this morning, one of Oracle NoSQL’s key differentiators will concern how it mitigates the consistency problem: by enabling users to scale the tradeoff.
“It’s a distributed key/value store with built-in high availability, and it also offers users options as to the level of consistency that they want,” Neimat tells us. “Consistency is made somewhat easier because the user can choose the level of consistency he or she wants, and can therefore trade rigid consistency for latency. One can be totally consistent at the potential cost of high latency, or one can say, ‘I’m willing to compromise on consistency, but I’ll have very fast response times.’”
A consistent message on consistency
Queries will be able to request the master copy of data when necessary, says Neimat, or alternatively the most readily available, or “closest,” copy. When a database administrator requests rigid, or “absolute,” consistency, the system will update all copies of the data prior to returning any data to the user at all. In between, the user can request what Neimat calls “majority consistency,” or quicker still, “single copy consistency,” where there’s no guarantee the returned image will match any copy on any other partition. In such cases, other copies of the data will be updated asynchronously.
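To picture the tradeoff Neimat describes, here is a minimal simulation of the three read options. It is written in Python rather than against Oracle's Java API, and every name in it (Replica, read, the version counter, the latency figures) is a hypothetical illustration, not Oracle NoSQL's actual interface. In particular, treating a "majority" read as "ask the nearest majority of copies and keep the newest answer" is an assumption made for the sketch.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    version: int      # how many updates this copy has applied
    value: str
    latency_ms: int   # pretend network distance from the client

def read(copies, level):
    """Return (value, latency_ms) for one partition under a consistency level.

    copies[0] is treated as the master, which always holds the newest write.
    """
    if level == "absolute":
        # Go to the master copy, however far away it is.
        chosen = [copies[0]]
    elif level == "majority":
        # Ask the nearest majority of copies and keep the newest answer.
        quorum = len(copies) // 2 + 1
        chosen = sorted(copies, key=lambda c: c.latency_ms)[:quorum]
    elif level == "single":
        # Take whatever the closest copy has, stale or not.
        chosen = [min(copies, key=lambda c: c.latency_ms)]
    else:
        raise ValueError(f"unknown consistency level: {level}")
    newest = max(chosen, key=lambda c: c.version)
    # The client waits for the slowest copy it had to contact.
    return newest.value, max(c.latency_ms for c in chosen)

copies = [
    Replica("master", version=3, value="v3", latency_ms=80),
    Replica("replica-a", version=3, value="v3", latency_ms=20),
    Replica("replica-b", version=2, value="v2", latency_ms=5),
]
```

With these made-up numbers, an "absolute" read returns the current value after 80 ms, while a "single" read comes back in 5 ms with the stale "v2". The "majority" read lands in between at 20 ms, and whether it is guaranteed to see the newest write in a real system depends on how writes are acknowledged.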
As a white paper released this morning concedes (PDF available here), an application can be programmed to return immediately after a write process has concluded, even though there’s no evidence at that point that the written data was made persistent (backed up throughout the system). “Such a policy provides the best write performance, but provides no durability guarantees,” the white paper says. “By specifying when the database writes records to disk and what fraction of the copies of the record must be persistent (none, all, or a simple majority), applications can enforce a wide range of durability policies.”
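The three durability fractions the white paper names (none, all, or a simple majority) translate into a simple rule for how long a write has to wait. The following Python sketch shows that arithmetic only; the function names and the timing figures are hypothetical and are not taken from Oracle's API.

```python
def acks_required(policy, copies):
    """How many copies must report the record persistent before the write returns."""
    if policy == "none":
        return 0                 # return immediately: fastest, no durability guarantee
    if policy == "majority":
        return copies // 2 + 1   # a simple majority of the copies
    if policy == "all":
        return copies            # every copy must be persistent
    raise ValueError(f"unknown durability policy: {policy}")

def write_latency_ms(policy, persist_times_ms):
    """Time the caller waits, given how long each copy takes to persist the record."""
    needed = acks_required(policy, len(persist_times_ms))
    if needed == 0:
        return 0
    # The write returns once the `needed` fastest copies have persisted.
    return sorted(persist_times_ms)[needed - 1]

# Hypothetical persist times for one master and two replicas.
persist_times_ms = [4, 9, 40]
```

With those invented timings, the "none" policy returns at once, "majority" waits 9 ms for the second-fastest copy, and "all" waits 40 ms for the slowest one.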
Last January, Google addressed the consistency/latency tradeoff by publishing a paper on its own system, entitled Megastore (PDF available here). It’s characterized by its use of a consensus algorithm, called Paxos, for replicating vast amounts of data across multiple nodes while maintaining a relatively acceptable degree of consistency, and by giving developers visibility into potential performance costs in advance of such activities as inner joins and schema changes.
Today, Oracle’s white paper takes a quintessentially Oracle tack, saying not only did it embrace NoSQL, but it effectively invented NoSQL by having acquired Sleepycat Software in February 2006. Sleepycat was the originator of the Berkeley DB key/value store currently used in Oracle’s embedded databases; and this morning, Oracle admitted that Berkeley DB lies at the heart of its new NoSQL.
“Although some of the early NoSQL solutions built their systems atop existing relational database engines, they quickly realized that such systems were designed for SQL-based access patterns and latency demands that are quite different from those of NoSQL systems, so these same organizations began to develop brand new storage layers,” reads this morning’s white paper. “In contrast, Oracle’s Berkeley DB product line was the original key/value store; Oracle Berkeley DB Java Edition has been in commercial use for over eight years. By using Oracle Berkeley DB Java Edition as the underlying storage engine beneath a NoSQL system, Oracle brings enterprise robustness and stability to the NoSQL landscape.”
Propagation and migration
As Marie-Anne Neimat made clear to us this morning, the “No” in “Oracle NoSQL” truly does mean no. “The data is sharded around many nodes. For each partition, there is a master and several copies. The updates always go to the master copy, and are then propagated to the other copies, either synchronously or asynchronously,” she explains. “But it’s not an Oracle-style database; it’s a key/value store.”
In Oracle’s scheme, as with other NoSQL databases, the data is distributed over multiple storage nodes in the network by way of hashing algorithms. For an application to get or put data, it utilizes a client library, which in this case is a Java API. The library is aware of the hashing algorithm, which helps it to determine the identity of the node where data resides.
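How a client library can find the right node without asking a central directory is easy to sketch. The Python snippet below hashes a key and maps the result onto a list of nodes. Oracle has not said here which hash function or mapping its library uses, so the choice of MD5 with a simple modulo, along with the function and node names, is purely an assumption for illustration.

```python
import hashlib

def node_for_key(major_key, nodes):
    """Pick the storage node for a key by hashing it, as a client library might."""
    # Hash the key to a fixed-size digest (MD5 is an assumed choice here).
    digest = hashlib.md5(major_key.encode("utf-8")).digest()
    # Turn the first 8 bytes into an integer and map it onto the node list.
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]

# Hypothetical storage node names.
nodes = ["node-0", "node-1", "node-2", "node-3"]
```

Because the hash is deterministic, every client that calls node_for_key("Scott Fulton", nodes) with the same node list arrives at the same node. A plain modulo reshuffles most keys when nodes are added or removed, which is one reason production systems tend to use more elaborate partitioning schemes.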
Now that Oracle is in the business of producing both principal types of database managers, which is best suited for SQL and which for NoSQL? “Our view at the moment is that NoSQL’s main advantage is the flexibility of the schema,” responds Neimat. “With a relational database and SQL, you pretty much have to decide the tables and the attributes you will have. And yes, you may evolve them over time, but it’s still a big deal to evolve a relational schema. But with a NoSQL database, you have the advantage that the schema is in the eyes of the developer, and can evolve over time. It does put some of the burden of the knowledge of the schema on the application writers, so it’s a little harder to write applications. On the other hand, you have the benefit of flexibility.
“As things settle down,” she continues, “and one knows exactly what the schema should be, one can envision migrating to a SQL database… What Oracle wants to do is provide our customers with all means of managing data, and all means of migrating data from one model to the other to make life easier, and to provide them with the best enterprise-level support.”
Many of the leading open source NoSQL models available today, including Neo4j and Infinite Graph, describe themselves as graph databases. They use key/value pairs as well, but data nodes are related to other data nodes by way of properties or attributes, the way a predicate modifies a subject by way of an adverb. At least for now, Neimat tells us, Oracle NoSQL should not be considered a graph database. Its storage model is instead based around columns, the records within which may be related by way of subkeys. Here, records that are closely associated with one another are stored and indexed in the immediate vicinity of one another to ensure fast lookup times. But records that are more incidentally associated with one another are joined by minor keys, defined by the program but not necessarily indexed together.
Here is where the SQL concept of the record (a row in a table) is broken down, and you have to be careful to distinguish associations (as I did above) from relationships. The principal association between Oracle NoSQL records is established by what’s called a major key path (for example, “Brian Fuller,” “Scott Fulton,” “Henry Fulton”) and the subordinate associations by minor key paths (Brian’s address, Scott’s address, Henry’s address). What SQL would call a “second-order relation” might be associated using NoSQL by a minor key path.
Down the road, an admin may have the bright idea of associating the name columns with the filenames of their online avatars (Brian’s mug shot, Scott’s mug shot, Henry’s mug shot). This can be done without reinventing the schema, Neimat explains. “The subkey model is a very convenient way to model an arbitrary number of columns, and different records for different columns, of adding attributes in an ad hoc manner and not having to plan the schema ahead of time.”
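The subkey idea can be modeled in a few lines. This Python sketch is not Oracle's API: the KeyValueStore class, its method names and the sample values are all hypothetical placeholders. It simply groups records under a major key path and lets new minor keys appear whenever the application decides it needs them.

```python
class KeyValueStore:
    """Toy store: records grouped by major key path, then by minor key path."""

    def __init__(self):
        # major key path -> {minor key path -> value}
        self._records = {}

    def put(self, major, minor, value):
        # No schema to alter: an unseen minor key is simply added.
        self._records.setdefault(tuple(major), {})[tuple(minor)] = value

    def get(self, major, minor):
        return self._records[tuple(major)][tuple(minor)]

    def minor_keys(self, major):
        """All minor key paths stored under one major key path."""
        return sorted(self._records.get(tuple(major), {}))

store = KeyValueStore()
# Original attribute (placeholder value).
store.put(["Scott Fulton"], ["address"], "123 Example St.")
# Added later, ad hoc, without touching any other record.
store.put(["Scott Fulton"], ["avatar"], "scott.png")
```

After the second put, store.minor_keys(["Scott Fulton"]) lists both the address and the avatar subkeys, while any other person's record is untouched and still has no avatar attribute.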
Can Oracle still claim “atomicity”?
It might amaze some to discover that Oracle is promising atomic transactions, or “ACID capability,” for all write and update functions despite the fact that consistency is now a variable. Atomic transactions will apply to records that share the same major key, says Neimat, but not necessarily the same subkeys.
For the moment, she adds, Oracle NoSQL will not have direct support for the Oracle Public Cloud, meaning it won’t be aware of the exclusive characteristics of the storage nodes where its databases are hosted. A future release may add data center-specific features, such as the ability for the admin to specify a recovery data center for explicit backups and recoveries, and also to outline some of the geography of a local data center versus a remote or cloud-based center. That geography may be helpful in establishing a proper order for updates to be propagated asynchronously among nodes (for example, local nodes first).
We can now state with absolute certainty that NoSQL is no longer the alternative to major-brand database management systems. The question now is whether customers will begin perceiving Oracle as the stronger alternative in the NoSQL realm.