Big services demand NoSQL, right? With nearly a billion notes and almost 2 billion resource files, Evernote should be ready to jump on the NoSQL and Big Data bandwagon, right? Not so fast, says Evernote's CTO Dave Engberg. According to Engberg, some applications may benefit from modern key-value storage engines, but Evernote has good reasons for sticking with its MySQL setup for account metadata.
In a post yesterday on the Evernote Tech Blog, Engberg says that the ACID compliance of InnoDB, MySQL's default storage engine, is key to Evernote's synchronization model (PDF).
ACID compliance, says Engberg, ensures that the Evernote client on your desktop or mobile device can trust the reply given by the Evernote server. Atomicity means that each API call is all-or-nothing: either every change it makes is stored on the server, or, if the call fails, no changes are committed at all, says Engberg. "This means that if we fail trying to store the fourth image in your Note, there isn't a half-formed Note in your account and incorrect monthly upload allowance calculations to charge you for the broken upload."
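To make the all-or-nothing behavior concrete, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL/InnoDB. The schema (`notes`, `resources`, `usage`), the `store_note` helper, and the simulated failure on the fourth image are hypothetical, chosen to mirror Engberg's example; this is not Evernote's code.

```python
import sqlite3


class UploadError(Exception):
    """Raised when storing one of a note's resources fails (simulated)."""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical notes schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE notes (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
        CREATE TABLE resources (id INTEGER PRIMARY KEY, note_id INTEGER, size INTEGER);
        CREATE TABLE usage (user_id INTEGER PRIMARY KEY, uploaded INTEGER);
        INSERT INTO usage VALUES (1, 0);
    """)
    return conn


def store_note(conn, user_id, title, sizes, fail_at=None):
    """Store a note, its resources, and the upload-allowance update atomically.

    fail_at simulates a failure while storing the resource at that index.
    """
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        cur = conn.execute(
            "INSERT INTO notes (user_id, title) VALUES (?, ?)", (user_id, title))
        note_id = cur.lastrowid
        for i, size in enumerate(sizes):
            if i == fail_at:
                raise UploadError(f"failed storing resource {i}")
            conn.execute(
                "INSERT INTO resources (note_id, size) VALUES (?, ?)",
                (note_id, size))
        conn.execute(
            "UPDATE usage SET uploaded = uploaded + ? WHERE user_id = ?",
            (sum(sizes), user_id))
    return note_id


conn = make_db()
try:
    # The fourth image (index 3) fails to store.
    store_note(conn, 1, "Trip photos", [100, 200, 300, 400], fail_at=3)
except UploadError:
    pass

# Nothing from the failed call survives: no note, no resources, no usage charge.
print(conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0])      # 0
print(conn.execute("SELECT COUNT(*) FROM resources").fetchone()[0])  # 0
print(conn.execute("SELECT uploaded FROM usage").fetchone()[0])      # 0
```

Because the note row, the resource rows, and the allowance update share one transaction, a failure partway through leaves the account exactly as it was.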
Consistency means that a notebook can't be deleted while leaving "dangling" notes behind. Durability means that when the server reports that a notebook has been created, it really has been.
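The "no dangling notes" rule is the kind of invariant a relational database can enforce on its own. Here is a minimal sketch, again using `sqlite3` as a stand-in and a hypothetical two-table schema, in which a foreign key makes the database reject deleting a notebook that notes still point to. The article doesn't say how Evernote actually handles notebook deletion; the refusal below is an assumption for illustration, and `delete_notebook` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE notebooks (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY,
        notebook_id INTEGER NOT NULL REFERENCES notebooks(id),
        title TEXT
    );
    INSERT INTO notebooks VALUES (1, 'Recipes');
    INSERT INTO notes VALUES (1, 1, 'Pancakes');
""")


def delete_notebook(conn, notebook_id):
    """Try to delete a notebook; return False if notes still reference it."""
    try:
        with conn:  # roll back if the DELETE is rejected
            conn.execute("DELETE FROM notebooks WHERE id = ?", (notebook_id,))
        return True
    except sqlite3.IntegrityError:
        return False  # the constraint blocked a delete that would orphan notes


print(delete_notebook(conn, 1))  # False: "Pancakes" still lives in notebook 1
```

Declaring the key with `ON DELETE CASCADE` would instead remove the notes along with the notebook; either way, the database never ends up holding notes that point at a missing notebook.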
Engberg says Durability is the most important property. "If the client can't assume that changes made on the server will be Durable, then the protocol would become much more complex and inefficient. Each synchronizing client would need to constantly double-check whether the state of each server object matched the local state. Maintaining absolute consistency for an account with 20k Notes, 40k Resources, and 10k Tags would be very expensive if changes couldn't assume Durability."
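Durability is what lets a client sync incrementally instead of re-checking every object. The sketch below is a simplified, hypothetical model (the `Server` and `Client` classes and the counter scheme are assumptions, not Evernote's actual protocol): the server stamps every change with an increasing counter, and the client asks only for changes newer than the last counter it saw. That shortcut is safe only if a change the server has acknowledged can never silently disappear.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server that stamps every change with an increasing counter."""
    counter: int = 0
    objects: dict = field(default_factory=dict)  # object id -> (counter, value)

    def put(self, obj_id, value):
        """Record a change; the returned counter is assumed to be durable."""
        self.counter += 1
        self.objects[obj_id] = (self.counter, value)
        return self.counter

    def changes_since(self, after):
        """Return objects changed after counter `after`, plus the current counter.

        A linear scan keeps the sketch short; a real server would index by counter.
        """
        changed = {k: v for k, (n, v) in self.objects.items() if n > after}
        return changed, self.counter


@dataclass
class Client:
    """Hypothetical client that remembers the last counter it synced up to."""
    last_seen: int = 0
    local: dict = field(default_factory=dict)

    def sync(self, server):
        """Pull only what changed since the last sync; return how many objects."""
        changed, high = server.changes_since(self.last_seen)
        self.local.update(changed)
        self.last_seen = high
        return len(changed)


server, client = Server(), Client()
for i in range(20_000):
    server.put(f"note-{i}", "v1")
print(client.sync(server))  # 20000 (first sync transfers everything)

server.put("note-42", "v2")
print(client.sync(server))  # 1 (only the edited note is transferred)
```

Deletions and conflicts are left out to keep the sketch short. Without Durability, the client couldn't trust `last_seen`; it would have to compare every local object against the server on each sync, which is the expense Engberg describes.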
The flip side of ACID, and the reason many services are looking to key-value data stores, is that scaling a transactional database to very large data sets can be pretty hairy. Engberg says Evernote has avoided this problem by partitioning its data into 20 million separate data sets, "one per user."
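Partitioning by user means every request can be routed to the one database that holds that user's data, so each transaction stays on a single MySQL instance. Here is a minimal sketch of such routing, assuming (hypothetically) that many users are packed onto each physical shard and that new users fill shards in order; the `ShardDirectory` class and its `users_per_shard` parameter are illustrative, not Evernote's implementation.

```python
class ShardDirectory:
    """Hypothetical directory mapping each user to exactly one shard."""

    def __init__(self, users_per_shard):
        self.users_per_shard = users_per_shard
        self.assignments = {}  # user id -> shard number
        self.next_shard = 1    # shard currently accepting new users
        self.filled = 0        # users already placed on next_shard

    def assign(self, user_id):
        """Place a new user on the current shard, opening a new one when full."""
        if user_id in self.assignments:
            return self.assignments[user_id]  # a user never moves between shards
        if self.filled == self.users_per_shard:
            self.next_shard += 1
            self.filled = 0
        self.assignments[user_id] = self.next_shard
        self.filled += 1
        return self.next_shard

    def shard_for(self, user_id):
        """Look up which shard holds all of this user's data."""
        return self.assignments[user_id]


d = ShardDirectory(users_per_shard=2)
print([d.assign(u) for u in [101, 102, 103, 104, 105]])  # [1, 1, 2, 2, 3]
print(d.shard_for(103))  # 2
```

Because a user's notes, notebooks, and tags all land on the same shard, the ACID guarantees discussed above never have to span more than one database.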
Evernote has published a digest of its architecture, if you're curious. It's a detailed (if somewhat outdated; it dates from May 2011) look at how Evernote's service is structured. MySQL (running on Debian in a Xen VM) holds user metadata, while file data is stored in the Linux file system.
Instead of big data, Evernote has "a lot of 'medium data' storage problems that partition neatly into a sharded architecture," says Engberg.
Evernote may be looking at newer tools for other projects that don't require the same ACID compliance, though. Engberg notes that Evernote's reporting and analytics system "has gradually outgrown its current MySQL platform" and is likely to be replaced. But for the user metadata, part of the core of Evernote's service? "We're relatively satisfied with sharded MySQL storage for Evernote user account metadata, even though that's not going to win any style points from the cool kids."