The Compact Muon Solenoid Experiment (CMS) at CERN (The European Organization for Nuclear Research) will deploy the NoSQL database CouchDB into production this summer, CouchDB corporate sponsor Couchio announced today.
CMS’s Data Management and Workflow Management (DMWM) project has been testing NoSQL solutions for the past year. Simon Metson, convener of the DMWM group, gave a few reasons for the group’s decision to adopt CouchDB. DMWM’s experience may be useful for organizations considering NoSQL solutions.
View of the CMS endcap through the barrel sections, from Wikipedia
CMS is Creating Huge Amounts of Data
CMS is one of two general-purpose particle physics detectors running on the Large Hadron Collider. “Approximately 3,600 people from 183 scientific institutes, representing 38 countries form the CMS collaboration who built and now operate the detector,” according to Wikipedia.
CMS will collect roughly 10 PB of data per year. “We have a small number of users, but an amount of data similar to Facebook’s,” Metson says. DMWM needed a solution that could quickly handle large amounts of data, often without metadata, in a distributed environment in which incoming database connections are frequently impossible.
NoSQL solutions are designed to handle large numbers of transactions. CouchDB, for instance, has been used to power the web-based IM client Meebo, proving it can handle a rapid influx of data. CouchDB is also specifically designed for distributed environments.
No Need to Manage a Complex Replication Infrastructure
CouchDB is noted for its replication features. “We will have CouchDB instances at other labs and will be replicating data between sites,” says Metson. “We don’t have to write or manage complicated replication infrastructure.”
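For readers unfamiliar with how little code this involves: CouchDB exposes replication over its HTTP API, and a replication is started by POSTing a small JSON document to the /_replicate endpoint. The sketch below is only an illustration of that call, not DMWM's actual setup. The host names and the database name dq_results are hypothetical, authentication is omitted, and the request is built but not sent. It is written as a pull from a remote lab, which is one plausible fit for sites that cannot accept incoming connections.

```python
import json
import urllib.request


def build_replication_request(local_couch, source_db, target_db, continuous=True):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = {
        "source": source_db,       # database to copy from
        "target": target_db,       # database to copy into
        "continuous": continuous,  # keep replicating as new changes arrive
    }
    return urllib.request.Request(
        url=local_couch.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical hosts and database name, for illustration only.
request = build_replication_request(
    local_couch="http://localhost:5984",
    source_db="http://couch.remote-lab.example:5984/dq_results",
    target_db="http://localhost:5984/dq_results",
)

# To start the replication against a running CouchDB instance:
# urllib.request.urlopen(request)
```

Because the local instance initiates the request and pulls from the remote one, only outbound connections are needed on the local side.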
It Works Well With Other Systems
Metson says CouchDB works well with Oracle, which the organization uses extensively. “The architecture of CouchDB maps on well to our other tools,” he says. “It has a nice way of working with a big database in a distributed environment.”
The Learning Curve is Shallow
Metson says the learning curve for CouchDB is quite shallow. In fact, a DMWM summer intern with some programming experience, but none with CouchDB, was able to build a simple data quality evaluation application using the database. The only drawback, Metson says, is that you may need to do a lot of unlearning if you’re well versed in SQL. “The more you know Oracle, the harder it is to pick up,” he says.
But the unlearning is worth it to DMWM, where the need to rapidly create applications without writing and maintaining huge amounts of code is paramount.
Update: Couchio’s CERN case study is now available here.