IDC says that spending on cloud storage will triple by 2015. It had better, because the roster of companies with their hands out for some of those storage dollars keeps growing. The latest contender is Inktank, a service and support company formed by the creators of the Ceph open source storage project.
With all the cloud storage competition coming out of the woodwork, how does Ceph distinguish itself? It started in 2004 as Sage Weil's doctoral project at UC Santa Cruz. The question is whether Weil's academic project has matured enough to take a chunk of the storage market.
All About Ceph
Ross Turk, Inktank’s VP of community, says that Ceph is designed to serve “multiple data storage needs with a unified storage platform.” Some storage offerings, for example, provide object storage but not block storage. Others provide both object and block storage but aren’t POSIX-compatible.
Turk says that Ceph does all of the above. “It provides object storage, similar to Amazon’s S3 and compatible with apps written for Amazon S3. It provides the kind of block device storage necessary for VM images, thinly provisioned and striped across the entire storage cluster. Finally, it provides a bottleneck-free, POSIX-compliant network filesystem. It does all of that on top of a single storage cluster, so it’s not necessary to have separate clusters for each different storage need.”
Just as important, according to Turk, Ceph’s distributed capabilities make it a compelling option for companies with big storage needs. First, Ceph’s placement algorithm, CRUSH (Controlled Replication Under Scalable Hashing), “is far more intelligent than anything out there today,” Turk says. “It allows storage clients to calculate the location of data within the cluster, instead of having to look it up somewhere, and it does it while allowing for robust placement rules.”
CRUSH, says Turk, lets admins tell a Ceph storage system just how many replicas of data are required, and how they should be distributed “across the various nodes, racks, rows and rooms in their data center.”
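To make the idea of computing a location, rather than looking it up, more concrete, here is a minimal Python sketch. It is not CRUSH: it swaps in rendezvous (highest-random-weight) hashing, and the `Osd` class, the rack layout and the `place` function are hypothetical. It does show the two properties Turk describes: every client holding the same cluster map arrives at the same replica set, and a placement rule (here, no two replicas in the same rack) is enforced as part of the calculation.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Osd:
    """A storage daemon and the rack it lives in (hypothetical cluster map entry)."""
    name: str
    rack: str


def score(obj: str, osd: Osd) -> int:
    # Deterministic pseudo-random weight for this (object, device) pair.
    digest = hashlib.sha256(f"{obj}:{osd.name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj: str, cluster: list[Osd], replicas: int) -> list[str]:
    """Pick `replicas` devices for `obj`, at most one per rack.

    Any client holding the same cluster map computes the same answer,
    so no central lookup table is consulted.
    """
    ranked = sorted(cluster, key=lambda osd: score(obj, osd), reverse=True)
    chosen, used_racks = [], set()
    for osd in ranked:
        if osd.rack in used_racks:
            continue  # placement rule: spread replicas across racks
        chosen.append(osd.name)
        used_racks.add(osd.rack)
        if len(chosen) == replicas:
            return chosen
    raise ValueError("not enough racks to satisfy the placement rule")


# Hypothetical cluster: nine devices spread over three racks.
cluster = [Osd(f"osd.{i}", f"rack{i % 3}") for i in range(9)]
print(place("vm-image-42", cluster, replicas=3))
```

Because the result depends only on the object name and the cluster map, no metadata server has to sit in the data path. Real Ceph adds device weights, a hierarchy of buckets (hosts, racks, rows, rooms) and an intermediate layer of placement groups, none of which this sketch models.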
The second of Ceph’s distributed features is that it is “self-managing” and “self-healing.” Its daemons, Turk says, “constantly work with each other to balance data throughout the cluster as the conditions within the cluster change.”
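Part of what makes automatic rebalancing practical is that a computed placement changes only where it has to. The hedged Python sketch below illustrates this with single-replica rendezvous hashing, again a stand-in technique rather than CRUSH or Ceph's actual recovery process; the `owner` and `rebalance_plan` functions and the device names are hypothetical. When one device drops out, only the objects it held get new homes.

```python
import hashlib


def owner(obj: str, devices: list[str]) -> str:
    # Rendezvous hashing: the device with the highest hash for this object wins.
    return max(devices, key=lambda d: hashlib.sha256(f"{obj}:{d}".encode()).digest())


def rebalance_plan(objects: list[str], before: list[str],
                   after: list[str]) -> dict[str, tuple[str, str]]:
    """Return {object: (old_device, new_device)} for objects that must move."""
    moves = {}
    for obj in objects:
        old, new = owner(obj, before), owner(obj, after)
        if old != new:
            moves[obj] = (old, new)
    return moves


devices = [f"osd.{i}" for i in range(5)]
objects = [f"obj-{n}" for n in range(1000)]
survivors = [d for d in devices if d != "osd.3"]  # simulate osd.3 failing
plan = rebalance_plan(objects, devices, survivors)
print(f"{len(plan)} of {len(objects)} objects move")
```

An object whose winning device survives still has the highest score among the remaining devices, so it stays put; only data from the failed device is redistributed across the rest of the cluster.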
That makes Ceph particularly attractive for companies that want to use commodity hardware, where failure is the norm, not the exception. What types of companies does Turk see adopting Ceph? He says Inktank is seeing adoption among service providers, enterprises and organizations doing high-performance computing (HPC). “We are also partnering with the Big Data community, various Linux distributions, and a number of platforms, including all of the Cloud stacks and hosting automation platforms.”
It’ll be interesting to see who Ceph’s Linux partners are. Red Hat has been busy pushing GlusterFS since it acquired Gluster last year. In some use cases, Ceph also competes with the S3-compatible storage layers that open source IaaS stacks provide, such as Eucalyptus’ Walrus and OpenStack’s Swift. There’s also RiakCS, a proprietary add-on to the open source Riak.
With all the money flowing into storage right now, there’s room for plenty of players – especially since they each have features that appeal to specific use cases. But it’s a lot of homework for IT managers looking to pick the right solution to standardize on. If you’re working with cloud deployments or putting together storage solutions, which options are you looking at?