Last month, we introduced you to a cloud-based backup system called CTERA - a practical demonstration of the flexibility of its underlying object storage platform. That platform, called CAStor, is essentially a mapping system for files stored across a widely distributed pool of clusters in the cloud. CAStor takes care of where things physically reside in the cloud; applications like CTERA map those locations onto naming schemes that make sense to humans.

Today, the company behind CAStor - Austin, Texas-based Caringo Inc. - altered the definition of "things" in that context with the release to its customers of CAStor version 5.5. With the help of a technique borrowed from Web mechanics called chunked encoding, data centers will be able to store widely distributed chunks of single files up to 4 TB in size. It's part of Caringo's latest effort to squash RAID, using the cloud as its weapon.

The idea of chunked encoding is for a system to begin storing an object as it streams in, rather than holding it in a cache until its final size is known. For a system that can accept a single 4 TB file, that's a lot of cache; and if the same system must also handle objects that are vanishingly small, a huge cache could actually work against you. Chunked encoding instead lets clusters be provisioned while huge files are still being uploaded; CAStor sorts everything out once it receives the zero-length chunk signaling that the transfer is done.

Caringo's other upgrades in CAStor version 5.5 include a zero-provisioning system, in the new Cluster Services Node (above), for setting up additional nodes on the network. Rather than requiring an administrator to install the CAStor software manually on every node, the new system lets nodes added to the network fetch and install the software for themselves.
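The zero-provisioning pattern can be sketched in a few lines - a hypothetical illustration of the general idea, not Caringo's actual protocol: a central services node holds the current software image, and each joining node pulls and installs that image itself.

```python
# Hypothetical sketch of zero-provisioning: nodes provision themselves
# from a central services node on joining, instead of being set up by hand.

class ServicesNode:
    def __init__(self, image):
        self.image = image          # software image served to joiners

    def fetch_image(self):
        return self.image

class StorageNode:
    def __init__(self, name):
        self.name = name
        self.installed = None       # nothing installed until the node joins

    def join(self, services):
        # On joining the network, the node fetches and installs the
        # current software image by itself.
        self.installed = services.fetch_image()
        return self

csn = ServicesNode(image="castor-5.5")
cluster = [StorageNode("node%d" % i).join(csn) for i in range(3)]
assert all(n.installed == "castor-5.5" for n in cluster)
```

The administrative win is that adding capacity becomes a one-step operation: plug the node in, and it converges on the correct software version on its own.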

These latest upgrades represent Caringo's ongoing effort to reintroduce its customers to the notion of replication as a viable method for protecting data. Replication is a service that CAStor does perform, but one its end users never have to be concerned with. A July 2011 Caringo white paper on the subject (PDF available here) made the point that enterprises that came to depend on RAID1 architecture for redundant disk arrays moved away from replication, regarding it as costly and inefficient - perhaps believing that redundancy and replication together were... well, redundant.

The graph from that white paper demonstrates that CAStor can implement optimized per-object replication, along with other resilience measures - a performance reserve (a small amount of space held open for periodic defragmentation) and reserve storage-pool space for "hot spares" when a RAID5/RAID6 rebuild becomes necessary (the right column, above) - while consuming no more storage space than raw replication alone would have required under RAID1 (the left column).
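The capacity side of that comparison is simple arithmetic. The sketch below uses textbook overhead figures rather than numbers from the white paper: RAID1 mirrors every byte, RAID5 spends one disk per group on parity, RAID6 spends two, and object replication stores some number of full copies per object.

```python
# Back-of-the-envelope raw-capacity overhead (raw bytes consumed per
# logical byte stored), using standard textbook figures for each scheme.

def raid1_overhead():
    return 2.0                      # every byte written twice (mirroring)

def raid5_overhead(n_disks):
    return n_disks / (n_disks - 1)  # one disk's worth of parity per group

def raid6_overhead(n_disks):
    return n_disks / (n_disks - 2)  # two disks' worth of parity per group

def replication_overhead(replicas):
    return float(replicas)          # r independent full copies

# Two replicas cost exactly what RAID1 mirroring costs in raw capacity...
assert replication_overhead(2) == raid1_overhead()
# ...while an 8-disk RAID6 group consumes 8/6 of the logical data size.
print(round(raid6_overhead(8), 3))  # → 1.333
```

The white paper's argument is not that replication is cheaper per byte - at two replicas it matches mirroring exactly - but that replication's overhead buys per-object flexibility and avoids the rebuild exposure that parity RAID carries.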

"[I]f you are using RAID 5 or 6 as a data resilience method it is only a matter of time before you experience data loss (if you haven't already)," the white paper concluded. "Replication will ensure the resilience and recoverability of your data for the full life cycle of your applications. The CAStor storage architecture delivers the value of replication without compromising capacity when compared to RAID alternatives. CAStor customer environments reap the benefits of future proofing of their storage investment spanning the entire life of the system for a superior TCO experience."

Caringo did concede at the time that RAID5/RAID6 configurations could be more efficient than CAStor for file systems where files tended to be larger; the advantage swung back in CAStor's favor where files were smaller. But that was before chunked encoding arrived in version 5.5, which could conceivably swing the pendulum the rest of the way in Caringo's direction.