What "Data Gravity" Means to Your Data

If you've wondered why so many companies are eager to control data storage, the answer can be summed up in a simple term: data gravity. Ultimately, where data is determines where the money is. Services and applications are nothing without it.

Data gravity is a term coined in a blog post by Dave McCrory. Basically, McCrory says to consider data as if it were an object:

As Data accumulates (builds mass) there is a greater likelihood that additional Services and Applications will be attracted to this data. This is the same effect Gravity has on objects around a planet. As the mass or density increases, so does the strength of gravitational pull. As things get closer to the mass, they accelerate toward the mass at an increasingly faster velocity.

...Services and applications can have their own gravity, but data is the most massive and dense, therefore it has the most gravity. Data if large enough can be virtually impossible to move.

McCrory's post goes on to discuss artificial influences on data gravity, such as costs, data throttling and legislation: factors that shape the movement of data in ways that wouldn't happen "naturally." For instance, Amazon allows free inbound data transfer but charges for outbound data transfer. Legislation is another "artificial" influence, telling companies where they may or may not store data or dictating the terms of its storage.

Data Gravity in Action

You don't have to look very far to see data gravity in action. Consider Dropbox, Amazon S3, iTunes or just about any CMS migration ever.

Lots of companies want to emulate Dropbox, but few have attracted a comparable user base, and none are as ubiquitous. That presence is paying off: Dropbox has now attracted quite a few third-party apps to its orbit, like Wappwolf and IFTTT. Perhaps that's why Apple is trying to disrupt Dropbox's gravitational pull by rejecting some iOS apps that use Dropbox.

You'll note that Amazon S3 and other AWS services make it very easy to get data in, but getting data out gets spendy. No shocker here - Amazon wants to encourage as many developers and companies as possible to toss data into AWS, and then tie them to the service.

Apple's iTunes is all about keeping data in Apple's services. Apple's DRM on music is now defunct, but iTunes still won't transfer music or movies to non-Apple devices. It's Apple devices or nothing. Getting the entire library out of iTunes is non-trivial for many users, so in many cases it's like a digital roach motel: data checks in, but it doesn't check out.

If you've ever worked with content management systems, you already know all about the concept of data gravity - even if you've never heard the term. Getting all the data out of one CMS and into another is, well, painful at best and often impossible. This is one reason why companies often stick with aging CMSes rather than go through the pain of migration.

Consider Gravity Before Deploying

Whether it's a single-user application like iTunes or a company-wide project, you need to consider the implications of data gravity - once your data is in, how hard will it be to break the gravitational field?

The stronger the data gravity involved, the more cautious you should be when you choose your data storage solution. It's likely that once you have a sufficient amount of data wrapped up in a solution, it's going to be very difficult (if not impossible) to justify the costs of moving it away.
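
To get a feel for how quickly those costs grow, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name migration_estimate is made up, and the $0.10 per GB egress price and 1 Gbps link are placeholder figures, not any provider's actual rates. It only estimates outbound transfer fees and raw transfer time, ignoring re-engineering, downtime and everything else that makes a real migration hurt.

```python
def migration_estimate(data_tb, egress_usd_per_gb, link_mbps):
    """Estimate what it takes to move data_tb terabytes out of a service.

    Returns a tuple: (egress cost in USD, transfer time in days).
    All inputs are hypothetical; plug in your own numbers.
    """
    data_gb = data_tb * 1000  # decimal units: 1 TB = 1000 GB
    cost_usd = data_gb * egress_usd_per_gb
    # 1 GB is 8000 megabits; dividing by link speed (Mbps) gives seconds
    seconds = data_gb * 8000 / link_mbps
    days = seconds / 86400
    return cost_usd, days


if __name__ == "__main__":
    # Placeholder price and bandwidth, chosen only for illustration
    for tb in (1, 100, 10_000):
        cost, days = migration_estimate(tb, egress_usd_per_gb=0.10, link_mbps=1000)
        print(f"{tb:>6} TB: ${cost:,.0f} egress, {days:,.1f} days at 1 Gbps")
```

Both numbers scale linearly with the amount of data, which is the point: the more mass you accumulate, the more it costs in money and time to reach escape velocity.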

(Lead image courtesy of Flickr User Juan Ramon Rodriguez Sosa under the Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0) license.)