In 2013, traditional data centers will begin to lose their dominant status within the data-management food chain. They will increasingly be replaced by big-data software and lower-cost, ARM-based systems-on-chips.
When thinking about the future of data centers, the problem is one of scale. For the past few decades, relational databases and the attendant hardware that runs them have been able to manage pretty much anything a company could throw at them, but those days are coming to an end.
When Relational Ruled The Land
In the beginning, and for the first 20 years or so, data was heavily transactional, and was managed in discrete and very secure ways. Speed was less important than making sure the data was safe as houses.
In the 1990s, data began to be used in a slightly different way, as companies placed analytical demands on the data being gathered. Instead of being retrieved in discrete packages, data came to be treated as a strategic asset to be analyzed, giving rise to the discipline of business intelligence. Databases grew into massive data warehouses, and parallel querying arose as the only way to effectively manage the staggering workloads placed on information technology.
Through the early years of electronic data, growth in the volume of data may have been rapid, but data tools and infrastructure were pretty much able to keep pace.
That’s not so true anymore. Software soon will not be able to cope with the overwhelming volume of data being generated, says Mike Hoskins, chief technology officer of Pervasive Software. What’s coming is a real break in how data is managed.
Breaking The Old Model
To give an idea of what kind of scale we’re talking about, Hoskins points to U.S. retailer Wal-Mart’s estimated 1-petabyte data store.
“That’s the accumulation of 40 years of Wal-Mart sized business,” he said. “Facebook? Facebook generates that much data in a week.”
There’s always a collection of data behind each transaction. But in e-commerce today, a customer can be clicking around quite a bit before buying, which leads to useful data sets tens, hundreds or thousands of times larger than “so-and-so bought widget X with credit card Y.” Add the fact that the machines handling these activities are also recording machine-to-machine transactions, and the data workload explodes beyond the capacity of any traditional data center.
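To make that blow-up concrete, here is a toy sketch. Every number in it (bytes per record, clicks per purchase, the machine-to-machine multiplier) is an illustrative assumption, not a figure from the article:

```python
# Toy model (all numbers are illustrative assumptions): how much data a
# single purchase generates once clickstream and machine-to-machine
# records are logged, versus a bare transaction row.

TRANSACTION_BYTES = 200      # one "bought widget X with card Y" row
CLICK_EVENT_BYTES = 500      # one logged page view or click event
CLICKS_PER_PURCHASE = 120    # browsing before buying
M2M_RECORDS_PER_EVENT = 3    # machine-to-machine records per user event

clickstream_bytes = CLICKS_PER_PURCHASE * CLICK_EVENT_BYTES * (1 + M2M_RECORDS_PER_EVENT)
ratio = clickstream_bytes / TRANSACTION_BYTES
print(f"~{ratio:.0f}x more data per purchase than the transaction alone")
```

With these assumed inputs the ratio works out to 1,200x, squarely in the "hundreds or thousands of times larger" range described above.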
“We are reaching the end of the useful life” of our data centers, Hoskins said. “The bottom line is, it’s a death march.”
Even if conventional software could manage this explosion, no company could afford it. Not to mention the costs involved in buying, running and cooling the hardware.
Indeed, it is innovation in hardware that’s going to provide the evolutionary break that Big Data requires. Servers with ARM-based processors, which draw something like one-twentieth the power of Intel-based processors, are the next wave in data center infrastructure. Lower power draw, after all, means less waste heat. Less heat means less money spent on cooling and the ability to pack ARM-based systems closer together.
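A back-of-the-envelope sketch of what that power gap means for an energy bill. Every figure here (wattage, socket count, electricity price, PUE) is an assumption for illustration, not vendor data:

```python
# Back-of-the-envelope annual energy cost for a fleet of server sockets.
# All inputs are assumed figures, not vendor specifications.

INTEL_WATTS = 100               # assumed draw per Intel-class socket
ARM_WATTS = INTEL_WATTS / 20    # "something like one-twentieth the power"
SOCKETS = 1000
HOURS_PER_YEAR = 24 * 365
USD_PER_KWH = 0.10              # assumed electricity price
PUE = 1.8                       # cooling/overhead multiplier (power usage effectiveness)

def annual_energy_cost(watts_per_socket: float) -> float:
    kwh = watts_per_socket * SOCKETS * HOURS_PER_YEAR / 1000
    return kwh * USD_PER_KWH * PUE

print(f"Intel-class fleet: ${annual_energy_cost(INTEL_WATTS):,.0f}/year")
print(f"ARM-class fleet:   ${annual_energy_cost(ARM_WATTS):,.0f}/year")
```

The 20x gap in draw carries straight through to the bill, and because the PUE multiplier scales with it too, the cooling savings compound the power savings rather than adding to them separately.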
As energy and general hardware costs come down, hardware is poised to handle data workloads at this new, massive scale.
First Hardware – Then Software
On the software side, Big Data will increasingly be handled by Hadoop systems that can store data and manage and analyze Facebook-scale loads.
If you’re wondering why this is supposed to be big news, think about it this way: Relational databases have been handling data of all shapes and sizes for decades, and now there will be a certain level of data that the traditional data center architecture will simply be unable to handle. It’s the first stratification of data management.
On one level of data management, relational databases will still be around, supporting smaller, less complex and more tactical workloads. But on this new level, whole new architectures will be created to deal with this scale.
Big Data in the form of Hadoop-based architectures is but the first step into the future. In the past, data managers had to heavily pre-process data to fit it into a particular schema for use in a relational database. Today, they’re forgoing the pre-processing and loading the unformatted data straight into commodity Hadoop clusters. To perform analytical work, data managers then pull refined data back into databases and other analytical tools.
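The pattern this describes is often called schema-on-read. Below is a minimal stand-in in plain Python, not actual Hadoop code; the record layout and field names are invented for illustration:

```python
# Schema-on-read in miniature: raw, unformatted records are stored as-is,
# and a schema is imposed only when data is pulled back out for analysis.
import json

# 1. Ingest: dump raw events with no up-front schema, as into a Hadoop cluster.
raw_store = [
    '{"user": "a", "action": "click", "page": "/widgets"}',
    '{"user": "a", "action": "buy", "sku": "X", "price": 9.99}',
    'garbled line that a relational load step would have rejected',
]

# 2. Refine on the way out: parse, filter and shape only what analysis needs.
def refine(lines):
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue                 # bad rows are skipped, not load-blocking
        if record.get("action") == "buy":
            yield {"user": record["user"], "revenue": record["price"]}

purchases = list(refine(raw_store))
print(purchases)
```

In a relational pipeline, the garbled third line would have to be cleaned up front or the load would fail; here it simply sits in the raw store until a refinement pass decides what to do with it.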
What’s The Data Center Endgame?
This halfway approach is not the endgame, though.
Eventually, Hoskins believes, tools will be built into the Hadoop framework that will enable data managers to run applications and analysis right where the data lives, inside the Hadoop clusters.
It’s no accident, then, that the latest iteration of one of Hadoop’s core components – MapReduce 2.0, code-named YARN – includes the beginnings of a framework that will let developers build exactly those kinds of tools inside Hadoop. The VP of Apache Hadoop, Arun Murthy, confirmed this to me early this year at the Strata Conference in Santa Clara, California. Once the YARN application framework is robust enough, developers will be able to build those applications directly on Hadoop.
This will be the new way of working with data as it gets too big for relational databases to handle: a new architecture of low-cost, low-power servers that will keep applications and data as close to each other as possible, in order to maximize efficiency and speed.
“Relational database technology has had a good run,” Hoskins said. But the days of the relational database being a part of every data solution are fading fast, as a new kind of data center becomes the new sheriff in town.