Hadoop is gaining more commercial acceptance. We see a number of signs of its growing popularity. It became abundantly clear in a recent conversation we had with a Yahoo! executive who says the company is rebuilding its future on the distributed warehousing and analytics technology.
It’s a similar track we are seeing with the larger consumer social networks and cloud computing providers. Facebook uses Hadoop to do deep social analytics which powers the ability to provide its established level of personal interaction. Windows Azure is adopting Hadoop.
In a recent call, Eric Baldeschwieler, vice president of Hadoop for Yahoo! said Hadoop is at the core of rebuilding Yahoo! for its future.
We followed up in email to ask him about what Hadoop brings to Yahoo’s future. Here’s his prepared statement:
“Yahoo’s vision is to be the center of people’s online lives, by delivering personally relevant Internet experiences. Think of Hadoop as a foundational layer beneath two of Yahoo!’s most precious assets: its user data and its diverse collection of content. For Yahoo!, data processing and analysis is the key to understanding its massive global audience, enriching products and connecting users with advertisers.
As Hadoop is increasingly becoming the data warehouse for Yahoo!, the company expects to accelerate the pace of innovation across all of its consumer and advertiser experiences.”
Yahoo! started using Hadoop initially in 2006 as a science project to process and analyze massive data sets. They developed a prototype on 20 nodes. Today, Yahoo! manages more than 25,000 nodes for data processing and analytics.
Yahoo! found that product development could be done in a fraction of the time. They found they could just throw machines at a project to do the processing. What once took 29 days could be accomplished in less than one.
As a result, Yahoo! began integrating Hadoop for all parts of its business. The company offloaded storage from the IT department and put the data in a cluster.
Today, Yahoo uses Hadoop for determining best advertising placement and for content optimization. For example, the company started testing how the optimization worked on the home page by serving up content relevant to the user. It worked. Yahoo! saw a 150 percent increase in user engagement measurements.
The next step is to use Hadoop for optimizing latency, a major issue for scaling data networks in the cloud.
Hadop is becoming the standard for data processing and analytics in social networks, genome projects and beyond. Some see that as proof it has gained commercial acceptance.
So now it’s on to the next big data dicing project. And what will that be?
Cassandra for one.
More to come.