Microsoft Completes Journey To Big Data Through Hadoop

There’s no beating around this bush. Today Hortonworks announced a new beta version of its Hortonworks Data Platform that will run on Microsoft Windows Server, a move that shows Microsoft’s own Big Data efforts will forever be connected to open source innovation. This is a highly significant – even expected – move in the big data sector, but also a very strange one.

Hadoop, of course, is an open-source software architecture that supports distributed computation jobs on huge data sets – in other words, classic Big Data work. Hortonworks, meanwhile, is one of the bigger Hadoop vendors in the market, even if that’s more in terms of innovation than sales, where it trails Cloudera. Hortonworks founder and architect Arun Murthy is one of the original Hadoop coders who came out of Yahoo back in the day, and he also serves as the VP of the open source Apache Hadoop project at the Apache Software Foundation.

Which all means that any major platform move like this is sure to impact the rest of Hadoop development and, by extension, the rapidly growing Hadoop ecosystem that’s driving much of the big data sector.

Why Windows?

Until today’s announcement, Hadoop of any flavor typically ran on a Linux-based machine (physical or virtual). This made a lot of sense, since one of the big advantages of Hadoop is the capability to expand its data warehousing over any number of clustered computers. When those clustered machines are running Linux, it’s all but frictionless to add more, both in terms of licensing cost (there is none) and configuration (which is easy).

But when the underlying operating system is Windows Server, licensing – i.e., explicitly not free – would seem likely to create a lot more friction when someone tries to build a Hadoop cluster. Wouldn’t using Windows Server as the OS for a Hadoop system be too expensive?

David McJannet, VP of marketing at Hortonworks, doesn’t seem to think so. McJannet’s concern was that too many Windows-based shops out there were shying away from Hadoop because they didn’t want to deal with adding Linux clusters and the related hassle of managing them. So assuaging those concerns was one big reason Microsoft has been working with Hortonworks over the past 18 months.

The sheer number of Windows installations was also a major issue. McJannet said that a “majority of servers” were running Windows in the enterprise now. In its press release, Hortonworks cited IDC data thusly: “According to IDC, Windows Server owned 73 percent of the market in 2012 (IDC, Worldwide and Regional Server 2012–2016 Forecast, Doc # 234339, May 2012).”

It is not clear just what server class this 73 percent represents, since the report itself costs $4,500 and is thus a little hard to access. File servers? Application servers? It’s sure not web servers, where, according to Web analytics from Netcraft, Microsoft currently has a 16.93% market share, dwarfed by Apache’s 55.26%.

McJannet also said Hadoop on Windows would make data exploration easier. Using SQL-based queries that can now directly integrate with the Hadoop Distributed File System (HDFS), products like SQL Server and Excel can tap straight into Hadoop-stored data, enabling end-users to more easily navigate vast stores of data in Hadoop clusters.

Embracing Open Source

This is not Hortonworks’ first foray into Windows land. Late last year, it released the Windows Azure HDInsight product – essentially Hadoop for the Azure cloud platform.

As odd as it may seem to see Hadoop on Windows Server, the move makes a lot of sense from Microsoft’s side. The company has needed a Big Data entry ever since it decided to drop its own Dryad data warehousing framework back in 2011. Some observers have expected this day for a year, ever since Microsoft announced it would build tools into SQL Server to connect to Hadoop.

McJannet emphasized that, to date, Microsoft has played well with others within the open source development model that Hadoop uses, so much of its innovation will cycle back to the rest of the Hadoop community.

If so, you can expect to see more Hadoop vendors announce their own connections to Windows in the near future.

