There’s no beating around this bush. Today Hortonworks announced a new beta version of its Hortonworks Data Platform that will run on Microsoft Windows Server, a move that shows Microsoft’s own Big Data efforts will forever be connected to open source innovation. This is a highly significant – even expected – move in the big data sector, but also a very strange one.
Hadoop, of course, is an open-source software architecture that supports distributed computation jobs on huge data sets – in other words, classic Big Data work. Hortonworks, meanwhile, is one of the bigger Hadoop vendors in the market, even if that’s more in terms of innovation than sales, where it trails Cloudera. Hortonworks founder and architect Arun Murthy is one of the original Hadoop coders who came out of Yahoo back in the day, and he also serves as the VP of the open source Apache Hadoop project at the Apache Software Foundation.
Which all means that any major platform move like this is sure to impact the rest of Hadoop development and, by extension, the rapidly growing Hadoop ecosystem that’s driving much of the big data sector.
Why Windows?
Until today’s announcement, Hadoop of any flavor typically ran on a Linux-based machine (physical or virtual). This made a lot of sense, since one of the big advantages of Hadoop is the capability to expand its data warehousing over any number of clustered computers. When those clustered machines are running Linux, it’s all but frictionless to add more, both in terms of licensing (which is free) and configuration (which is easy).
But when the underlying operating system is Windows Server, licensing – i.e., explicitly not free – would seem likely to create a lot more friction when someone tries to build a Hadoop cluster. Wouldn’t using Windows Server as the OS for a Hadoop system be too expensive?
David McJannet, VP of marketing at Hortonworks, doesn’t seem to think so. McJannet’s concern was that too many Windows-based shops out there were shying away from Hadoop because they didn’t want to deal with adding Linux clusters and the related hassle of managing them. So assuaging those concerns was one big reason Microsoft has been working with Hortonworks over the past 18 months.
The sheer number of Windows installations was also a major factor. McJannet said that a “majority of servers” in the enterprise now run Windows. In its press release, Hortonworks cited IDC data: “According to IDC, Windows Server owned 73 percent of the market in 2012 (IDC, Worldwide and Regional Server 2012–2016 Forecast, Doc # 234339, May 2012).”
It is not clear just what server class this 73 percent represents, since the report itself costs $4,500, and is thus a little hard to access. File servers? Application servers? It’s certainly not web servers, where according to Web analytics from Netcraft, Microsoft currently has a 16.93% market share, dwarfed by Apache’s 55.26%.
McJannet also said Hadoop on Windows would make data exploration easier. Using SQL-based queries that can now directly integrate with the Hadoop Distributed File System (HDFS), products like SQL Server and Excel can tap straight into Hadoop-stored data, enabling end-users to more easily navigate vast stores of data in Hadoop clusters.
Embracing Open Source
This is not Hortonworks’ first foray into Windows land. Late last year, it released the Windows Azure HDInsight product – essentially Hadoop for the Azure cloud platform.
As odd as it may seem to see Hadoop on Windows Server, the move makes a lot of sense from Microsoft’s side. The company has needed a Big Data entry ever since it decided to drop its own Dryad data warehousing framework back in 2011. Some observers have expected this day since a year ago, when Microsoft announced it would build tools into SQL Server to connect to Hadoop.
McJannet emphasized that, to date, Microsoft has been playing well with others within the open source development model that Hadoop uses, so much of its innovation will cycle back to the rest of the Hadoop community.
If so, you can expect to see more Hadoop vendors announce their own connections to Windows in the near future.
Image courtesy of Shutterstock