Just last week, ReadWriteWeb’s Joe Brockmeier asked and answered the question, “Who wrote Hadoop?” That’s the open source framework for storing and processing huge datasets across clusters of servers, distributed under the Apache v2 license.
As of this morning, Cloudera and the other members of the Hadoop community have a new neighbor. Microsoft is teaming up with Hortonworks to produce a new Hadoop distribution for Windows Server and a Hadoop service for Windows Azure. The news comes from the PASS Summit for SQL Server in Seattle.
“Hadoop has grown to be a compelling platform for managing and processing types of data that were born in, and live outside, the traditional database management system environment,” Doug Leland, general manager for product management for SQL Server, tells RWW. “But [customers] want to be able to bring that into their data platform environment.”
Microsoft’s plan is to make Hadoop data accessible through Windows Azure cloud-based deployments, and to let customers analyze it with the company’s business intelligence (BI) tools.
“It is important for us to integrate this distribution in with the Windows infrastructure, to provide the security capabilities, the management capabilities, and the performance characteristics that our customers expect,” Leland continues. “We will integrate the Hadoop distribution with Active Directory for security and data access control, with our management stack to provide greater manageability – to really make it a first-class citizen of our data platform.”
SQL Server 2012 marches in alongside the elephant
Incidentally, today is also the day Microsoft formally announces SQL Server 2012, for availability in the first half of that year. Given the number of Community Technology Preview rollouts (formerly known as “public betas”) expected between now and general availability (GA), it’s probably safer to focus on spring than winter. “Denali,” as SS 2012 was code-named, has already undergone three CTP stages. Still, the first CTP of the Hadoop distro and Azure service will share the roadmap with SS 2012.
One of the new RDBMS’ most intriguing features is a rapid data discovery tool, formerly called “Crescent” but today dubbed PowerView. Taking a cue from the Metro style for Web apps on the upcoming Windows 8, PowerView utilizes its own class of apps, if you will, that Microsoft is calling insights. They’re minimally adorned, touch-sensitive charts that offer flexible representations of data and are more adaptable to tablet-based use cases.
“Crescent” has been actively tested since late last year. But today, we’re learning for the first time that SQL Server will extend insights to Hadoop data sets. So the same visibility tools that Microsoft is introducing today for SQL Server data will also apply to Hadoop.
“Whether you call it unstructured, ‘big data,’ – these terms are somewhat descriptive, but also misnomers. We are providing the capability to manage and process it where it lives,” states Microsoft’s Leland. “Once you’ve discovered interesting insights from them, you can either a) bring it into a SQL Server environment using connectors; or b) drive analysis across it using our BI tools. We are providing the flexibility to customers to choose the tools they use for the job.”
Hadoop’s data warehousing system is called Hive, and it uses a SQL-like language called HiveQL for queries and analysis. Microsoft will be producing an ODBC driver that extends its own existing query systems to Hive. This way, Doug Leland tells us, users will be able to execute direct Hadoop queries from Excel, PowerView, and PowerPivot, the BI plug-in for Excel.
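To give a sense of what “SQL-like” means in practice, here is a minimal sketch in Python. It is not Microsoft’s driver or tooling: the weblogs table, its columns, and the helper names are hypothetical, and an in-memory SQLite database stands in for a Hive connection so the example can run anywhere. The query itself uses only constructs that HiveQL shares with ordinary SQL.

```python
import sqlite3

# A simple aggregate. HiveQL accepts this statement as written, and so does
# standard SQL, which is the point of the "SQL-like" description.
# The table and column names are hypothetical.
HIVEQL = """
SELECT page, COUNT(*) AS hits
FROM weblogs
GROUP BY page
ORDER BY hits DESC
LIMIT 10
"""


def top_pages(conn, query=HIVEQL):
    """Run the query on any DB-API 2.0 connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(query)
        return cur.fetchall()
    finally:
        cur.close()


def demo_connection():
    """Assumption: an in-memory SQLite database stands in for Hive here."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weblogs (page TEXT, visitor TEXT)")
    conn.executemany(
        "INSERT INTO weblogs VALUES (?, ?)",
        [("/home", "a"), ("/home", "b"), ("/about", "a")],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    for page, hits in top_pages(demo_connection()):
        print(page, hits)
```

With an ODBC driver for Hive in place, the same function could in principle be handed a connection from an ODBC bridge instead of the SQLite stand-in, though the details would depend on the driver Microsoft ships.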
The necessary Hadoop connectors for the existing SQL Server 2008 R2 and SQL Server Parallel Data Warehouse are now complete and are being released today. “We’re opening up a scenario for bidirectional data movement between Hadoop and SQL Server, as well as for BI, so you can run analysis straight across data living in Hadoop,” says Leland.
Microsoft, the new elephant in the room
This is all being made possible through a strategic partnership between Microsoft and Hortonworks, a provider of commercial Hadoop distributions that was spun off from Yahoo last summer, and that last week claimed an elephant’s share of the credit for contributing to the Hadoop code base. “They are going to be lending us their guidance and technical expertise, and working with us to help accelerate the delivery of our distribution for Windows Server and the service for Windows Azure,” states Leland.
He adds that Microsoft promises to engage fully with the Hadoop development community, which will mean contributing any innovations it makes to Hadoop back to the Apache Software Foundation. “The changes that we make, including developing the distribution for Windows, will be proposed back to Apache for inclusion in the core trunk. Our commitment is for compatibility and interoperability, and to do so through open contributions.”
The CTP of the Hadoop service for Windows Azure will begin before the end of the year, says Leland, while the CTP of Hadoop for Windows Server will begin next year. Whether Hadoop management will be implemented as a role for Server Manager, for instance, has yet to be determined, and may be decided during the CTP process.