Today Microsoft Research announced the availability of a free technology preview of Project Daytona MapReduce Runtime for Windows Azure. Using a set of tools for working with big data based on Google’s MapReduce paper, it provides an alternative to Apache Hadoop.
Daytona was created by the eXtreme Computing Group at Microsoft Research. It’s designed to help scientists take advantage of Azure for working with large, unstructured data sets. Daytona is also being used to power a data-analytics-as-a-service offering the team calls Excel DataScope.
Big Data Made Easy?
The team’s goal was to make Daytona easy to use. Roger Barga, an architect in the eXtreme Computing Group, was quoted saying:
“‘Daytona’ has a very simple, easy-to-use programming interface for developers to write machine-learning and data-analytics algorithms. They don’t have to know too much about distributed computing or how they’re going to spread the computation out, and they don’t need to know the specifics of Windows Azure.”
To accomplish this difficult goal (MapReduce is not known to be easy) Microsoft Research is including a set of example algorithms and other sample code along with a step-by-step guide for creating new algorithms.
Data Analytics as a Service
To further simplify the process of working with big data, the Daytona team has built an Azure-based analytics service called Excel DataScope, which enables developers to work with big data models using an Excel-like interface. According to the project site, DataScope allows the following:
- Users can upload Excel spreadsheets to the cloud, along with metadata to facilitate discovery, or search for and download spreadsheets of interest.
- Users can sample from extremely large data sets in the cloud and extract a subset of the data into Excel for inspection and manipulation.
- An extensible library of data analytics and machine learning algorithms implemented on Windows Azure allows Excel users to extract insight from their data.
- Users can select an analysis technique or model from our Excel DataScope research ribbon and request remote processing. Our runtime service in Windows Azure will scale out the processing, by using possibly hundreds of CPU cores to perform the analysis.
- Users can select a local application for remote execution in the cloud against cloud scale data with a few mouse clicks, effectively allowing them to move the compute to the data.
- We can create visualizations of the analysis output and we provide the users with an application to analyze the results, pivoting on select attributes.
This reminds me a bit of Google’s integration between BigQuery and Google Spreadsheets, but Excel DataScope sounds much more powerful.
We’ve discussed data as a service as a future market for Microsoft previously.
Microsoft’s Other Hadoop Alternative
Microsoft also recently released the second beta of its other Hadoop alternative LINQ to HPC, formerly known as Dryad. LINQ/Dryad have been used for Bing for some time, but not the tools are available to users of Microsoft Windows HPC Server 2008 clusters.
Instead of using MapReduce algorithms, LINQ to HPC enables developers to use Visual Studio to create analytics applications for big, unstructured data sets on HPC Server. It also integrates with several other Microsoft products such as SQL Server 2008, SQL Azure, SQL Server Reporting Services, SQL Server Analysis Services, PowerPivot, and Excel.
Microsoft also offers Windows Azure Table Storage, which is similar to Google’s BigTable or Hadoop’s data store Apache HBase.
More Big Data Initiatives from Microsoft
We’ve looked previously at Probase and Trinity, two related big data projects at Microsoft Research. Trinity is a graph database, and Probase is a machine learning platform/knowledge base.
We also covered Project Barcelona, an enterprise search system that will compete with Apache Solr.