<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
        <channel>
        <title>Hadoop - ReadWrite</title>
        <link>http://readwrite.com</link>
        <description />
        <language>en</language>
        <copyright>Copyright 2012 SAY Media, Inc.</copyright>
        <managingEditor>readwriteweb@gmail.com</managingEditor>
        <docs>http://blogs.law.harvard.edu/tech/rss</docs> 
        <lastBuildDate>Fri, 24 May 2013 05:05:00 -0700</lastBuildDate>
        <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://rww.superfeedr.com/" />

                    <item>
                <title><![CDATA[Hadoop 2.0 & YARN: Get Ready For This Summer's Big Data Breakthrough ]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/Breakthrough_BigData.jpg" />
                                        Lately, it seems that Hadoop, the open source data platform so integral to the rise of Big Data, can't catch a break. Instead of relying on Hadoop as a key Big Data storage and analysis tool, vendors and observers are increasingly positioning it as <a title="http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think" href="http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think">"just" a storage tool</a>.
<p>But this isn't necessarily a <em>bad</em> thing. Using Hadoop for cheap and efficient storage is a perfect starting point for the next phase of Hadoop's evolution. With this summer's expected debut of Hadoop 2.0, changes are afoot that will make information found within data warehouses and unstructured "data lakes" more&nbsp;accessible&nbsp;than ever.</p>
<h2>Hadoop As A Big Bucket</h2>
<p>Hadoop has been a great system for storing data since its initial adoption as a Big Data tool, but the data-processing framework MapReduce, which requires the creation of Java apps to reach into stored data and pull out the information required, has a steep learning curve.</p>
<p><strong style="line-height: 1.538em;">(See also&nbsp;<a href="http://readwrite.com/2013/05/23/hadoop-what-it-is-and-how-it-works" target="_blank">Hadoop: What It Is And How It Works</a>.)</strong></p>
<p>There are other ways to get information out of Hadoop, of course. The&nbsp;<a href="http://hbase.apache.org/" target="_blank">HBase</a>&nbsp;database runs on top of Hadoop, letting users treat data with a database paradigm. And the <a href="http://hive.apache.org/" target="_blank">Hive</a> data warehouse system enables you to build queries in the SQL-like HiveQL query language that can be converted to MapReduce jobs. But Hadoop is still limited by the fact that everything you do in it has to be done one thing at a time. MapReduce jobs, Hive queries, HBase operations… they all have to take turns.</p>
<p>This is why a lot of vendors tend to frame Hadoop as the bucket in which data is stored, and cast their products as the magical tool to pull out or analyze that data. In fact, while the data bucket metaphor is apt, it has been super-sized among Hadoop users to become known as data lakes or even data oceans. Given the perceived limitations of Hadoop in its present state, it's not a hard sell to make.</p>
<p>But as the Hadoop development community starts ramping up for the next iteration of Hadoop, those limitations are about to be greatly reduced.</p>
<h2>Knitting A YARN Solution</h2>
<p>For&nbsp;Arun Murthy, the release manager for Hadoop 2.0,&nbsp;the most important change will be upgrading the MapReduce framework to <a style="line-height: 1.538em;" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html" target="_blank">Apache YARN</a>, which will expand both what software can run in Hadoop and how much of it can run at the same time. Murthy, who is also YARN project lead and&nbsp;co-founder of&nbsp;<a style="line-height: 1.538em;" href="http://hortonworks.com">Hortonworks</a>,&nbsp;explained,&nbsp;"In Hadoop 1.0, everything was batch-oriented. In 2.0, you will now have multiple apps hitting the data inside all at once."</p>
<p>What YARN does, essentially, is divide the functionality of MapReduce even further, breaking the two major responsibilities of the MapReduce JobTracker component - resource management and job scheduling/monitoring - into separate daemons: a global ResourceManager and a per-application ApplicationMaster.</p>
<p>Splitting up these functions provides a more powerful way to manage a Hadoop cluster's resources than the current MapReduce system can offer. YARN manages resources much as an operating system handles jobs, which means no more one-at-a-time limitations.</p>
<p>With YARN, developers will be able to build apps directly within Hadoop, instead of bolting them on from the outside, as many third-party vendor tools have to do now.</p>
<p>Murthy reported that the Apache Hadoop community is already seeing keen interest from vendors who want to build apps within the YARN framework, apps that will live directly inside Hadoop and have their resources managed by YARN.</p>
<p>Because the Apache Hadoop community is driving the development of the new version of Hadoop, there is no set timeline for Hadoop's progress this summer. Murthy predicted that a "strong beta" of Hadoop 2.0 might be available in the June or July timeframe, with a final release perhaps ready by August.</p>
<p>If YARN lives up to its promises, a lot of data lakes and oceans will suddenly be more accessible within the native Hadoop platform, which will greatly streamline and speed up the task of finding useful information. Big Data is about to get even more useful.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/05/24/hadoop-20-yarn-bid-data-mapreduce</link>
                <guid>http://readwrite.com/2013/05/24/hadoop-20-yarn-bid-data-mapreduce</guid>
                <category>Hadoop</category>
                <pubDate>Fri, 24 May 2013 05:05:00 -0700</pubDate>
                <author>Brian Proffitt</author>
            </item>
                    <item>
                <title><![CDATA[Hadoop: What It Is And How It Works]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/BigDataArticle.jpg" />
                                        <p>You can't have a conversation about Big Data for very long without running into the elephant in the room: Hadoop. This open source software platform managed by the <a href="http://www.apache.org/" target="_blank">Apache Software Foundation</a> has proven to be very helpful in storing and managing vast amounts of data cheaply and efficiently.</p>
<p>But what exactly <em>is</em>&nbsp;<a style="text-decoration: underline;" title="http://hadoop.apache.org" href="http://hadoop.apache.org">Hadoop</a>, and what makes it so special? Basically, it's a way of storing enormous data sets across distributed clusters of servers and then running "distributed" analysis applications in each cluster.</p>
<p>It's designed to be robust, in that your Big Data applications will continue to run even when individual servers — or clusters — fail. And it's also designed to be efficient, because it doesn't require your applications to shuttle huge volumes of data across your network.</p>
<p>Here's how Apache formally describes it:</p>
<blockquote>
<p>The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly available service on top of a cluster of computers, each of which may be prone to failures.</p>
</blockquote>
<p>Look deeper, though, and there's even more magic at work. Hadoop is almost completely modular, which means that you can swap out almost any of its components for a different software tool. That makes the architecture incredibly flexible, as well as robust and efficient.</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/_hadoopelephant_rgb1.png" style="" />
			</span>
</p>
<h2>Hadoop Distributed Filesystem (HDFS)</h2>
<p>If you remember nothing else about Hadoop, keep this in mind: It has two main parts - a data processing framework and a distributed filesystem for data storage. There's more to it than that, of course, but those two components really make things go.</p>
<p>The distributed filesystem is that far-flung array of storage clusters noted above - i.e., the Hadoop component that holds the actual data. By default, Hadoop uses the cleverly named <a href="http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf" target="_blank">Hadoop Distributed File System (HDFS)</a>, although it can use other file systems as well.</p>
<p>HDFS is like the bucket of the Hadoop system: You dump in your data and it sits there all nice and cozy until you want to do something with it, whether that's running an analysis on it within Hadoop or capturing and exporting a set of data to another tool and performing the analysis there.</p>
<h2>Data Processing Framework &amp; MapReduce</h2>
<p>The data processing framework is the tool used to work with the data itself. By default, this is the Java-based system known as <a href="http://www-01.ibm.com/software/data/infosphere/hadoop/mapreduce/" target="_blank">MapReduce</a>. You hear more about MapReduce than the HDFS side of Hadoop for two reasons:</p>
<ol>
<li><span style="line-height: 1.538em;">It's the tool that actually gets data processed.</span></li>
<li><span style="line-height: 1.538em;" data-mce-mark="1">It tends to drive people slightly crazy when they work with it.</span></li>
</ol>
<p>In a "normal" relational database, data is found and analyzed using queries, based on the industry-standard <a href="http://www.techterms.com/definition/sql" target="_blank">Structured Query Language (SQL)</a>. Non-relational databases use queries, too; they're just not constrained to use only SQL, but can use other query languages to pull information out of data stores. Hence, the term <a href="http://www.zdnet.com/what-is-nosql-and-why-do-you-need-it-7000004989/" target="_blank">NoSQL</a>.</p>
<p>But Hadoop is not really a database: It stores data and you can pull data out of it, but there are no queries involved - SQL or otherwise. Hadoop is more of a data warehousing system - so it needs a system like MapReduce to actually process the data.</p>
<p>MapReduce runs as a series of jobs, with each job essentially a separate Java application that goes out into the data and starts pulling out information as needed. Using MapReduce instead of a query&nbsp;gives data seekers a lot of power and flexibility, but also adds a lot of complexity.</p>
<p>There are tools to make this easier: Hadoop includes <a href="http://hive.apache.org/" target="_blank">Hive</a>, another Apache application that helps convert query language into MapReduce jobs, for instance. But MapReduce's complexity and its limitation to one-job-at-a-time batch processing tend to result in Hadoop getting used more often as a data warehousing tool than as a data analysis tool.</p>
<p><strong>(See also&nbsp;<a title="http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think" href="http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think">Hadoop Adoption Accelerates, But Not For Data Analytics</a>.)</strong></p>
<h2>Scattered Across The Cluster</h2>
<p>There is another element of Hadoop that makes it unique: All of the functions described act as distributed systems, not the more typical centralized systems seen in traditional databases.</p>
<p>In a database that uses multiple machines, the work tends to be divided out: all of the data sits on one or more machines, and all of the data processing software is housed on another server (or set of servers).</p>
<p>On a Hadoop cluster, the data within HDFS and the MapReduce system are housed on every machine in the cluster. This has two benefits: it adds redundancy to the system in case one machine in the cluster goes down, and it brings the data processing software into the same machines where data is stored, which speeds information retrieval.&nbsp;</p>
<p>Like we said: Robust and efficient.</p>
<p>When a request for information comes in, MapReduce uses two components, a JobTracker that sits on the Hadoop master node, and TaskTrackers that sit out on each node within the Hadoop network.</p>
<p>The process is fairly linear: In the <em>Map</em> part, the JobTracker divides a computing job into defined pieces and ships those pieces out to the TaskTrackers on the machines in the cluster where the needed data is stored. Once those pieces have run, the resulting subsets of data are <em>Reduced</em>: sent back to the central node of the Hadoop cluster and combined with the results from all of the cluster's other machines.</p>
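<p>The Map and Reduce steps described above can be sketched in a few lines of Python. This is only a toy, single-process illustration of the idea, not how Hadoop itself is written (real MapReduce jobs are Java applications spread across TaskTrackers). The node names, the sample text, and every function name here are hypothetical.</p>

```python
from collections import defaultdict

# Hypothetical stand-in for lines of text stored on three cluster nodes.
NODE_BLOCKS = {
    "node1": ["big data big", "hadoop"],
    "node2": ["data lake"],
    "node3": ["hadoop data"],
}


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in one node's local lines."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(node_blocks):
    """Run the map step once per node, then reduce the combined output."""
    pairs = []
    for lines in node_blocks.values():
        pairs.extend(map_phase(lines))
    return reduce_phase(shuffle(pairs))


counts = word_count(NODE_BLOCKS)
```

<p>Here each call to <code>map_phase</code> plays the role of one TaskTracker working only on its own local data, and <code>reduce_phase</code> plays the role of the final combining step.</p>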
<p>HDFS is distributed in a similar fashion. A single NameNode tracks where data is housed in the cluster of servers, known as DataNodes. Data is stored in data blocks, usually 128MB in size, on the DataNodes. HDFS replicates those blocks and distributes the copies across multiple nodes in the cluster.</p>
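<p>To make the block arithmetic concrete, here is a rough Python sketch. It uses the 128MB block size mentioned above and assumes a replication factor of 3, which is not stated in this article. The round-robin placement and the DataNode names are purely illustrative assumptions; actual HDFS placement follows its own policy.</p>

```python
import math

BLOCK_SIZE_MB = 128  # block size cited in the article
REPLICATION = 3      # assumed replication factor, not taken from the article


def block_count(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks needed to hold a file; the last block may be partial."""
    return math.ceil(file_size_mb / block_size_mb)


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Toy round-robin placement: each block's copies land on distinct nodes."""
    copies = min(replication, len(datanodes))
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + offset) % len(datanodes)]
            for offset in range(copies)
        ]
    return placement


# Hypothetical four-node cluster holding a single 1,000MB file.
datanodes = ["dn1", "dn2", "dn3", "dn4"]
blocks = block_count(1000)
placement = place_replicas(blocks, datanodes)
raw_storage_mb = 1000 * REPLICATION
```

<p>Under these assumptions a 1,000MB file occupies eight blocks and about 3,000MB of raw disk across the cluster, and losing any single DataNode still leaves at least two copies of every block.</p>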
<p>This distribution style gives Hadoop another big advantage: Since data <em>and</em> processing live on the same servers in the cluster, every time you add a new machine to the cluster, your system gains the space of the hard drive and the power of the new processor.</p>
<h2>Kit Your Hadoop</h2>
<p>As mentioned earlier, users of Hadoop don't have to stick with just HDFS or MapReduce. For its <a title="http://aws.amazon.com/ec2/" href="http://aws.amazon.com/ec2/">Elastic Compute Cloud</a> solutions, Amazon Web Services has adapted its own S3 filesystem for Hadoop. <a href="http://www.datastax.com/dev/blog/brisk-1-0-beta-2-released" target="_blank">DataStax' Brisk</a> is a Hadoop distribution that replaces HDFS with Apache Cassandra's <a href="http://www.datastax.com/dev/blog/cassandra-file-system-design" target="_blank">CassandraFS</a>.</p>
<p><span style="line-height: 1.538em;" data-mce-mark="1">To get around MapReduce's first-in-first-out limitations,</span>&nbsp;the Cascading framework gives developers an easier tool with which to run jobs and more flexibility in scheduling them.</p>
<p>Hadoop is not always a complete, out-of-the-box solution for every Big Data task. MapReduce, as noted, is enough of a pressure point that many Hadoop users prefer to use the framework only for its capability to store lots of data fast and cheap.</p>
<p>But Hadoop is still the best, most widely used system for managing large amounts of data quickly when you don't have the time or the money to store it in a relational database. That's why Hadoop is likely to remain the elephant in the Big Data room for some time to come.</p>
<p><strong>(See also <a href="http://readwrite.com/search?keyword=hadoop" target="_blank">ReadWrite's Hadoop coverage</a>.)</strong></p>
                    ]]></description>
                <link>http://readwrite.com/2013/05/23/hadoop-what-it-is-and-how-it-works</link>
                <guid>http://readwrite.com/2013/05/23/hadoop-what-it-is-and-how-it-works</guid>
                <category>Hadoop</category>
                <pubDate>Thu, 23 May 2013 05:05:00 -0700</pubDate>
                <author>Brian Proffitt</author>
            </item>
                    <item>
                <title><![CDATA[Hadoop Adoption Accelerates, But Not For Data Analytics]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_136899149.jpg" />
                                        <p>The Hadoop market is on a tear, growing at a compound annual growth rate of roughly 60%, <a href="http://www.idc.com/getdoc.jsp?containerId=prUS23471212">according to IDC</a>. But why it's growing, or rather, how it's being used, might surprise you. Given all the media hype around Hadoop and its power to predict everything from the optimal number of raisins in your cereal (23) to the exact date of Armageddon (next Tuesday - call in sick), it's perhaps surprising to learn that comparatively few organizations use Hadoop for analytics. Today most enterprises use Hadoop for the pedestrian uses of storage and ETL (Extract, Transform, Load).</p>
<p>Eventually enterprises get to sexy analytics. But we're not there yet. Not by a long shot.</p>
<h3>'Poor Man's ETL', 'Unsupervised Digital Landfill', Or Both?</h3>
<p>While commonly billed as an analytics tool, Hadoop remains "a poor man's ETL" for the vast majority of enterprises. Yes, there are enterprises running interesting analytical workloads on Hadoop, but these are the exception, not the rule. Hence, while <a href="http://blog.cloudera.com/blog/2013/02/big-datas-new-use-cases-transformation-active-archive-and-exploration/">Cloudera cites</a> three common use cases for Hadoop (data transformation, archiving, and exploration), I'm hearing from analysts that 75% or more of the actual Hadoop adoption resides in those first two use cases.</p>
<p>Which is not to suggest such adoption is valueless. Quite the contrary.</p>
<h3>The Common Adoption Path For Hadoop</h3>
<p>As 451 Research analyst <a href="http://www.slideshare.net/Hadoop_Summit/what-is-the-point-of-hadoop">Matt Aslett highlighted at Hadoop Summit</a>, there is a natural progression from using Hadoop to store large quantities of data (i.e., Hadoop as an "<a href="http://readwrite.com/2013/02/11/big-data-and-the-landfills-of-our-digital-lives">unsupervised landfill</a>"), to processing and transforming that data and ultimately to analyzing that data. The fact that most enterprises have yet to get to analytics in any meaningful way is simply a description of where we are in the Hadoop market's evolution.</p>
<div style="text-align: center;"><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/17825514" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="427" height="356"> </iframe></div>
<div style="margin-bottom: 5px; text-align: center;"><strong> <a title="What is the Point of Hadoop" href="http://www.slideshare.net/Hadoop_Summit/what-is-the-point-of-hadoop" target="_blank">What is the Point of Hadoop</a> </strong> from <strong><a href="http://www.slideshare.net/Hadoop_Summit" target="_blank">Hadoop_Summit</a></strong></div>
<p>Indeed, Aslett notes that "attempting to fast forward to analytics, missing out on the processing/integration stage, creates silos and will result in disillusionment."&nbsp;</p>
<p>We're still early in Hadoop's technological and market evolution, in part due to the complexity of the technology, with <a href="http://www.cioinsight.com/it-news-trends/slideshows/hadoop-adoption-proves-slow-but-steady-05/">26% of even the most sophisticated Hadoop users</a> citing how long it takes to get into production as a gating factor to its widespread use. Gartner reveals even lower rates of adoption of Big Data projects, often involving Hadoop, at a mere 6%, as enterprises try to grapple with both appropriate use cases and understanding the relevant technology.</p>
<h3>Start With What You Know</h3>
<p>Small wonder, then, that enterprises are starting with known use cases like storage or ETL before proceeding to more ambitious analytics projects, as <a href="https://twitter.com/ckotsakis/status/332529969580351489">Christos Kotsakis suggests</a>. We're still getting comfortable with Hadoop. Applying an unfamiliar technology to a familiar problem makes a lot of sense.</p>
<p>Some day, we'll get to the point where mainstream adopters commonly use Hadoop for significant analytics. But we're not there. Not yet. Just give it time.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em>.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think</link>
                <guid>http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think</guid>
                <category>Hadoop</category>
                <pubDate>Fri, 10 May 2013 04:30:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[The Rising Costs Of Misunderstanding Big Data]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_104929805.jpg" />
                                        <p>The Big Data boom has largely been fueled by a simple calculation: Data + Technology = Actionable Insights, Magic Ponies, and Superpowers. The reality, of course, is far more pedestrian, because while Big Data technology has indeed increased our ability to store and process lots of disparate data in real time, the technology is only as useful as the people managing it. As Bill Wise, CEO of Mediaocean, <a href="http://allthingsd.com/20130423/big-datas-usability-problem/">highlights</a>, the costs of getting it wrong increase as our reliance on data grows.</p>
<p>To be clear, we've long been able to query so-called "Big Data." We've had expensive data warehousing and Business Intelligence tools for many years. The great innovation of tools like Hadoop is that they've made such capabilities available as free, open-source tools that run on commodity hardware, essentially paving the way for anyone and everyone to become a data scientist.</p>
<p>Therein lies the problem.</p>
<p>Taking as background an influential economics paper and the intelligence efforts around the Boston bombing suspects, which were tripped up by a few missing rows in Excel and a misspelling of Boston Marathon bombing suspect Tamerlan Tsarnaev's name, respectively, Wise points out that "data management tools (i.e., the FBI’s systems and Excel) were undone by fairly simple errors," with terrible results. In other words, as much as we may believe Big Data is as simple as "Input data into Hadoop, out come insights!", the reality depends heavily on the people querying that data.</p>
<p>And the bigger the data, the bigger the likelihood we'll read it wrong, as Wise posits:</p>
<blockquote>
<p>[M]ore human/data interaction means a lot more room for error (and inefficiency) around increasingly critical data sets - which... can have very serious results... If Big Data can’t fit hand-in-glove with usability and workflow, a lot of the promise of big data will be empty data crunching. That’s not just a problem for getting where we want to be in the evolution of computing. It’s a situation that can lead to bad data management - which translates into bad economics and, sometimes, far worse.</p>
</blockquote>
<p>This confirms renowned statistician <a href="http://readwrite.com/2013/03/29/nate-silver-gets-real-about-big-data">Nate Silver's arguments</a> that data doesn't speak for itself, but is instead corrupted by our biases. Worse, the bigger the data set, the more noise to sift through: "the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine - but a relatively constant amount of objective truth."</p>
<p>Often, misunderstanding our data simply means our businesses will run more inefficiently or, at least, no more efficiently than before. But if Wise is correct, getting our data wrong can have disastrous consequences.</p>
<p>Which&nbsp;means, as <a href="http://data-informed.com/the-mythical-data-scientist-shortage/">I've argued before</a>, that we really need to look inside our organizations for "data scientists," because context is critical to effectively querying our data, as well as knowing which data to collect in the first place. It also means, as <a href="http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html">Kate Crawford argues</a> in <em>Harvard Business Review,</em>&nbsp;"data scientists should take a page from social scientists, who have a long history of asking where the data they're working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation."&nbsp;</p>
<p><span style="line-height: 1.538em;">In other words, the more data has the potential to impact our organizations, the more humble and circumspect we should become in using it. The consequences of reading our data wrong scale with the volume and velocity of that data.</span></p>
<p><em><span style="line-height: 1.538em;">Image courtesy of</span><span style="line-height: 1.538em;"><a href="http://www.shutterstock.com"> Shutterstock</a>.</span></em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/04/29/the-rising-costs-of-misunderstanding-big-data</link>
                <guid>http://readwrite.com/2013/04/29/the-rising-costs-of-misunderstanding-big-data</guid>
                <category>Big data</category>
                <pubDate>Mon, 29 Apr 2013 04:00:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[Making Do With Google's Leftovers]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_82331755.jpg" />
                                        <p class="p1">It's perhaps one of the industry's great ironies that today's hottest enterprise technology is yesterday's leftovers at Google. Hadoop, an open-source implementation of Google's MapReduce technology, is all the rage in the enterprise as a primary tool for tackling Big Data, and probably will remain such for years to come.</p>
<p class="p1">But at Google, MapReduce may already be too slow and not nearly scalable enough.</p>
<p class="p1">This isn't news. Mike Miller, CEO of Cloudant, <a href="http://gigaom.com/2012/07/07/why-the-days-are-numbered-for-hadoop-as-we-know-it">made this point</a> in 2012, and Bill McColl, CEO of Cloudscale, <a href="http://www.nytimes.com/external/gigaom/2010/10/23/23gigaom-beyond-hadoop-next-generation-big-data-architectu-81730.html">made it</a> two years before that. As McColl argued in 2010, "the people who really do have cutting edge performance and scalability requirements today have already moved on from the Hadoop model."</p>
<p class="p1">Which is another way of saying Google lives in the future.</p>
<p class="p1">I've <a href="http://readwrite.com/2013/01/04/html5-not-linux-key-to-ubuntus-quixotic-mobile-war">told the story before</a> about a wealthy friend telling me his money lets him "see into the future a few years" by affording expensive things today that will be cheap for everyone in the future. In a similar fashion, Google, <a href="http://readwrite.com/2013/01/07/trickle-down-web-innovation-breathing-new-life-into-enterprise-it">not to mention other web giants like Facebook and Twitter</a>, is building things today, to solve problems of scale and data processing, that will likely be commonplace for mainstream enterprises tomorrow.&nbsp;</p>
<p class="p1">Today Google's data and scale problems are almost magical. Tomorrow they will likely be average.</p>
<p class="p1">Which may mean that peering into the future, whether you're an entrepreneur or a venture capitalist, may be as simple as watching Google. While <a href="https://developers.facebook.com/opensource/">Facebook releases</a> much of its code as open source, the place to gaze into Google's soul is its treasure trove of <a href="http://research.google.com/pubs/papers.html">published&nbsp;research</a>. There you'll find "Efficient spatial sampling of large geographical tables" and more information on "Spanner: Google's Globally-Distributed Database."</p>
<p class="p1">You will see, in other words, the future of enterprise computing, otherwise known as Google's leftovers.&nbsp;</p>
<p class="p1"><em>Image courtesy of <a href="http://www.shutterstock.com/gallery-987p1.html?cr=00&amp;pl=edit-00">AHMAD FAIZAL YAHYA</a> / <a href="http://www.shutterstock.com/?cr=00&amp;pl=edit-00">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/04/10/making-do-with-googles-leftovers</link>
                <guid>http://readwrite.com/2013/04/10/making-do-with-googles-leftovers</guid>
                <category>Google</category>
                <pubDate>Wed, 10 Apr 2013 07:07:44 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[Microsoft Azure: Open Source Is A First-Class Citizen]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/files/files/images/azure.jpg" />
                                        <p>Apple may be that kid who never learned to share, but Microsoft over the years hasn't been much better. While the company has long had a healthy partner ecosystem, if you really wanted tight integration with one of Microsoft's products, you pretty much had to work in Redmond. Microsoft Office worked seamlessly with Microsoft Windows worked seamlessly with Microsoft SQL Server worked seamlessly with Microsoft SharePoint worked seamlessly with... you get the picture.</p>
<p>Of late, however, Microsoft's underdog status in key markets has made it more amenable to a truly open partner ecosystem, perhaps best exemplified by its open arms to open source.</p>
<p>Nowhere is this more evident than with Windows Azure.</p>
<p>While the old world of Azure looked much like the image above, with Microsoft technology as far as the eye could see, the new Azure looks much different. For one thing, much of the best technology being served up on Azure wasn't written by Microsoft. Really! I'm not joking.</p>
<p>For example, in partnership with Hortonworks, Microsoft has released its <a href="http://blogs.msdn.com/b/windowsazure/archive/2013/03/18/announcing-the-public-preview-of-azure-hdinsight.aspx">first public preview of Windows Azure HDInsight Service</a>, Microsoft's cloud-based distribution of Hadoop, the popular open-source Big Data processing tool. Another example of <a href="http://en.wikipedia.org/wiki/Embrace,_extend_and_extinguish">Microsoft's classic embrace and extend strategy</a>? Nope. This time around, Microsoft promises that HDInsight will be "100% Apache Hadoop compatible now and in the future."</p>
<p>But Hadoop isn't the only open-source technology included by the Azure team.</p>
<h3>More than Just Hadoop</h3>
<p>In the olden days, Microsoft would have put all its engineering into supporting its own technologies on a first-class basis. Others might try to catch the Microsoft train, but they'd reverse engineer their way onto the back of the caboose, always just a slight API tweak away from incompatibility. Now it's Microsoft Azure that is adding <a href="http://www.windowsazure.com/en-us/develop/mobile/tutorials/get-started-android/">support for Android</a>, not to mention PhoneGap. All of which follows the Azure team's long-time support for <a href="http://drupal.org/project/azure">Drupal</a>, various <a href="http://msdn.microsoft.com/en-us/magazine/jj851073.aspx">open-source databases</a>, <a href="http://www.windowsazure.com/en-us/manage/linux/">Linux virtual machines</a>, and a range of other open-source software.</p>
<p>"<em>Of course</em> Microsoft supports open-source software on Azure because it's a platform," you argue, "and so Microsoft&nbsp;<em>must</em> support third-party technology as a platform provider."</p>
<p>But that "of course" was lost on Microsoft for years. Through a personal agreement between Bill Gates and Steve Jobs, Microsoft Office came to Mac OS X, but it still hasn't touched Linux. Same with SQL Server. You can get the popular database to run on Linux, but not as a first-class citizen. That's reserved for Windows.&nbsp;</p>
<h3>Windows Azure's Open Community</h3>
<p>Beyond directly supporting open-source software on Azure, <a href="http://readwrite.com/2013/03/01/microsoft-strikes-back-at-amazon-with-windows-azure-community-portal">Microsoft has also opened up its Windows Azure Community Portal</a> to make it easy for partners to add third-party services to Azure, both open and closed. This is a big deal for SMBs and departments within enterprises that have traditionally been Microsoft's mainstay, as BitNami founder and CTO Daniel Lopez told me:</p>
<blockquote>
<p>"For customers who are looking to the cloud to run department or workgroup level apps... and who are already customers of Microsoft, the transition to Azure may be simpler and more cost-effective than moving to Amazon.&nbsp;</p>
<p>"Microsoft has traditionally dominated the SMB market. As SMBs move to the cloud, SaaS cannot meet their customization needs. They need to run their own apps - they just don't want the hassle of running their own servers. Nobody has figured out the 'Application layer' in the cloud yet, but Microsoft is actually in a better starting position than its competitors (Amazon, Google) because it already has a huge installed base and an ecosystem of partners."&nbsp;</p>
</blockquote>
<p>Microsoft, in other words, finally groks "open." In part Microsoft shows this by embracing leading open-source technology like Hadoop or Android, but it's just as clear from its willingness to let partners embrace and extend Azure with other offerings. Yes, Microsoft has long done this with Windows, but it was never a level playing field for some kinds of technology, like open source.</p>
<p>Which is not to say Microsoft has won the public cloud. Today that <a href="http://readwrite.com/2013/03/19/amazon-king-of-cloud-computing-forever">distinction clearly goes to Amazon Web Services</a>. But while AWS is sexy with the Silicon Valley set, the horde of SMBs and enterprises that have traditionally gone with Microsoft will be looking closely at Azure. Microsoft remains the CIO's top vendor, <a href="http://rcpmag.com/articles/2013/02/15/microsoft-top-vendor-to-cios.aspx">according to a Piper Jaffray survey</a>. By embracing open source, it stands a chance of being the enterprise developer's top vendor, too.&nbsp;</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/20/microsoft-azure-open-source-is-a-first-class-citizen</link>
                <guid>http://readwrite.com/2013/03/20/microsoft-azure-open-source-is-a-first-class-citizen</guid>
                <category>Microsoft</category>
                <pubDate>Wed, 20 Mar 2013 06:45:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[One Hadoop Distribution To Rule Them All?]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_85761811.jpg" />
                                        <p>The Hadoop market is getting interesting. Last year it was a <a href="http://www.theregister.co.uk/2012/08/17/community_hadoop/">death match</a> between startups vying to own the heart of the project. Today it's a veritable smorgasbord of big-brand vendors getting involved to ensure they claim a big piece of the Big Data pie. Unlike American youth athletics, not everyone will get to take home a trophy.</p>
<p>Hadoop plays a key role in the burgeoning Big Data market, and will represent a $13 billion market by 2017, <a href="http://www.prweb.com/releases/big-data-analytics/hadoop-market/prweb10196532.htm">according to Markets and Markets</a>. (IDC pegs the market <a href="http://www.businesswire.com/news/home/20120507005611/en/IDC-Releases-Worldwide-Hadoop-MapReduce-Ecosystem-Software-Forecast">much, much lower</a>&nbsp;at&nbsp;$812.8 million in 2016, but its numbers don't seem credible to me as they don't even seem to include Cloudera's sales.) Given that Big Data is hot, and Hadoop's data processing engine sits at its core, there's going to be a lot of money changing hands for Hadoop-related products and services.</p>
<p>Not everyone is going to collect.</p>
<p>SiliconAngle's <a href="http://siliconangle.com/blog/2012/08/17/big-data-death-match-hadoop-hortonworks-cloudera/">John Furrier has challenged me on this</a>, arguing that Hadoop is "not a winner take all market." While I, too, can see multiple winners in Hadoop, just as there have been in Linux (e.g., Red Hat dominates license/services revenue, but IBM, HP, and others make arguably more with related hardware, complementary software products, and professional services), markets don't tend toward entropy. They trend toward consolidation.</p>
<p>Today, the <a href="http://www.datameer.com/blog/perspectives/hadoop-ecosystem-as-of-january-2013-now-an-app.html">Hadoop ecosystem</a> increasingly represents entropy:</p>
<ul>
<li><strong>Cloudera</strong>, <strong>Hortonworks</strong>, and <strong>MapR</strong> remain the early favorites, but with very different approaches. Hortonworks positions itself as the 100% open source player; Cloudera somewhat does the same, but adds in complementary, proprietary bits, mostly around managing Hadoop, to add value to Hadoop (and its top line revenue); and MapR provides a hybrid open source/proprietary Hadoop distribution that swaps out HDFS for its proprietary NFS storage layer.</li>
<li><strong>EMC Greenplum</strong> has been involved with Hadoop for several years, and is set to release a new distribution of Hadoop called Pivotal HD. <a href="http://readwrite.com/2013/03/12/proprietary-hadoop-is-a-losing-strategy">I've labeled Pivotal HD proprietary</a>, but EMC's Hadoop team has <a href="http://readwrite.com/2013/03/12/proprietary-hadoop-is-a-losing-strategy#comment-826955875">taken issue</a> with this characterization, arguing that PivotalHD is 100% open source, with complementary functionality (like HAWQ) available as add-ons. Point well taken, and I apologize for my misunderstanding. I was wrong, perhaps not surprisingly getting confused by&nbsp;<a href="http://www.greenplum.com/products/pivotal-hd">Pivotal HD's product page</a>, which&nbsp;says little about open source. But what seems clear is that customers won't be confused by EMC's value proposition: Hadoop with an advanced SQL query engine to make it easier and more powerful to use.</li>
<li><strong>Intel</strong> just got into the game with <a href="http://blogs.intel.com/technology/2013/02/big-data-buzz-intel-jumps-into-hadoop/">its own Hadoop distribution</a>. Basically, you can think of it as Hadoop on (Intel Xeon™ processor, Intel SSD, and Intel 10GbE networking hardware) steroids.</li>
<li>For those who don't want to run Hadoop within the datacenter, Amazon offers <a href="http://aws.amazon.com/elasticmapreduce/">Amazon Elastic MapReduce</a> (EMR). As of April 2012, EMR was powering over <a href="http://servicesangle.com/blog/2012/04/27/amazon-web-services-1-million-hadoop-clusters-and-counting/">1 million Hadoop clusters</a>. Presumably this number is much bigger today.</li>
<li>Many, <a href="http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support">many others</a> including IBM BigInsights, a range of startups, and more.</li>
</ul>
<p>Will all of these companies make serious bank on Hadoop? No. Will some of them? Sure.</p>
<p>Ultimately, the winners in Hadoop will be those that invest most heavily in its success, as they will be perceived as the companies best positioned to help would-be customers succeed with Hadoop's complexities. But how they invest is up for discussion. Code to Apache Hadoop? Value-adding extensions?</p>
<p>Success isn't about open source purity, as <a href="http://blogs.gartner.com/merv-adrian/2013/03/09/open-source-purity-hadoop-and-market-realities/">Gartner's Merv Adrian posits</a>: it's about making customers successful. As we saw with Linux, where Red Hat is both the top contributor to the Linux kernel and the company that harvests the most revenue from distributing Linux, contributing code is a great way to signal to the market that you're a leader and capable of getting code fixes to support customers. Code matters.</p>
<p>But code contributions are not the only way to demonstrate leadership and attract customers. Ultimately, companies that make it easier to get value from Hadoop will win big. There may be more than one such company. Indeed, there almost certainly will be.&nbsp;</p>
<p>But there won't be 20 of them. Or even 10. Enterprise IT is simply not going to be able to manage a polyglot Hadoop distribution ecosystem. That's not the way markets work. No one wants to be a <a href="http://searchengineland.com/figz/wp-content/seloads/2012/12/The-Long-Tail-The-Pile-of-Bodies.jpg">"long tail" vendor</a>, and customers don't want to buy from them, either, as Hugh MacLeod humorously points out on Gaping Void:</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/TheShortTail112%20copy.jpg" style="" />
				<span class="embedded-Media-image-caption">Source: GapingVoidArt. Used with permission.</span>
		</span>
</p>
<p>The Hadoop market over the next year is going to be hugely interesting. And bloody.</p>
<p><em>Image courtesy of&nbsp;<a style="line-height: 1.538em;" href="http://www.shutterstock.com/gallery-755863p1.html?cr=00&amp;pl=edit-00">Ehab Othman</a> / <a style="line-height: 1.538em;" href="http://www.shutterstock.com/?cr=00&amp;pl=edit-00">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/15/one-hadoop-to-rule-them-all</link>
                <guid>http://readwrite.com/2013/03/15/one-hadoop-to-rule-them-all</guid>
                <category>Hadoop</category>
                <pubDate>Fri, 15 Mar 2013 03:09:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[Proprietary Hadoop Is A Losing Strategy]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/Hadoop%20elephant.jpg" />
                                        <p>Hadoop, nearly synonymous with Big Data, has many failings. But open source is not one of them. In fact, Hadoop's open-source license remains one of its biggest draws, giving enterprises plenty of reasons to persevere in using it despite its shortcomings. It's therefore hard to see how EMC's new <a href="http://www.emc.com/about/news/press/2013/20130225-04.htm">Pivotal HD</a>, essentially a proprietary distribution of Hadoop, can hope to succeed.</p>
<p>Not that everyone agrees with this statement.</p>
<p>Dan Woods,&nbsp;CTO and editor of CITO Research and a contributor to&nbsp;<em>Forbes</em>, argues that embedding Hadoop into EMC Greenplum's massively parallel processing (MPP) database (HAWQ) offers CIOs and CTOs the simplicity they need to be successful with Hadoop. He has a point: Hadoop&nbsp;<em>is</em> complex and somewhat hard to use, which is why Cloudera CEO Mike Olson has <a href="http://www.theregister.co.uk/2012/06/14/hadoop_still_too_complex_for_enterprise_customers/">argued</a> that most of the world will experience the power of Hadoop through applications, nearly all of which will be proprietary, I might add.</p>
<p>But Olson's argument differs from Woods' argument in at least one major way: Pivotal HD is enterprise infrastructure, not an application, and enterprise infrastructure is increasingly open source.</p>
<p>There are plenty of reasons for this, but RethinkDB's <a href="http://nosql.mypopescu.com/post/42017797886/how-to-plan-for-big-data-waterfall-vs-agile">Alex Popescu nails</a> one critical factor:</p>
<blockquote>
<p>Hadoop is so successful despite its complexity [because i]t allows experimenting and trying out new ideas, while continuing to accumulate and storing your data. It removes the pressure from the developers. That’s agility. It’s highly appreciated.</p>
</blockquote>
<p>In other words, a big reason for Hadoop's success is its open-source license, which permits a hefty amount of experimentation without having to get an enterprise license from EMC, Oracle, or any of the other incumbent infrastructure vendors. &nbsp;</p>
<p>EMC's Scott Yara tries to deflect criticism of its proprietary foray into Hadoop by declaring "We're all in on Hadoop, period," but as 451 Research analyst <a href="http://blogs.the451group.com/information_management/2013/03/11/all-in-on-hadoop/">Matt Aslett counters</a>, "I have no doubt that EMC Greenplum is 'all in' on Pivotal HD, but that’s not the same thing at all."</p>
<p>Take that freedom to experiment away by building a <a href="http://www.cio.com/article/729451/EMC_Greenplum_Tackles_Big_Data_With_Hadoop_Distribution">proprietary Hadoop distribution</a>, and EMC has basically erased the very thing that made Hadoop workloads proliferate in the first place. EMC also cuts itself out of the standard adoption cycle for Hadoop, as Redmonk analyst <a href="http://redmonk.com/sogrady/2013/03/06/pivotal-hd/#ixzz2NFKCsdmZ">Stephen O'Grady suggests</a>: "Certainly there will be customers whose needs will dictate the adoption of a unique solution like Pivotal HD, but how many will that be relative to the segment whose adoption cycle begins with the download of one of the free Hadoop distributions?"</p>
<p>Today, Hadoop is one of the industry's hottest job trends. Even in absolute job numbers, it's about to pass EMC-related job posts:</p>
<div style="width: 540px;"><a title="Hadoop,emc Job Trends" href="http://www.indeed.com/jobtrends?q=Hadoop%2Cemc"> <img src="http://www.indeed.com/trendgraph/jobgraph.png?q=Hadoop%2Cemc" alt="Hadoop,emc Job Trends graph" width="540" height="300" border="0" /> </a>
<table style="font-size: 80%;" width="100%" border="0" cellspacing="0" cellpadding="6">
<tbody>
<tr>
<td><a href="http://www.indeed.com/jobtrends?q=Hadoop%2Cemc">Hadoop,emc Job Trends</a></td>
<td align="right"><a href="http://www.indeed.com/jobs?q=Hadoop">Hadoop jobs</a> - <a href="http://www.indeed.com/jobs?q=EMC">EMC jobs</a></td>
</tr>
</tbody>
</table>
</div>
<p>Enterprises aren't hiring for EMC's brand of Hadoop. They're hiring for the open source Hadoop. This matters.</p>
<p>Perhaps EMC feels that Hadoop's brand is big enough now that enterprises essentially understand it and are ready to move on from experimentation to full-scale adoption. In this EMC is likely to be disappointed. According to recent <a href="http://strataconf.com/strata2013/public/schedule/detail/27767">IBM survey data</a>, only 6% of enterprises have two or more Big Data projects underway (likely, though not explicitly, involving Hadoop in some way), and a mere 22% are running pilots to test the efficacy of their Big Data strategies. Everyone else is in full-on planning mode.</p>
<p>By creating a proprietary Hadoop distribution, EMC just dramatically limited its access to the 94% that are still in Big Data education and trial mode. Yes, it has a gargantuan sales force. No, they're simply not going to be able to reach would-be customers as efficiently as an open-source distribution model does.</p>
<p>But maybe EMC hasn't gone proprietary to more effectively monetize Hadoop interest, and instead sincerely believes, like Woods ("<a href="http://www.forbes.com/sites/danwoods/2013/02/27/why-sql-matters-the-limits-of-open-source-and-other-lessons-of-emc-greenplums-pivotal-hd/">open source development has its limits</a>"), that complex infrastructure problems are a poor match for open source. History has not been kind to such thinking, as Aslett sarcastically implies:</p>
<blockquote class="twitter-tweet">
<p>"Enterprise products" always prevail over open source. <a title="http://onforb.es/VgQ0Cq" href="http://t.co/zvCLZAMbtR">onforb.es/VgQ0Cq</a> That's why Linux has been such an abject failure versus Unix.</p>
— Matt Aslett (@maslett) <a href="https://twitter.com/maslett/status/307422911340355584">March 1, 2013</a></blockquote>
<p>EMC has seemingly bottomless resources to throw at Hadoop, and every incentive to do so. It's a smart, highly successful company and no doubt will prove successful with Pivotal HD. However, I can't see it ever dominating an open-source infrastructure market with a proprietary distribution. <a href="http://readwrite.com/2012/12/31/tech-jobs-in-2013-open-source-open-data">Open source is the foundation for today's most interesting markets</a>, from Big Data to mobile to cloud computing. It's unlikely that EMC will somehow stem this tide with a proprietary product, no matter its short-term performance or functionality advantages.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/12/proprietary-hadoop-is-a-losing-strategy</link>
                <guid>http://readwrite.com/2013/03/12/proprietary-hadoop-is-a-losing-strategy</guid>
                <category>Big data</category>
                <pubDate>Tue, 12 Mar 2013 09:30:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[Microsoft Completes Journey To Big Data Through Hadoop]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_bigdata.jpg" />
                                        <p>There's no beating around this bush. Today Hortonworks announced a new beta version of its Hadoop Data Platform that will <a href="http://hortonworks.com/about-us/news/hortonworks-brings-apache-hadoop-to-windows/" target="_blank">run on Microsoft Windows Server</a>, a move that shows Microsoft's own Big Data efforts will forever be connected to open source innovation.&nbsp;This is a highly significant – even expected – move in the big data sector, but also a very strange one.</p>
<p><a href="http://en.wikipedia.org/wiki/Apache_Hadoop" target="_blank">Hadoop</a>, of course, is an open-source software architecture that supports distributed computation jobs on huge data sets&nbsp;– in other words, classic Big Data work.&nbsp;Hortonworks, meanwhile, is one of the bigger Hadoop vendors in the market, even if that's more in terms of innovation than sales, where it trails Cloudera. Hortonworks founder and architect Arun Murthy is one of the original Hadoop coders who came out of Yahoo back in the day, and he also serves as the VP of the open source Apache Hadoop project at the Apache Software Foundation.</p>
<p>Which all means that any major platform move like this is sure to impact the rest of Hadoop development and, by extension, the rapidly growing Hadoop ecosystem that's driving much of the big data sector.</p>
<h2>Why Windows?</h2>
<p>Until today's announcement, Hadoop of any flavor typically ran on a Linux-based machine (physical or virtual). This made a lot of sense, since one of the big advantages of Hadoop is the capability to expand its data warehousing over any number of clustered computers. When those clustered machines are running Linux, it's all but frictionless to add more, both in terms of licensing cost (which is free) and configuration (which is easy).</p>
<p>But when the underlying operating system is Windows Server, licensing&nbsp;– i.e., explicitly not free&nbsp;–&nbsp;would seem likely to create a lot more friction when someone tries to build a Hadoop cluster. Wouldn't using Windows Server as the OS for a Hadoop system be too expensive?</p>
<p>David McJannet, VP of marketing at Hortonworks, doesn't seem to think so. McJannet's concern was that too many Windows-based shops out there were shying away from Hadoop because they didn't want to deal with adding Linux clusters and the related hassle of managing them. So assuaging those concerns was one big reason Microsoft has been working with&nbsp;Hortonworks over the past 18 months.</p>
<p>The sheer number of Windows installations was also a major issue. McJannet said that a "majority of servers" were running Windows in the enterprise now. In its press release, Hortonworks cited IDC data thusly: "According to IDC, Windows Server owned 73 percent of the market in 2012 (IDC, <a style="line-height: 1.538em;" title="http://www.idc.com/getdoc.jsp?containerId=234339#.UStraKX7gqZ" href="http://www.idc.com/getdoc.jsp?containerId=234339#.UStraKX7gqZ">Worldwide and Regional Server 2012–2016 Forecast</a>, Doc # 234339, May 2012)."</p>
<p>It is not clear just what server class this 73 percent represents, since the report itself costs $4,500, and is thus a little hard to access. File servers? Application servers? It's sure not web servers, where <a title="http://news.netcraft.com/archives/2013/02/01/february-2013-web-server-survey.html" href="http://news.netcraft.com/archives/2013/02/01/february-2013-web-server-survey.html">according to Web analytics from Netcraft</a>, Microsoft currently has 16.93% of the marketshare, dwarfed by Apache's 55.26% marketshare.</p>
<p>McJannet also said Hadoop on Windows would make data exploration easier. Using SQL-based queries that can now directly integrate with the Hadoop Distributed File System (HDFS), products like SQL Server and Excel can tap straight into Hadoop-stored data, enabling end-users to more easily navigate vast stores of data in Hadoop clusters.</p>
<h2>Embracing Open Source</h2>
<p>This is not Hortonworks' first foray into Windows land. Late last year, it released the Windows Azure HDInsight product&nbsp;–&nbsp;essentially Hadoop for the Azure cloud platform.</p>
<p>As odd as it may seem to see Hadoop on Windows Server, the move makes a lot of sense from Microsoft's side. The company has needed a Big Data entry ever since it decided to drop its own Dryad data warehousing framework back in 2011. Some observers have expected this day ever since a year ago, when <a title="http://www.itworld.com/big-datahadoop/261056/microsoft-destined-follow-big-data" href="http://www.itworld.com/big-datahadoop/261056/microsoft-destined-follow-big-data">Microsoft announced it would build in tools within SQL Server to connect to Hadoop</a>.</p>
<p>McJannet emphasized that to date, Microsoft was playing well with others within the open source development model that Hadoop uses, so much of its innovation will cycle back to the rest of the Hadoop community.</p>
<p>If so, you can expect to see more Hadoop vendors announce their own connections to Windows in the near future.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a><br /></em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/25/microsoft-completes-its-journey-to-hadoop</link>
                <guid>http://readwrite.com/2013/02/25/microsoft-completes-its-journey-to-hadoop</guid>
                <category>Big data</category>
                <pubDate>Mon, 25 Feb 2013 06:29:00 -0800</pubDate>
                <author>Brian Proffitt</author>
            </item>
                    <item>
                <title><![CDATA[Red Hat's Big Data Push: All Hat, No Cattle]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_93726238.jpg" />
                                        <p>Big Data is, well, big these days. (Maybe <a href="http://readwrite.com/2013/01/24/big-data-overhyped-and-overpaid" target="_blank">too big for its own good</a>.) So it should surprise exactly no one that&nbsp;Red Hat, itself a big open-source enterprise software and services company, wants to wear the Big Data hat, too. Which is exactly <a href="http://www.redhat.com/about/news/press-archive/2013/2/red-hat-unveils-big-data-and-open-hybrid-cloud-direction" target="_blank">what it announced</a> in an hour-long Webcast on Wednesday.</p>
<p>So, terrific. Red Hat is now a Big Data company&nbsp;–&nbsp;at least, that is, if you believe Red Hat. What does that mean, exactly? Surprisingly little.</p>
<p>Let's put it this way: Red Hat's big news was that it is open-sourcing its storage plug-in for <a href="http://en.wikipedia.org/wiki/Apache_Hadoop" target="_blank">Hadoop</a>, a popular open-source software framework that supports both distributed data and distributed applications. (Simpler explanation: Hadoop makes it possible to run data-intensive applications physically near the data itself, which greatly speeds things up because it's no longer necessary to shuttle great piles of data across a network.)</p>
<p>As a step toward transforming Red Hat's own storage file system into a "fully-supported, Hadoop-compatible file system for big data environments," as the company puts it, this is doubtless noteworthy. As something concrete on which enterprise customers can, uh, hang their hat, it leaves a lot to be filled in later.</p>
<p>That wouldn't be a problem if Red Hat had provided other noteworthy details, such as a road map for its development of what it calls an "open hybrid cloud." It's a nifty enough idea, essentially amounting to&nbsp;a data environment that would help businesses move their applications from in-house servers to those offered by cloud providers such as <a href="http://aws.amazon.com/" target="_blank">Amazon's Web Services unit</a>&nbsp;without having to rewrite them.</p>
<p>But while Ranga Rangachari, Red Hat's vice president of storage, talked up the open hybrid cloud and the "robust network" of partners Red Hat plans to work with to make it a reality, he had nothing to say about time frames or even the identities of its partners. "Just stay tuned as we come up with more definite dates and times and start&nbsp;– we'll absolutely make those partners available," Rangachari pleaded at the end of the event.</p>
<p>Maybe Red Hat is saving the details for the next roundup.</p>
<p><strong>(See also <a href="http://readwrite.com/2013/02/11/big-data-redhats-jim-whitehurst-looks-20-years-into-the-future" target="_blank">Big Data: Red Hat's Jim Whitehurst Looks 20 Years Into The Future</a>)</strong></p>
<p><em>Image courtesy of&nbsp;<a href="http://www.shutterstock.com" target="_blank">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/21/red-hats-big-data-push-is-all-hat-no-cattle</link>
                <guid>http://readwrite.com/2013/02/21/red-hats-big-data-push-is-all-hat-no-cattle</guid>
                <category>Big data</category>
                <pubDate>Thu, 21 Feb 2013 05:31:00 -0800</pubDate>
                <author>David Hamilton</author>
            </item>
                    <item>
                <title><![CDATA[Big Data And The Landfills Of The Digital Enterprise]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_90381688.jpg" />
                                        <p>You do realize that you work for the only company on the planet that isn't leveraging Big Data? That everyone else is gaining competitive advantage by aggregating cash register receipts, the weather in Miami, your sister's Facebook posts and the average shelf-life of Lindt chocolate? That you, in fact, are the world's biggest Big Data failure?</p>
<p>While not generally expressed in this way, in my conversations with IT and line-of-business executives, such sentiment comes out in the subtext of what they say. The media (<a href="http://readwrite.com/2013/01/24/big-data-overhyped-and-overpaid">over</a>)hypes Big Data and IT executives then come to think they must be doing something wrong, as Gartner analyst <a href="http://blogs.gartner.com/svetlana-sicular/big-data-is-falling-into-the-trough-of-disillusionment/">Svetlana Sicular has found</a>&nbsp;in her conversations with clients. As the current thinking goes, if enterprises don't have warehouses overflowing with data, with data scientists madly crunching the data and coming up with "actionable insights," they're doing it wrong.</p>
<h2>One Big Landfill</h2>
<p>As just one example, I recently heard one executive at a Fortune 100 company say, "Hadoop is our unsupervised landfill."&nbsp;Spoken like a man who knows his data is important, but isn't quite sure why or how. So his company just stores everything in the hopes that all that data will one day make sense.</p>
<p>This is a reasonable response, given the pressures, but it's actually okay to not have The Big Data Answer. Odds are, your enterprise needs to figure things out over time, even without the mythical (and expensive) data scientists we keep reading about. <a href="http://blogs.gartner.com/svetlana-sicular/data-scientist-mystified/">Sicular argues</a> that "Organizations already have people who know their own data better than mystical data scientists." Give these in-house experts the <a href="http://media.dice.com/report/2013-2012-dice-salary-survey/">top tools for Big Data</a>, described in a recent Dice.com job trends report, <a href="http://readwrite.com/2012/12/31/tech-jobs-in-2013-open-source-open-data">all of which happen to be open source</a>, and let them iterate toward understanding the data.&nbsp;</p>
<p>Indeed, open source is the key here, not how big your data is. &nbsp;</p>
<h2>Exploration, Not Exploitation</h2>
<p>Alex Popescu nails it when <a href="http://nosql.mypopescu.com/post/42017797886/how-to-plan-for-big-data-waterfall-vs-agile">he posits</a>, "Hadoop is so successful despite its complexity [because i]t allows experimenting and trying out new ideas, while continuing to accumulate and storing your data." Unlike with proprietary technology, in open-source Big Data technology you don't have to sign any contracts, fork over any money, or do any of the things typically expected with enterprise software vendors. You just download and explore.</p>
<p>This fact was underlined for me at a Big Data panel in Chicago this week, which featured Dr. Philip Shelley, CTO at Sears Holdings. Sears is arguably one of the industry's top pioneers when it comes to Big Data, and Shelley insisted that open-source tools like Hadoop were critical to the company iterating its way to Big Data success. Things have gone so well that he has decommissioned millions of dollars in IBM Netezza and other proprietary technology to focus on Hadoop as its data hub. As he said, "We no longer have to budget for capital expenditures" for Big Data initiatives.</p>
<p>That's impressive.</p>
<p>Yes, <a href="http://readwrite.com/2011/07/26/big-data-by-sector-infographic">data volume is growing</a>. But that's cause for exploration and iteration, not frustration and despair, following Sears' example. You're not alone if you don't yet know what to do with all your data, or if you're wondering if you have enough to bother. As <a href="http://readwrite.com/2013/01/08/big-data-is-for-big-companies-and-other-bs">Brian Proffitt has pointed out</a>, small companies with less than gargantuan data troves can also benefit from Big Data technologies, because "big" isn't only about size. It's also about variety and velocity of data, among other things.</p>
<p>Or, as <a href="http://www.forbes.com/sites/edddumbill/2012/12/31/big-data-big-hype-big-deal/">Edd Dumbill ably notes</a>, "'Big data' really means 'smart use of data'."</p>
<p>That "smart use" will almost always involve open source, as explained above. But it should also involve the understanding that you're not in a race to amass data and to recruit data scientists to decipher it. Big Data is an iterative process of using (mostly) open-source technologies to store and analyze data in different ways, learning from peers and from your own experience. It needn't be a landfill of buzzwords.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em>.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/11/big-data-and-the-landfills-of-our-digital-lives</link>
                <guid>http://readwrite.com/2013/02/11/big-data-and-the-landfills-of-our-digital-lives</guid>
                <category>Big data</category>
                <pubDate>Mon, 11 Feb 2013 07:27:50 -0800</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[Big Data: Overhyped And Overpaid?]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/BigDataArticle.jpg" />
                                        <p>Gartner research director Svetlana Sicular thinks Big Data is about to plummet off the "peak of inflated expectations" into the "trough of disillusionment." Perhaps. But other data from Twitter and job trends suggest a much more complicated picture.</p>
<p>Sicular reaches <a href="http://blogs.gartner.com/svetlana-sicular/big-data-is-falling-into-the-trough-of-disillusionment/">her conclusions</a> about Big Data based on a series of conversations with IT professionals over the past few weeks, in addition to a roundtable with Hadoop vendors Cloudera, Hortonworks, and MapR. In discussing Hadoop, the vendors suggest that "MapReduce has always been Hadoop’s bottleneck or that Hadoop is 'primitive and old-fashioned,'" apparently disillusioned with the state of Big Data's poster child/elephant.&nbsp;</p>
<p>This could be chalked up to the Hadoop vendors simply acknowledging that despite being an excellent technology, Hadoop still has a ways to go. But Sicular's conversations with enterprise business analysts are more damaging:</p>
<blockquote>My most advanced... Hadoop clients are also getting disillusioned. They do not realize that they are ahead of others and think that someone else is successful while they are struggling. These organizations have fascinating ideas, but they are disappointed with a difficulty of figuring out reliable solutions... Formulating a right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field).</blockquote>
<p>And yet, these same companies don't seem to be giving up on Big Data.&nbsp;</p>
<p>For example, DataSift plowed through&nbsp;2.2 million Twitter mentions by more than 981,000 authors, as Ovum analyst <a href="http://ovum.com/2013/01/21/big-data-whats-hot-whats-not-according-to-the-twitter-stream/">Tony Baer reports</a>, finding that positive mentions of Big Data vendors outnumber negative mentions by 3-to-1. And Baer acknowledges that while "Twitter streams are not a scientific focus group for detecting brand awareness, they provide a valuable window on market thinking." Indeed, given the levels of Big Data hype, it's surprising that the overall mood about Big Data remains overwhelmingly positive.</p>
<p>So much so, in fact, that enterprises are paying a premium to hire job candidates with Big Data-relevant technology skills, as <a href="http://media.dice.com/report/2013-2012-dice-salary-survey/">Dice.com's 2012-2013 annual salary survey</a> reveals. Job candidates with Big Data technology expertise command an average salary of $100,000, while other hot technologies like cloud/virtualization ($90,000) and mobile ($80,000) yield lower salaries. As Alice Hill, managing director of Dice.com, asserts, "We’ve heard [Big Data] is a fad, heard it’s hyped and heard it’s fleeting, yet it’s clear that data professionals are in demand and well paid."</p>
<p>While Gartner clearly has a valid point that Big Data's outsized expectations are sure to crash into reality at some point, it's also clear from jobs data, in particular, that enterprises see enough value from their data that they're willing to pay up for expertise that can analyze it. Will they be disappointed? Possibly. But the jobs data indicates we have yet to plummet into Gartner's "trough of disillusionment."</p>
                    ]]></description>
                <link>http://readwrite.com/2013/01/24/big-data-overhyped-and-overpaid</link>
                <guid>http://readwrite.com/2013/01/24/big-data-overhyped-and-overpaid</guid>
                <category>Big data</category>
                <pubDate>Thu, 24 Jan 2013 05:30:00 -0800</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[Trickle-Down Web Innovation Breathes New Life Into Enterprise IT]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/files/files/2012-01-04-studioslackofinnovation-thumb.jpg" />
                                        <p>IBM spends <a href="http://ycharts.com/companies/IBM/r_and_d_expense">over $1.5B every quarter</a> in research and development (R&amp;D) expenses. SAP? Closer to <a href="http://ycharts.com/companies/SAP/r_and_d_expense">$700M</a>. Oracle? Assuming we don't count the tens of billions it spends buying other companies, its <a href="http://ycharts.com/companies/ORCL/r_and_d_expense">actual quarterly R&amp;D budget</a> comes in just over $1B. HP? Around <a href="http://ycharts.com/companies/HPQ/r_and_d_expense">$900M</a>. Microsoft, which has printed billions of dollars in profit each quarter for eons, spends more than them all, <a href="http://news.idg.no/cw/art.cfm?id=D9F9E9ED-B31B-91D3-30EB90CEA1D64447">topping $10B each year</a>.</p>
<p>And yet not one of these companies is responsible for the biggest advances in enterprise technology in the past decade. &nbsp;Cloud computing, Big Data, mobile... they're all being invented elsewhere, not by the enterprise behemoths.</p>
<p>Maybe they're doing it wrong?</p>
<h2>Lots Of R, Little D</h2>
<p>Take the cloud, for example. Microsoft claims to invest <a href="http://news.idg.no/cw/art.cfm?id=D9F9E9ED-B31B-91D3-30EB90CEA1D64447">90% of its R&amp;D budget on cloud computing</a>, but it is Amazon, Microsoft's penny-pinching, book-retailing neighbor, that sets the terms for innovation in cloud computing. Amazon launched EC2 back in 2006, when it was <a href="http://ycharts.com/companies/AMZN/r_and_d_expense">spending a measly $132M or so</a> each quarter on R&amp;D.&nbsp;</p>
<p>Even if we dismiss Amazon, where else are we seeing other cool advances in cloud computing? Netflix, for one, which just <a href="http://techblog.netflix.com/2013/01/janitor-monkey-keeping-cloud-tidy-and.html">released Janitor Monkey</a> to help Amazon Web Services (AWS) users dispose of their unused AWS resources, and the video company previously <a href="http://techblog.netflix.com/2012/07/chaos-monkey-released-into-wild.html">released Chaos Monkey</a>, which helps enterprises plan for and architect around cloud failure.</p>
<p>Notice the word I used? "Released." It means these tools were open sourced, not put out for sale. That's how innovation seems to happen in the 21st Century.</p>
<p>In large part innovation comes with an open-source license because it's a by-product of businesses that heavily rely on technology, but don't actually sell technology. &nbsp;It's 'trickle-down innovation' from the web business community.</p>
<p>VMware's <a href="http://bradhedlund.com/2011/08/19/distributed-systems-trickle-down-into-enterprise-it/">Brad Hedlund spotlighted this trend</a> back in 2011, when the enterprise awoke to discover it had problems that the web giants had already solved:</p>
<blockquote>As properties such as Yahoo!, Google, Facebook, Amazon became great successes, their architects and software engineers realized that they had moved mountains...The tremendous problems of efficiently running large scale applications on low cost infrastructure had been solved... At the very same time, enterprise IT begins to encounter some of the very same problems solved by the large web provider, such as scalable data warehousing and analytics (so called “Big Data”). Additionally, the software driven distributed systems that solve problems of infrastructure efficiency and management at very large scale could also be applied to infrastructure at a smaller enterprise IT scale (why not?). And finally, the cost savings of an application infrastructure designed to operate on low cost commodity hardware can be realized at any scale, large web or enterprise IT.</blockquote>
<h2>Filling The R&amp;D Gap</h2>
<p>Companies like Cloudera, DataStax and others stepped into this gap, taking the open source (or, in the case of some of Google's research, open knowledge) projects from the web and applying them to the enterprise in the form of Hadoop, Storm, NoSQL databases, etc. &nbsp;All of it developed at a comparative pittance to enterprise incumbents' R&amp;D budgets. &nbsp;All of it available free of charge on commodity hardware.&nbsp;</p>
<p>As an industry, we're richer for such open source innovation. &nbsp;Ironically, so are the enterprise IT vendors, who are investing tens to hundreds of millions of dollars in Hadoop and other open source data and cloud innovations, even as they continue to sink tens of billions of dollars into their homegrown R&amp;D. Maybe it's time for them to reevaluate how they do R&amp;D. &nbsp;Maybe, <a href="https://github.com/facebook">like Facebook</a>&nbsp;or <a href="http://twitter.github.com/">Twitter</a>, they should release their R&amp;D on GitHub as open-source code. &nbsp;At the least they could, <a href="http://research.google.com/pubs/papers.html">like Google</a>, centralize their research on the web, making it easily available to all.</p>
<p>Maybe, just maybe, they'd realize that it doesn't actually matter how much money a company spends on R&amp;D. &nbsp;What matters is whether it can execute and turn ideas into winning products, as <a href="http://news.yahoo.com/microsoft-huge-r-d-budget-useless-best-ideas-194130737.html">Brad Reed argues</a>, and whether it can help foster community around promising open source efforts. &nbsp;</p>
<p>This is what new enterprise IT - Facebook, Twitter, Google and Yahoo - demonstrates. &nbsp;It remains to be seen if IBM, Microsoft and other traditional IT vendors are paying attention. &nbsp;Unless they do, they'll lose relevance as a new breed of innovative startups emerges to claim the strategic largesse of CIOs' budgets.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/01/07/trickle-down-web-innovation-breathing-new-life-into-enterprise-it</link>
                <guid>http://readwrite.com/2013/01/07/trickle-down-web-innovation-breathing-new-life-into-enterprise-it</guid>
                <category>enterprise IT</category>
                <pubDate>Mon, 07 Jan 2013 08:00:00 -0800</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[6 Strategies For Cracking The Enterprise Tech Market In 2013]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_122868205_wall.jpg" />
                                        <p>With all the recent teeth gnashing about startup investment shifting from consumer to enterprise technology, it's worth noting that successfully cracking the enterprise market is no easy task:</p>
<ul>
<li>70% of the U.S. economy hinges on consumer spending. Even with the pending fiscal cliff, it's kind of hard to ignore the numbers.&nbsp;</li>
<li>Enterprise technology is not a <a href="http://www.golf.com/instruction/short-game" target="_blank">short game</a>.</li>
</ul>
<p>Unlike most consumer technologies, enterprise infrastructure and applications run on a much longer upgrade cycle: 5-7 years. While you might ditch your smartphone every year or two for a newer model, few companies are willing to swap out their CRM systems, storage or security technologies that quickly.</p>
<p><a href="http://pdf.aminer.org/000/326/425/information_technology_innovation_and_competition_in_the_presence_of_switching.pdf" target="_blank">Switching behavior</a> is both the most complicated and important subject in the enterprise technology market. Even if enterprise customers have good reasons to be unhappy with their technology vendors (e.g., lack of innovation, price gouging, poor support), their business <em>runs </em>on that technology. This makes them highly incentivized to see existing vendors address any issues and continue the relationship. As we all know, moving's a bitch.</p>
<p>Of course, enterprise tech is a rich, rewarding game, so it's worth exploring the strategies startups can use to overcome the barriers to switching in the enterprise market:</p>
<p><strong>1. Transformational Technologies.</strong> The ultimate startup is the one that changes the game on an incumbent in such a way that the latter can neither block nor retaliate. Classic examples include <a href="http://readwrite.com/search?keyword=virtualization" target="_blank">Virtualization</a> and <a href="http://readwrite.com/tag/saas" target="_blank">Software-as-a-Service</a> (SaaS). Because virtualization decouples compute functions from hardware (while running on top of the hardware), it is non-invasive, which makes it the ultimate disruptor. SaaS eliminates the stickiness of packaged software - and the lucrative support contracts that go along with it. Interestingly, while there tend to be many attackers in Virtualization and SaaS, only a few players tend to win big. Very big: witness VMware and Salesforce.</p>
<p><strong>2. Changing Product Cycles.</strong> Catching technology giants in product transition cycles is one of the most effective ways to insert new technologies. However, this usually requires an outside force to speed insertion. Earlier in my career, <a href="http://en.wikipedia.org/wiki/Centrino" target="_blank">Intel Centrino</a> drove the need for enterprise Wi-Fi and forced an architectural change. In 2013 you can see many great examples of this idea, including <a href="http://www.paloaltonetworks.com/" target="_blank">Palo Alto Networks</a>, <a href="http://www.splunk.com/" target="_blank">Splunk</a>, <a href="http://www.servicenow.com/" target="_blank">ServiceNow</a> and <a href="http://www.workday.com/" target="_blank">Workday</a>. These transition cycles don't last forever, though. Over time the incumbents typically build or buy their way into the new product segment and the situation stabilizes until a new cycle begins.</p>
<p><strong>3. Trojan Horses.</strong> Sometimes a new enterprise IT category emerges in an indirect way. Cloud infrastructure eliminates the need to buy IT hardware and software; the rental model emerged as a form of shadow IT for specific projects that could not wait for corporate IT to respond. It also became the preferred approach for brand new businesses (Netflix streaming). <a href="http://aws.amazon.com" target="_blank">Amazon Web Services</a> and <a href="http://www.rackspace.com" target="_blank">Rackspace</a>, two big early winners in cloud computing, sell computing cycles by the month, payable with a credit card - often bypassing traditional IT purchasing processes. Once established, Cloud and SaaS vendors can then turn their attention to selling to mainstream IT.</p>
<p><strong>4. New Buying Centers.</strong> The multi-hundred billion-dollar enterprise IT game now pivots on competition for the IT "stack," as we shift from the Client-Server/Web model to cloud computing. This change has created a new class of IT decision makers such as the "<a href="http://www.wired.com/wiredenterprise/2012/12/cloud-architect/" target="_blank">cloud architect</a>." As companies move more to the cloud, this new IT leadership category drives key decisions for enabling new applications, and with them the buying of all the underlying IT components. And these new buyers may not be as wedded to the incumbent suppliers as were the decision makers they supplant.</p>
<p><strong>5. The Consumerization of IT.</strong> The iPhone led to a watershed change both in enterprise mobility and computing. Not only did it challenge corporate purchasing patterns ("I buy, you enable," also known as BYOD, or Bring Your Own Device), it eliminated a final barrier to what constituted a business device. This is less about "consumerizing" enterprise IT than about adapting enterprise IT to leverage consumer technologies. In addition to mobile <em>devices</em>, apps are challenging the application market for business software.</p>
<p><strong>6. Coalitions of the Willing.</strong> For most small companies, hiring a large enterprise sales force and entering a year-long acquisition cycle is likely to be an expensive exercise in futility. Sure, you might be able to make a living selling to universities, hospitals and niche verticals, but attacking the Fortune 500 requires friends who need another reason to re-engage in a selling conversation. Manufacturing and strategic partnerships with hardware makers made a lot of security companies rich during the client-server era (e.g., McAfee, Symantec). Today, companies like <a href="http://www.box.com/platform" target="_blank">Box</a> are changing the game through new kinds of partnership integrations.</p>
<p>Frontal assaults are the hardest attack strategy for an enterprise startup. Attacking a powerful technology company's profit sanctuary tends to piss them off. If you can pull it off, it might just get your company acquired, but you run a big risk of perishing in the attempt.</p>
<p>That's why this tends to be the strategy of large companies (e.g., HP's acquisition of 3Com to attack Cisco) and does not have a great track record. The assault on the business PC by iOS and Android tablets and smartphones may turn out to be a more successful example, but Apple, Google and Samsung are hardly startups.</p>
<p>It can be done, of course. Many decades ago, Microsoft's PC operating system was such a technology and for a generation, a small company in Redmond changed the world. (With a big initial boost from IBM, of course.)</p>
<p>Current technologies that might have the power to force enterprises to switch and create hugely successful startups include Apache Hadoop, Network Virtualization, Flash Storage, and Cloud Storage and Collaboration. That's where I'd look for the next big thing.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com" target="_blank">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/01/02/6-strategies-for-cracking-the-enterprise-tech-market-in-2013</link>
                <guid>http://readwrite.com/2013/01/02/6-strategies-for-cracking-the-enterprise-tech-market-in-2013</guid>
                <category>Startups</category>
                <pubDate>Wed, 02 Jan 2013 06:00:00 -0800</pubDate>
                <author>Alan S Cohen</author>
            </item>
            </channel>
</rss>

