<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
        <channel>
        <title>big-data - ReadWrite</title>
        <link>http://readwrite.com</link>
        <description />
        <language>en</language>
        <copyright>Copyright 2012 SAY Media, Inc.</copyright>
        <managingEditor>readwriteweb@gmail.com</managingEditor>
        <docs>http://blogs.law.harvard.edu/tech/rss</docs> 
        <lastBuildDate>Fri, 17 May 2013 07:07:00 -0700</lastBuildDate>
        <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://rww.superfeedr.com/" />

                    <item>
                <title><![CDATA[Google Sensors Are Data Mining I/O Attendees - And They Don't Care]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/IMAG0625.jpg" />
                                        <p>If you're visiting the <a href="http://readwrite.com/tag/Google+IO13/" target="_blank">Google I/O developers conference</a> this week, you're a tiny part of a giant Google experiment to sniff out everything from your body heat to your breath. Google is even listening to your footfalls as part of its <a href="http://data-sensing-lab.appspot.com/" target="_blank">Data Sensing Lab I/O 2013</a>.</p>
<p>Think that's a scary, Big-Brother invasion of privacy? The conference attendees I talked to didn't seem to mind. In fact, one wanted Google to collect even more data.</p>
<p>Google planted 525 powered sensors around the halls of <a href="http://www.moscone.com/site/do/index" target="_blank">San Francisco's Moscone Convention Center</a>, and began collecting data from them on Wednesday, according to&nbsp;Michael Manoochehri, a developer programs engineer at Google. The company began measuring temperature, humidity, light, pressure (including nearby footfalls), motion, air quality and both RF and ambient noise. All of the data is sent back at intervals of 20 seconds or so, collected by Google's <a href="https://accounts.google.com/ServiceLogin?service=ah&amp;passive=true&amp;continue=https://appengine.google.com/_ah/conflogin%3Fcontinue%3Dhttps://appengine.google.com/&amp;ltmpl=ae" target="_blank">App Engine</a>, with analysis performed by its <a href="https://developers.google.com/bigquery/" target="_blank">BigQuery Big Data analysis tool</a>. You can see the results at the Lab's&nbsp;<a href="http://data-sensing-lab.appspot.com/." target="_blank">dedicated Web site</a>.&nbsp;</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/IMAG0622.jpg" style="" />
			</span>
</p>
<p>Among other things,&nbsp;<a href="http://readwrite.com/2013/05/15/googles-cloud-gets-smart-new-photos-search-and-maps" target="_self">Google's I/O developer conference has focused</a>&nbsp;this year on improving developer tools and better integrating the services that it already owns via a more intelligent cloud. The unnamed sensor project, part of Google's Data Sensing Lab, encompasses a bit of all of that. By itself, knowing that the air quality diminished at 4 a.m. might be intriguing, but not all that significant. But by correlating that information with a peak in another data stream - ambient noise, say - it becomes possible to guess what's going on; in this case, perhaps, the arrival of the cleaning crew.</p>
<p>Manoochehri said that Google could build queries against the sensor network into its Google I/O app, to identify the quietest spots on the floor for a phone call or a brief nap.</p>
<h2>Crossing The Creepy Line?</h2>
<p>Eric Schmidt, then the chief executive of Google, famously described <a href="http://blogs.wsj.com/digits/2011/01/21/top-10-the-quotable-eric-schmidt/" target="_blank">Google's policy</a> as "to get right up to the creepy line, but not cross it." When Google unified its privacy policy in March 2012, the company suggested that its unified services could anticipate an afternoon meeting and direct you to leave at a certain time. A year ago, <a href="http://readwrite.com/2012/06/29/google-now-knows-more-about-you-than-your-family-does-are-you-ok-with-that#" target="_blank">that notion prompted righteous outrage</a> from members of Congress, users and privacy advocates. A year later, that feature (now called Google Now) has been lauded as the herald of <a href="http://readwrite.com/2013/05/15/google-search-anticipatory-system-io13" target="_self">anticipatory search</a>. (Six privacy advocates from the EU are still threatening action.)</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/Serenity_of_each_Room.png" style="" />
				<span class="embedded-Media-image-caption">Source: Google</span>
		</span>
</p>
<p>It's probably fair to say that attendees of Google I/O give Google a bit more leeway than the general public. That certainly proved to be the case for those sitting near the sensors. Alan Holzman, a retired venture capitalist who last worked for Intel Capital, shrugged it off. "My life is tied to Google in much more significant ways," he noted.</p>
<p>Ditto for Sam Napolitano, who was covering Google I/O for the <em>Huffington Post</em>. Napolitano said he believed that the sensors were probably picking up on the NFC tag embedded within his name tag - something that Google employees said wasn't true. In any event, Napolitano said, he didn't care, as he had no expectations of privacy in a public space.&nbsp;"As long as it's not under my toilet seat, I don't care," Napolitano said of the sensors.</p>
<p>And "Rachid," an employee of Motorola Mobility who declined to give his last name,&nbsp;said he wanted Google to sample more data. More data and more correlation often yield more interesting results, he said, such as the various causes of depression.&nbsp;</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/MWMAIN_JLOUIE%20LowRes.jpg" style="" />
			</span>
</p>
<h2>The Internet Of Things</h2>
<p>Collecting data from sensors is increasingly seen as part of the rise of the so-called <a href="http://readwrite.com/tag/Internet+of+Things/" target="_blank">Internet of Things</a>, and Google clearly wants to be a leader in this growing domain.&nbsp;Google already collects some location data via Android phones to improve its knowledge of traffic and provide better solutions via Google Maps.&nbsp;</p>
<p><strong>(See also <a href="http://readwrite.com/2013/04/26/how-the-internet-of-things-will-revolutionize-search" target="_blank">How The Internet Of Things Will Revolutionize Search</a>.)</strong></p>
<p>We know that Google is very good at parsing user data - pulling keywords from emails, for example, and selling ads against them. (Selling ads against search terms is child's play.) Likewise, it can make recommendations for where to eat, where to go, the route to take and when to leave - building more comprehensive, personalized and valuable profiles along the way.</p>
<p>But the I/O conference project suggests that Google is prepared to take the same value proposition - collect data, analyze it, and provide and sell services against it - far beyond today's core businesses. Imagine Google placing sensors on its Street View cars and selling a comprehensive snapshot of air quality to the communities it maps. Or mounting similar sensors on the light poles from which it strings its Google Fiber broadband connections.</p>
<p>It will be interesting to see how far Google takes this. Remember this is the company that attempted to track the spread of <a href="http://www.google.org/flutrends/us/" target="_blank">influenza via search terms</a>. Google said that it wants attendees and other users to be able to interact with its new sensor data via the project's&nbsp;<a href="http://data-sensing-lab.appspot.com/." target="_blank">website</a>. How soon will we be able to do the same for, say, San Francisco?</p>
                    ]]></description>
                <link>http://readwrite.com/2013/05/17/google-sensors-data-mining-i-o-attendees</link>
                <guid>http://readwrite.com/2013/05/17/google-sensors-data-mining-i-o-attendees</guid>
                <category>Google IO13</category>
                <pubDate>Fri, 17 May 2013 07:07:00 -0700</pubDate>
                <author>Mark Hachman</author>
            </item>
                    <item>
                <title><![CDATA[What Are The Feds Hiding? Let's Ask The Declassification Engine]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/RWW%20CIA%20report.png" />
                                        <p class="p1">Each year, the U.S. government declassifies thousands of documents and releases them to the public through collections like the Declassified Document Reference System (DDRS) and the CIA's <a href="http://www.foia.cia.gov/">FOIA Reading Room</a>. Some, however, contain "redacted" information that's too sensitive to be released — leaving, for instance, key details of an FBI memo blacked out for the average reader.</p>
<p class="p1">Enter the <a href="http://www.declassification-engine.org/">Declassification Engine</a>, which aims to harness Big Data analysis and some old-fashioned crowdsourcing to peer through the "black bars" of redacted documents and reveal what the government doesn't want you to know.&nbsp;</p>
<p class="p1">Using publicly available, declassified documents as its sources, the Declassification Engine aims to eventually make informed guesses about what those black bars are hiding, providing a "word cloud" of likely possibilities. Is that blacked-out word "Aurora," for example, potentially referring to new types of advanced aircraft? And, if so, does that imply that similar redacted memos refer to the same key words?&nbsp;</p>
<h2>A Tool For Historians And The Public</h2>
<p class="p1">The Declassification Engine could be an instrument for historians and conspiracy theorists alike.&nbsp;For now, though, it's basically just a set of data-analysis tools developed by researchers at Columbia University.</p>
<p class="p1">One finds&nbsp;correlations between specific words and often-classified memos, for example. Another was designed to help train the system to pick up on differences between redacted documents, and what was revealed years later when the government declassified them for public eyes. Eventually, they'll form a more cohesive whole, the Engine's creators say.</p>
<p class="p1">To take the next steps, the Engine's founders are asking for help.&nbsp;Last week, historians, journalists, legal scholars, statisticians, and computer scientists met at Columbia University to formally launch the Engine - and to ask for money. The Declassification Engine <a style="line-height: 1.538em;" href="http://www.indiegogo.com/projects/the-declassification-engine-saving-history-from-official-secrecy">hopes to raise $50,000 to fund the project</a>, but its founders have raised only a few hundred dollars so far.</p>
<p class="p1">Matthew Connelly, a historian at Columbia and one of the creators of the Declassification Engine, explained that the group is consciously trying to put the Declassification Engine on the "white hat" side of the fence — the opposite side, in other words, from organizations like Wikileaks.</p>
<p class="p1">The Engine's source material consists of documents that have already been declassified and released by the government for public scrutiny. Furthermore, its users aren't "cracking" redactions; they're simply making guesses. What they hope are <em>good</em> guesses, but guesses nevertheless.</p>
<h2 class="p1">How The Engine Revved Up</h2>
<p class="p1">Declassification straddles a long-standing fault line in American politics, as&nbsp;Marc Trachtenberg, a professor of political science at UCLA,&nbsp;<a style="line-height: 1.538em;" href="http://www.sscnet.ucla.edu/polisci/faculty/trachtenberg/documents/doclist.html">explains</a>:</p>
<blockquote>
<p class="p1">There is thus a built-in conflict between the consumer and the supplier of historical evidence: we historians want to see the 'dirt,' but those responsible for the release of documents want to make sure that the material released does not damage the political interests they are responsible for protecting.</p>
</blockquote>
<p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/RWW%20UFO%20memo%20-%20Edited.png" style="" />
				<span class="embedded-Media-image-caption">A redacted memo, made public by the Freedom of Information Act. (Source: http://www.foia.cia.gov/)</span>
		</span>
</p>
<p class="p1">Declassified documents are often a tool to better understand our own history. But getting at that understanding sometimes requires teasing out decades-old data.</p>
<p>One of the first things the team did last year was to analyze which keywords were most closely associated with federal decisions to withhold documents among 1.4 million State Department cables. They then created a tool to analyze diplomatic activity over time depending on which terms were used, and the likelihood that a cable that included a specific term would still be classified.</p>
<p class="p1">That analysis revealed that 1970s cables that contained the word "Boulder" or phrase "Operation Boulder" were much, much more likely to be withheld, Connelly said. As it turned out, <a href="http://declassifiedboulder.wordpress.com/" target="_blank">Project Boulder</a>&nbsp;was President Nixon's plan, hatched&nbsp;following the hostage crisis at the Munich Olympics,&nbsp;to increase FBI scrutiny of Arabs entering the United States. In other words, 1970s-style ethnic profiling.</p>
<p class="p1">In this case, Connelly said, the archive of scanned documents could have served as a historical context when people began discussing the treatment of Arab-Americans thirty years later, after Sept. 11. But without the digital archive of source documents, that context wasn't readily available.</p>
<p class="p1">"The reason that these historians have never even heard of it is because the vast majority of the documents have been withheld, in the archives," Connelly said. "Without those documents, we can't even begin to try and derive some of these lessons."</p>
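<p class="p1">The keyword analysis described above can be illustrated with a minimal Python sketch. It is not the Engine's actual code: the function name <code>withholding_rates</code>, the <code>min_count</code> cutoff and the four toy records are all assumptions made up for illustration. For each term, it computes the share of cables containing that term that were withheld.</p>

```python
from collections import Counter

def withholding_rates(cables, min_count=2):
    """Map each term to the fraction of cables containing it that were withheld.

    `cables` is an iterable of (text, is_withheld) pairs. Terms that appear in
    fewer than `min_count` cables are dropped as too rare to say anything.
    """
    seen = Counter()      # number of cables in which each term appears
    withheld = Counter()  # number of those cables that were withheld
    for text, is_withheld in cables:
        # Use a set so a term repeated inside one cable is counted once.
        for term in set(text.lower().split()):
            seen[term] += 1
            if is_withheld:
                withheld[term] += 1
    return {term: withheld[term] / n for term, n in seen.items() if n >= min_count}

# Hypothetical toy records, not real State Department data.
sample = [
    ("visa review operation boulder", True),
    ("boulder screening update", True),
    ("trade talks update", False),
    ("visa fees update", False),
]
rates = withholding_rates(sample)
```

<p class="p1">In this toy data every cable mentioning "boulder" was withheld, so that term gets the highest rate. A real analysis over 1.4 million cables would also need proper tokenization and some test of statistical significance.</p>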
<p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/RWW%20Boulder%20withdrawn%20memos.png" style="" />
				<span class="embedded-Media-image-caption">The government originally held back the majority of memos that contained the word &quot;Boulder&quot;. (Source: Matthew Connelly)</span>
		</span>
</p>
<h2 class="p1">Is It Legal?</h2>
<p class="p1">Given the political climate surrounding security in the decade-plus since September 11, the Declassification Engine's creators said last week that they were somewhat nervous that the U.S. government might try to clamp down on it. (The creators, naturally, believe that it's perfectly legal.) Connelly, however, said that the discussion during Friday's conference gave him reason to believe that the Engine's creators aren't likely to face any investigation from law enforcement agencies.&nbsp;</p>
<p class="p1">Nevertheless, on Friday, the <a style="line-height: 1.538em;" href="http://declassification-engine.org/index.py?section=faq">FAQ portion of the site</a> was modified to eliminate all references to the project's legality, including a statement that the group sought input from the State Department and the National Archives to better understand the declassification process.</p>
<p class="p1">"In some cases, we are using statistical methods to predict what is still classified," the Declassification Engine's FAQ said Thursday night.</p>
<h2>How The Tools Work</h2>
<p class="p1">Connelly gave ReadWrite an early glimpse of one component of the Engine on Thursday night. That's the Redaction Visualizer, which compares redacted and unredacted documents and highlights the differences. On the surface, this seems pretty obvious.&nbsp;</p>
<p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/RW%20Vietnam%20image_0.png" style="" />
				<span class="embedded-Media-image-caption">Comparing an unredacted and redacted memo. (Source: Declassification-engine.org)</span>
		</span>
</p>
<p class="p1">But the Visualizer is also the basic equivalent of your math homework: the redacted document provides the problem to solve, and the unredacted document is the "answer". This supervised data will "teach the computer to teach itself about what's in the redaction," Connelly said.</p>
<p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/Vietnam%20text%20-%20Edited.png" style="" />
				<span class="embedded-Media-image-caption">The text that the Redaction Visualizer pulls out. (Source: Declassification-engine.org)</span>
		</span>
</p>
<p class="p1">The real work for the Engine, though, lies in deciphering the redactions themselves. And the biggest arrow in its quiver is context. In total, the Engine uses 117,509 documents from the DDRS, with the most from the Eisenhower and Johnson administrations.</p>
<p class="p1">The text of the documents themselves is just one part of the puzzle. But there's a surprising amount of unredacted metadata attached to each as well: the date, the author, the subject, who classified it and when it was declassified. There are 68 fields in all, Connelly said. All can be used as clues to make guesses as to what the redacted content contains. Connelly admits that he's not even clear on how well the Engine could work, once it's up and running.</p>
<p class="p1">What the Declassification Engine hopes to do for each redaction is generate a "word cloud" of the words that are statistically likely to be hidden by the redaction. Granted, this is a lot easier to do with a short series of letters, such as a name or date. Still, any guesses could be used to tease out further possibilities, and cross-correlated with other, similar documents to make further guesses.</p>
<p class="p1">Eventually, the Declassification Engine could become a Web site, where users could upload their own declassified documents, run them against the tools, and also add their own insights.&nbsp;"It would create a virtuous circle, and [users] would be able to make more and more powerful and accurate predictions," Connelly said.</p>
<h2 class="p2">Obama Turbocharges The Engine</h2>
<p class="p1">The Declassification Engine received an unexpected boon from the Obama Administration on the eve of its launch: an <a href="http://www.whitehouse.gov/the-press-office/2013/05/09/executive-order-making-open-and-machine-readable-new-default-government-">executive order</a> making machine-readable government documents the law of the land.</p>
<p class="p1">"Government information shall be managed as an asset throughout its life cycle to promote interoperability and openness, and, wherever possible and legally permissible, to ensure that data are released to the public in ways that make the data easy to find, accessible, and usable," President Obama wrote. "In making this the new default state, executive departments and agencies shall ensure that they safeguard individual privacy, confidentiality, and national security."</p>
<p class="p1">The order could remove the need to optically scan some government documents, allowing the Engine to more quickly process bunches of files.&nbsp;It remains to be seen how executive agencies will protect their electronic documents, however.&nbsp;</p>
<p class="p1">But, as Connelly noted, the order raises the question: if machines are now allowed to read government documents, shouldn't they be allowed to guess what they're hiding?</p>
                    ]]></description>
                <link>http://readwrite.com/2013/05/14/what-are-the-feds-hiding-lets-ask-the-declassification-engine</link>
                <guid>http://readwrite.com/2013/05/14/what-are-the-feds-hiding-lets-ask-the-declassification-engine</guid>
                <category>Big data</category>
                <pubDate>Tue, 14 May 2013 07:00:00 -0700</pubDate>
                <author>Mark Hachman</author>
            </item>
                    <item>
                <title><![CDATA[Big Data May Be A Pretty Small Problem]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_smalldata.jpg" />
                                        <p>The idea that a business needs data analysis to better make business decisions is not in dispute… but there is currently a strong debate on how big a data set a business actually needs and how much they need to spend to get that data.</p>
<p>The lure of big data is a powerful one… your web site is flooded with tracking and logging data, after all, and if you only had the tools to store and analyze that data, you could learn the secrets of making your business successful, discover the Colonel's secret recipe and figure out the Question that goes with the Answer 42.</p>
<p>Well, maybe not that much detail, but with the level of hype around big data, one sometimes wonders.</p>
<p>One standard approach to analyzing this data is the installation and configuration of Hadoop servers that are grouped together in clusters of machines - either physical or virtual. Hadoop clusters use distributed storage that makes it relatively simple to store a lot of data fast with less pain than relational database configuration. They also use Java-based MapReduce software to reach into that data and scoop out what you really want - golden nuggets of pure information.</p>
<p>There are limits to MapReduce, naturally: it doesn't perform analysis in real time, but rather in occasionally time-consuming batches, and setting up MapReduce software to do exactly what you need has been compared to getting a root canal. This is why there is an entire ecosystem around Hadoop dedicated to working around those shortcomings, introducing real-time analysis, structured database tools, and software that can convert existing database queries written in Structured Query Language (SQL) to something MapReduce can handle.</p>
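<p>For readers who have not met the MapReduce pattern mentioned above, here is a minimal single-process sketch in Python. It does not use Hadoop's Java API; the function names and the three sample log lines are hypothetical, chosen only to show the map, shuffle and reduce steps on a toy web log.</p>

```python
from collections import defaultdict

def map_phase(record):
    # Map: turn one raw log line into (key, value) pairs.
    # Here the key is the HTTP status code, the last field on the line.
    status = record.split()[-1]
    yield status, 1

def shuffle(pairs):
    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: collapse each group of values into a single result.
    return key, sum(values)

def run_job(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical sample input.
log_lines = ["GET /home 200", "GET /cart 200", "GET /missing 404"]
counts = run_job(log_lines)
```

<p>On a real cluster the map and reduce steps run in parallel across many machines and the shuffle moves data over the network, which is part of why these jobs run as batches rather than in real time.</p>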
<p>But even though Hadoop is relatively inexpensive and easy to scale out onto many machines that run the Linux operating system, is this approach the equivalent of using a wrecking ball to knock down a dollhouse?</p>
<h2>Too Much Data?</h2>
<p>Some would argue that is indeed the case. A January 2013 paper from Microsoft Research, for instance, disputes the notion that most business data analysis even needs a Hadoop cluster, arguing that a single, more powerful scaled-up server would often do instead.</p>
<p>According to the authors of "<a title="http://research.microsoft.com/pubs/179615/msrtr-2013-2.pdf" href="http://research.microsoft.com/pubs/179615/msrtr-2013-2.pdf">Nobody Ever Got Fired For Buying a Cluster</a>," the data set sizes of many given businesses are not typically large enough to warrant scaled-out clusters of multiple computers.</p>
<p>You would expect that to be the case for small- to medium-sized businesses (SMBs), but it's also true for enterprises. Even the mega-companies for which big data tools were practically invented don't need those tools a large majority of the time.</p>
<p>For example, the authors found, an analysis of 174,000 jobs submitted to a production analytics cluster in Microsoft had a median job input data set size of less than 14 GB, and 80% of jobs had an input size of less than 1 TB.</p>
<p>The paper cites another study from K. Elmeleegy that "analyzes the Hadoop jobs run on the production clusters at Yahoo. Unfortunately, the median input data set size is not given but, from the information in the paper we can estimate that the median job input size is less than 12.5 GB."</p>
<p>And Yahoo, by the way, is where much of the core functionality of Hadoop was developed, built on the distributed filesystem research conducted earlier at Google. If they aren't using Hadoop for mega jobs all of the time, how appropriate is Hadoop for a "normal" enterprise's data sets?</p>
<p>Facebook, the Borg-like consumer of all user data, surely needs the big data tools, right?</p>
<p>"Ananthanarayanan et al. show that Facebook jobs follow a power-law distribution with small jobs dominating; from their graphs it appears that at least 90% of the jobs have input sizes under 100 GB," the paper states. "Chen et al. present a detailed study of Hadoop workloads for Facebook as well as 5 Cloudera customers. Their graphs also show that a very small minority of jobs achieves terabyte scale or larger and the paper claims explicitly that 'most jobs have input, shuffle, and output sizes in the MB to GB range.'"</p>
<h2>Most Data Is Small</h2>
<p>The paper, which compares Hadoop jobs running on clusters of computers, both physical and in the cloud, against the same kind of jobs on a single scaled-up server, concluded that for a majority of data analysis work, the scaled-up server not only handled the workload well, it actually outperformed the clustered machines in many respects.</p>
<p>Now, like any scientific paper, particularly one from a commercial vendor, some skepticism must be applied. Here, the conclusions would seem to benefit Microsoft's sales model for pushing data analysis tools into the enterprise and even SMBs. Scaled-out Hadoop clusters on Linux, after all, are pretty cheap compared to comparable Windows Server clusters, but even the least expensive Hadoop cluster can't hold a candle to the low price of a single scaled-up server.</p>
<p>Which may be the point of the paper, so take it as you will.</p>
<p>Still, there seems to be compelling evidence from sources other than Microsoft that the vast majority of data analysis jobs do not need much more than a strong server or even a personal computer to crunch the numbers and get those golden nuggets of information.</p>
<p>This is not to say that every data problem can be solved with an Excel spreadsheet and a laptop. The flexibility of non-relational (NoSQL) databases still makes them a very attractive solution for storing and analyzing data sets. And Hadoop is still a relatively inexpensive way to store a lot of data until such time as you need to massage it and discover the secrets of the universe or at least your third-quarter sales.</p>
<p>(See also <a title="http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think" href="http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think">Hadoop Adoption Accelerates, But Not For Data Analytics</a>.)</p>
<p>Before beginning an exploration into the world of big data, businesses should be careful to separate hype from reality and make sure they don't overshoot their data needs with a solution that will be more costly to set up and operate in the long run.</p>
<p>Look at NoSQL databases as a way to hold and analyze data for lower costs than relational SQL databases. Or look at <a title="http://readwrite.com/2012/09/26/big-data-effective-beyond-the-enterprise" href="http://readwrite.com/2012/09/26/big-data-effective-beyond-the-enterprise">federated data services that can provide key information aggregated within your particular sector</a>. And even look at the data you have and start playing around with it in a spreadsheet sometime and see what you come up with.</p>
<p>Hadoop is one way to work with data, but it is far from the only way.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/05/13/big-data-may-be-a-small-problem</link>
                <guid>http://readwrite.com/2013/05/13/big-data-may-be-a-small-problem</guid>
                <category>Big data</category>
                <pubDate>Mon, 13 May 2013 05:30:00 -0700</pubDate>
                <author>Brian Proffitt</author>
            </item>
                    <item>
                <title><![CDATA[Hadoop Adoption Accelerates, But Not For Data Analytics]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_136899149.jpg" />
                                        <p>The Hadoop market is on a tear, growing at a compound annual growth rate of roughly 60%, <a href="http://www.idc.com/getdoc.jsp?containerId=prUS23471212">according to IDC</a>. But why it's growing, or rather, how it's being used, might surprise you. Given all the media hype around Hadoop and its power to predict everything from the optimal number of raisins in your cereal (23) to the exact date of Armageddon (next Tuesday - call in sick), it's perhaps surprising to learn that comparatively few organizations use Hadoop for analytics. Today most enterprises use Hadoop for the pedestrian uses of storage and ETL (Extract, Transform, Load).</p>
<p>Eventually enterprises get to sexy analytics. But we're not there yet. Not by a long shot.</p>
<h3>'Poor Man's ETL', 'Unsupervised Digital Landfill', Or Both?</h3>
<p>While commonly billed as an analytics tool, Hadoop remains "a poor man's ETL" for the vast majority of enterprises. Yes, there are enterprises running interesting analytical workloads on Hadoop, but these are the exception, not the rule. Hence, while <a href="http://blog.cloudera.com/blog/2013/02/big-datas-new-use-cases-transformation-active-archive-and-exploration/">Cloudera cites</a> three common use cases for Hadoop (data transformation, archiving, and exploration), I'm hearing from analysts that 75% or more of the actual Hadoop adoption resides in those first two use cases.</p>
<p>Which is not to suggest such adoption is valueless. Quite the contrary.</p>
<h3>The Common Adoption Path For Hadoop</h3>
<p>As 451 Research analyst <a href="http://www.slideshare.net/Hadoop_Summit/what-is-the-point-of-hadoop">Matt Aslett highlighted at Hadoop Summit</a>, there is a natural progression from using Hadoop to store large quantities of data (i.e., Hadoop as an "<a href="http://readwrite.com/2013/02/11/big-data-and-the-landfills-of-our-digital-lives">unsupervised landfill</a>"), to processing and transforming that data and ultimately to analyzing that data. The fact that most enterprises have yet to get to analytics in any meaningful way is simply a description of where we are in the Hadoop market's evolution.</p>
<div style="text-align: center;"><iframe style="border: 1px solid #CCC; border-width: 1px 1px 0; margin-bottom: 5px;" src="http://www.slideshare.net/slideshow/embed_code/17825514" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" width="427" height="356"> </iframe></div>
<div style="margin-bottom: 5px; text-align: center;"><strong> <a title="What is the Point of Hadoop" href="http://www.slideshare.net/Hadoop_Summit/what-is-the-point-of-hadoop" target="_blank">What is the Point of Hadoop</a> </strong> from <strong><a href="http://www.slideshare.net/Hadoop_Summit" target="_blank">Hadoop_Summit</a></strong></div>
<p>Indeed, Aslett notes that "attempting to fast forward to analytics, missing out on the processing/integration stage, creates silos and will result in disillusionment."&nbsp;</p>
<p>We're still early in Hadoop's technological and market evolution, in part due to the complexity of the technology, with <a href="http://www.cioinsight.com/it-news-trends/slideshows/hadoop-adoption-proves-slow-but-steady-05/">26% of even the most sophisticated Hadoop users</a> citing how long it takes to get into production as a gating factor to its widespread use. Gartner reveals even lower rates of adoption of Big Data projects, often involving Hadoop, at a mere 6%, as enterprises try to grapple with both appropriate use cases and understanding the relevant technology.</p>
<h3>Start With What You Know</h3>
<p>Small wonder, then, that enterprises are starting with known use cases like storage or ETL before proceeding to more ambitious analytics projects, as <a href="https://twitter.com/ckotsakis/status/332529969580351489">Christos Kotsakis suggests</a>. We're still getting comfortable with Hadoop. Applying an unfamiliar technology to a familiar problem makes a lot of sense.</p>
<p>Some day, we'll get to the point where mainstream adopters commonly use Hadoop for significant analytics. But we're not there. Not yet. Just give it time.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em>.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think</link>
                <guid>http://readwrite.com/2013/05/10/hadoop-adoption-accelerates-but-not-for-what-you-might-think</guid>
                <category>Hadoop</category>
                <pubDate>Fri, 10 May 2013 04:30:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[The Rising Costs Of Misunderstanding Big Data]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_104929805.jpg" />
                                        <p>The Big Data boom has largely been fueled by a simple calculation: Data + Technology = Actionable Insights, Magic Ponies, and Superpowers. The reality, of course, is far more pedestrian, because while Big Data technology has indeed increased our ability to store and process lots of disparate data in real time, the technology is only as useful as the people managing it. As Bill Wise, CEO of Mediaocean, <a href="http://allthingsd.com/20130423/big-datas-usability-problem/">highlights</a>, the costs of getting it wrong increase as our reliance on data grows.</p>
<p>To be clear, we've long been able to query so-called "Big Data." We've had expensive data warehousing and Business Intelligence tools for many years. The great innovation of tools like Hadoop is that they've made such capabilities available as free, open-source tools that run on commodity hardware, essentially paving the way for anyone and everyone to become a data scientist.</p>
<p>Therein lies the problem.</p>
<p>Taking as background an influential economics paper and the intelligence efforts around the Boston bombing suspects, which were tripped up by a few missing rows in Excel and a misspelling of Boston Marathon bombing suspect Tamerlan Tsarnaev's name, respectively, Wise points out that "data management tools (i.e., the FBI’s systems and Excel) were undone by fairly simple errors," with terrible results. In other words, as much as we may believe Big Data is as simple as "Input data into Hadoop, out come insights!", the reality depends heavily on the people querying that data.</p>
<p>And the bigger the data, the bigger the likelihood we'll read it wrong, as Wise posits:</p>
<blockquote>
<p>[M]ore human/data interaction means a lot more room for error (and inefficiency) around increasingly critical data sets - which... can have very serious results... If Big Data can’t fit hand-in-glove with usability and workflow, a lot of the promise of big data will be empty data crunching. That’s not just a problem for getting where we want to be in the evolution of computing. It’s a situation that can lead to bad data management - which translates into bad economics and, sometimes, far worse.</p>
</blockquote>
<p>This confirms renowned statistician <a href="http://readwrite.com/2013/03/29/nate-silver-gets-real-about-big-data">Nate Silver's arguments</a> that data doesn't speak for itself, but is instead corrupted by our biases. Worse, the bigger the data set, the more noise to sift through: "the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine - but a relatively constant amount of objective truth."</p>
<p>Often, misunderstanding our data simply means our businesses will run more inefficiently or, at least, no more efficiently than before. But if Wise is correct, getting our data wrong can have disastrous consequences.</p>
<p>Which&nbsp;means, as <a href="http://data-informed.com/the-mythical-data-scientist-shortage/">I've argued before</a>, that we really need to look inside our organizations for "data scientists," because context is critical to effectively querying our data, as well as knowing which data to collect in the first place. It also means, as <a href="http://blogs.hbr.org/cs/2013/04/the_hidden_biases_in_big_data.html">Kate Crawford argues</a> in <em>Harvard Business Review,</em>&nbsp;"data scientists should take a page from social scientists, who have a long history of asking where the data they're working with comes from, what methods were used to gather and analyze it, and what cognitive biases they might bring to its interpretation."&nbsp;</p>
<p><span style="line-height: 1.538em;">In other words, the more data has the potential to impact our organizations, the more humble and circumspect we should become in using it. The consequences of reading our data wrong scale with the volume and velocity of that data.</span></p>
<p><em><span style="line-height: 1.538em;">Image courtesy of</span><span style="line-height: 1.538em;"><a href="http://www.shutterstock.com"> Shutterstock</a>.</span></em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/04/29/the-rising-costs-of-misunderstanding-big-data</link>
                <guid>http://readwrite.com/2013/04/29/the-rising-costs-of-misunderstanding-big-data</guid>
                <category>Big data</category>
                <pubDate>Mon, 29 Apr 2013 04:00:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[Tax Time Tip: 3 Ways The IRS Is Tracking You Online]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/twitter-tax-cheat.jpg" />
                                        <p>If you haven't filed your taxes yet, you might want to triple-check your math before you do. That's because the <a href="http://money.usnews.com/money/personal-finance/mutual-funds/articles/2013/04/04/irs-high-tech-tools-track-your-digital-footprints" target="_blank">IRS employs a more watchful eye than ever</a>, thanks to Big Data analysis and digital information-gathering tactics.&nbsp;</p>
<p>With the ongoing budget crisis, pressure for the IRS to recover lost revenue has never been higher. Conveniently enough, the agency has made massive investments in its computing power and tools for crunching big data, allowing for more automation and rapid analysis. That means a greater capacity for robo-audits and less room for honest mistakes.</p>
<p>It's not just the tools that have improved. The data itself is richer and more varied than ever, drawing increasingly from whatever details about our digital lives the IRS can get its hands on, including information that isn't publicly accessible. We don't know the full extent of the IRS's data-mining capabilities, but recent reporting has revealed new details.&nbsp;</p>
<h2>1. Analyzing Your Social Media Updates&nbsp;</h2>
<p>The social Web has been a boon for IRS investigators, who can use updates from Facebook, Twitter and other services to bolster their cases against alleged tax cheats. Information about work history, one's physical whereabouts and even purchases can be gleaned from social networks. Some of it, like tweets and certain details from Facebook, is public. But should the IRS want to take a closer look, it supposedly has the means to do so, with or without a warrant.&nbsp;</p>
<p>According to recent reports, the IRS cross-references data from social networks with Social Security numbers and then works in a host of other private data to look for suspicious patterns.&nbsp;</p>
<h2>2. Monitoring Digital Payments and Credit Card Activity&nbsp;</h2>
<p>The rise of commerce and digital payments has also given the IRS new sets of data to mine and analyze. The agency has long looked at taxpayers' activity on ecommerce sites like eBay, but is now going deeper and getting a look at credit card transactions and other online payments.&nbsp;</p>
<p>The agency looks for potential auditing targets "by matching tax filings to social media or electronic payments," <a href="http://money.msn.com/credit-rating/irs-tracks-your-digital-footprint" target="_blank">according to MSN Money</a>. The exact mechanism of this monitoring isn't known, but MSN Money indicates that it includes examining credit card transactions "for the first time ever."&nbsp;</p>
<p>It's not clear how detailed or widespread this monitoring is, and the IRS isn't likely to spill the beans (lest it tip off tax cheats), but suffice it to say that if the agency feels it has cause to take a peek at your online payment data, it won't have a problem doing so.&nbsp;</p>
<h2>3. Peeking At Your Email Usage&nbsp;</h2>
<p>Exactly when and how the IRS looks at email usage isn't entirely clear. The MSN Money report says the agency's big data analysis tools are used in part for "tracking individual Internet addresses and emailing patterns." That's pretty vague. In theory, the IRS could glean some details about email usage simply by looking at browsing activity, whether that insight comes from an ISP or email service provider.&nbsp;</p>
<p>Does that mean that the IRS has blanket access to everybody's Gmail account for the purpose of feeding its data-crunching behemoth? That seems pretty unlikely. Instead, what it likely does is request access to individual accounts for people who are already suspected of wrongdoing. The American Civil Liberties Union <a href="http://www.aclu.org/blog/technology-and-liberty-national-security/new-documents-suggest-irs-reads-emails-without-warrant" target="_blank">recently uncovered documents</a> that suggest the IRS doesn't feel a warrant is necessary to get such access. Good to know!</p>
                    ]]></description>
                <link>http://readwrite.com/2013/04/12/3-ways-the-irs-is-tracking-you-online</link>
                <guid>http://readwrite.com/2013/04/12/3-ways-the-irs-is-tracking-you-online</guid>
                <category>Big data</category>
                <pubDate>Fri, 12 Apr 2013 04:00:00 -0700</pubDate>
                <author>John Paul Titlow</author>
            </item>
                    <item>
                <title><![CDATA[Show Us The Data: Time For Companies To Reveal What They Know About Us]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/digital-privacy-eye-800_0_0.jpg" />
                                        <p>California has proposed a <a href="https://www.eff.org/deeplinks/2013/04/new-california-right-know-act-would-let-consumers-find-out-who-has-their-personal" target="_blank">potentially groundbreaking consumer privacy law</a>. The Right To Know Act, if approved, would require companies to divulge what kind of data they have on individual consumers, as well as with whom they're sharing that information.&nbsp;</p>
<p>We need this. Not only should California pass this law, but it should be emulated far and wide. And while it's a good start, The Right To Know Act is really just the beginning of what's needed.</p>
<p>The vast quantity of personal data that companies collect, store and sell is mind-boggling. We caught a glimpse of some of this massive and now-routine data mining during the presidential campaign. Outside of the election cycle, it continues full force as marketers and financial institutions amass private information about consumers, sell it to one another and use it in ways that aren't entirely clear. Much of it is totally obvious and innocent. Some of it probably isn't. We don't know. That's the problem.</p>
<h2>The Ongoing Personal Data Explosion&nbsp;</h2>
<p>Of course, this data is just going to keep exploding. The proliferation of smartphones has generated enough privacy questions to keep lawyers and legislators busy for a generation. We're just beginning to grapple with those issues and now <a href="http://readwrite.com/2013/03/11/google-glass-privacy-creepiness" target="_blank">Google wants us to wear computers on our faces</a>. As we move toward wearable computers, connected cars <a href="http://readwrite.com/2013/03/18/smart-homes-our-next-digital-privacy-nightmare">and smart homes</a>, the sheer volume of data about our personal lives is going to grow exponentially.&nbsp;</p>
<p>There's a lot we stand to gain from these advances in personal technology, just as we have with smartphones and tablets. But before we plough forward into this otherwise awesome future, we should probably take a minute and think about some of the less exciting implications. Privacy is at the top of the list.</p>
<p>The Right To Know Act sounds like a sensible attempt to set up the kind of consumer privacy framework we'll need to have in place if we don't want things to get too weird in the future.</p>
<p>Whether or not we actually regulate the ways companies use this data is another question, which we'll also need to deal with. In the meantime, what the Right To Know Act will do is simply allow consumers to know exactly what data exists and to learn a little bit about how it's being used.</p>
<h2>"This Law Is About Transparency"</h2>
<p>"This law is about transparency and access, not new restrictions on data sharing," writes the <a href="https://www.eff.org/deeplinks/2013/04/new-california-right-know-act-would-let-consumers-find-out-who-has-their-personal" target="_blank">Electronic Frontier Foundation</a> (EFF), one of the supporters of the bill.&nbsp;"It helps consumers, regulators, policymakers, and the world at large shine a light onto the largely hidden, highly lucrative world of the personal data economy."</p>
<p>To Europeans, this concept isn't anything radical. As <a href="http://arstechnica.com/tech-policy/2013/04/california-lawmaker-introduces-unprecedented-personal-data-disclosure-bill/" target="_blank">Ars Technica points out</a>, the European Union has laws like this on the books already, as it should. The principle of habeas data, as it's known, is just a part of digital life there.&nbsp;</p>
<p>How likely is passage of the bill? Plenty of firms will loathe it, but it will be interesting to see how tolerant the more privacy-friendly tech companies are of the idea.&nbsp;It's hard to predict the bill's fate,&nbsp;but when it comes to implementing forward-thinking privacy laws, California has a pretty decent track record.</p>
<p>The premise is that simple: Companies know a lot about us, and we, as consumers, have a right to know what they know. Whether or not we can do anything about it, we at least deserve to know. They are, after all, <em>our</em> lives.&nbsp;</p>
                    ]]></description>
                <link>http://readwrite.com/2013/04/04/private-data-collection-companies-privacy-law</link>
                <guid>http://readwrite.com/2013/04/04/private-data-collection-companies-privacy-law</guid>
                <category>Privacy</category>
                <pubDate>Thu, 04 Apr 2013 05:00:00 -0700</pubDate>
                <author>John Paul Titlow</author>
            </item>
                    <item>
                <title><![CDATA[Nate Silver Gets Real About Big Data]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/Signal%20and%20Noise.jpg" />
                                        <p>While it has become&nbsp;<em>de rigueur</em> to ascribe all sorts of supernatural powers to Big Data, one of the world's most celebrated statisticians, Nate Silver, is far more circumspect about it. If anything, <a href="http://www.amazon.com/dp/159420411X">according to Silver in his book <em>The Signal and the Noise</em></a>, Big Data carries the potential to cloud our decisions by introducing far more noise than it does signal. It's an interesting position for someone who makes a living predicting the future, and one that directly counters other expert opinion.</p>
<p>Take, for example, the new book from data experts Viktor&nbsp;Mayer-Schonberger (University of Oxford) and Kenneth Cukier (<em>The Economist</em>),&nbsp;<em>Big Data: A Revolution That Will Transform How We Live, Work and Think</em>. Mayer-Schonberger and Cukier urge us to trust data: not to worry about trying to understand correlations, but simply to accept them. As Cukier tells <em>Wired</em>, "Big Data enables us not to test [a] hypothesis, but to let the data speak and tell us what hypothesis is best. And in that way it completely reshapes what we call the scientific method or...how we understand and make sense of the world."</p>
<p>One big problem with this view is that it assumes we have any clue how to query the data to even come up with a "what," much less a "why." It's not as if data simply presents itself to us, and we read it objectively.</p>
<p>Quoting Silver at length:</p>
<blockquote>
<p>"[Big Data] is sometimes seen as a cure-all, as computers were in the 1970s. Chris Anderson…wrote in 2008 that the sheer volume of data would obviate the need for theory, and even the scientific method….</p>
<p>"[T]hese views are badly mistaken. The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning….[W]e may construe them in self-serving ways that are detached from their objective reality.</p>
<p>"Data-driven predictions can succeed--and they can fail. It is when we deny our role in the process that the odds of failure rise. Before we demand more of our data, we need to demand more of ourselves….Unless we work actively to become aware of the biases we introduce, the returns to additional information may be minimal--or diminishing."</p>
</blockquote>
<p>So, for example, more data has not resulted in less political divide, as Silver points out. It has only hardened positions on either side of the aisle. The same holds true for global warming science. The more data we have, the less we seem to agree.</p>
<p>Why? Because data is never neutral. Or, rather, our perception of it is not neutral.</p>
<p>This is as true for individual enterprises grappling with product or personnel decisions as it is for countries debating policy issues. Big Data can contribute to solving these issues...even as it contributes to making them more difficult. Again quoting Silver:</p>
<blockquote>
<p>If the quantity of information is increasing by 2.5 quintillion bytes per day, the amount of <em style="line-height: 1.538em;">useful</em> information almost certainly isn't. Most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test, so many data sets to mine--but a relatively constant amount of objective truth.</p>
</blockquote>
<p>This jibes with&nbsp;Gartner's&nbsp;<a style="line-height: 1.538em;" href="http://readwrite.com/2013/01/24/big-data-overhyped-and-overpaid">Svetlana Sicular, who suggests</a>&nbsp;that "Formulating a right question is always hard, but with big data, it is an order of magnitude harder," due in part to the difficulty of figuring out meaningful correlations in our data.&nbsp;</p>
<p>Again, while it may seem convenient to wish for the "data to speak for itself," it simply doesn't. It can't. It is always mediated by imperfect individuals with all of our biases, strengths and self-interest.</p>
<p>Which is not to say that data can't help us with our answers. Silver certainly turns to data to help him forecast elections, baseball games and Oscar winners. The trick, as he argues, is to take a Bayesian approach to data analytics, getting comfortable with probabilities, working hard to recognize and account for our biases, and not trying to predict certainties. When we predict certainties, we are almost always wrong.</p>
<p>In short, Big Data has tended to come with its share of Big Hype. So long as we're realistic about its potential, and recognize that our data is only as useful as the human intelligence we bring to it, minus the human biases with which we burden it, Big Data should, indeed, pay significant dividends.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/29/nate-silver-gets-real-about-big-data</link>
                <guid>http://readwrite.com/2013/03/29/nate-silver-gets-real-about-big-data</guid>
                <category>Nate Silver</category>
                <pubDate>Fri, 29 Mar 2013 07:44:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[What The Donglegate Fixation Made You Miss At PyCon]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_python_0.jpg" />
                                        <p>Sadly, the only news coming out of <a href="https://us.pycon.org/2013/">PyCon</a> last week was <a href="http://www.independent.co.uk/news/world/americas/woman-is-sacked-for-tweeting-picture-of-men-who-made-sexist-dongle-jokes-at-pycon-developer-conference-8546265.html">Donglegate</a>. In a sad case of he-said-she-blogged, two people lost their jobs and the technology industry took a deep look inside and discovered that (surprise!) it's a male-dominated, sometimes misogynistic bro-fest. Lost in all the finger pointing of the past week, however, was a central fact:</p>
<p>Python is doing exceptionally well.</p>
<p>This is evident from PyCon attendance alone, which saw 2,500 people attend (<a href="http://wiki.python.org/moin/PyCon/Attendance">up from a mere 20 in 1994</a>), some of them the beneficiaries of the <a href="http://pycon.blogspot.com/2013/03/bringing-first-timers-to-pycon-through.html">$100,000 in sponsorship money</a> raised. Importantly, 20% of the attendees were women (which makes the inappropriate "dongle" comments even more frustrating). Another $10,000 was raised in an auction for PyLadies.</p>
<p>Python's popularity is particularly interesting in light of programming language fragmentation. Whereas once enterprise IT had to choose sides between Java and .Net, today's empowered enterprise developers are spoiled for choice,&nbsp;as Redmonk analyst <a style="line-height: 1.538em;" href="http://redmonk.com/sogrady/2013/02/28/language-rankings-1-13/">Stephen O'Grady opines</a>. Despite a myriad of programming languages from which to choose,&nbsp;Python continues to more than hold its own, ranking <a style="line-height: 1.538em;" href="http://sogrady-media.redmonk.com/sogrady/files/2013/02/lang-rank-Q113-big.png">fourth overall in terms of adoption</a>.</p>
<p>And when it comes to relative job growth, Python is rocking, even compared to the other top languages:</p>
<div style="width: 540px;"><a title="python,java,php,Javascript Job Trends" href="http://www.indeed.com/jobtrends?q=python%2Cjava%2Cphp%2CJavascript&amp;relative=1&amp;relative=1"> <img src="http://www.indeed.com/trendgraph/jobgraph.png?q=python%2Cjava%2Cphp%2CJavascript&amp;relative=1" alt="python,java,php,Javascript Job Trends graph" width="540" height="300" border="0" /> </a></div>
<p>Nor is Python faring much worse in <a href="http://www.indeed.com/jobanalytics/jobtrends?q=python%2Cjava%2Cphp%2CJavascript&amp;l=">absolute job numbers</a>, as its growth has it gaining on Java and JavaScript.</p>
<h3>Big Data Tailwind For Python?</h3>
<p>Interest in languages like Java and PHP has <a href="http://www.google.com/trends/explore#cat=0-5&amp;q=python%2C%20php%2C%20java%2C%20&amp;date=1%2F2010%2038m&amp;cmpt=q">fallen over the last few years</a> while Python has remained steady. But there's reason to believe Python is about to see an upsurge in interest.</p>
<p>Part of Python's popularity stems from how easy it is to learn, especially for enterprise developers coming from a C/C++ or Java background. Developers also turn to it because of its general-purpose nature. Developers and the enterprises that employ them often turn to technologies that fit multiple use cases, particularly when they're easy for newbies to pick up.</p>
<p>But Big Data may well be Python's big selling point.</p>
<p>While <a href="http://w3techs.com/technologies/details/pl-python/all/all">PHP still dominates the web</a>, Python is <a href="http://www.theregister.co.uk/2012/06/18/scripting_languages_in_the_enterprise/">making headway</a> within the traditional enterprise, and its libraries for data manipulation and analysis make it a great fit for the Big Data boom.&nbsp;</p>
<p>As&nbsp;AppNexus Director of Optimization and Analytics <a href="http://www.computerworld.com/s/article/9232917/Python_Big_Data_s_secret_power_tool">David Himrod tells&nbsp;<em>Computerworld</em>:</a></p>
<blockquote>
<p>Key to Python's usefulness is its simplicity...One of the biggest challenges that [AppNexus] faces is how to get a diverse set of employees working on the same technology stack. Python provides employees with different backgrounds--notably engineers, mathematicians and analysts--a common, easy-to-understand language that can be used to prototype new functionality for the company.</p>
</blockquote>
<p>As anyone who has been around open source over the years knows, "prototype" today turns to "production" tomorrow. "Complex but powerful" tends to be a losing formula for new technologies that depend upon developer adoption. "Powerful but easy to use," however, wins most every time, particularly in an area like Big Data, which has enterprises experimenting with their data.&nbsp;</p>
<p>Python's Big Data ambitions also recently got a <a href="http://www.informationweek.com/government/information-management/darpa-funds-python-big-data-effort/240147993">cash infusion from DARPA</a> (U.S. Defense Advanced Research Projects Agency), which invested $3 million in&nbsp;Continuum Analytics to help improve Python's data processing and visualization capabilities.</p>
<h3>A Bright, Big Data Future For Python</h3>
<p>None of which means Python has won the language war. As O'Grady notes, fragmentation is here to stay, given that developers increasingly determine the tools they use, and opt for languages tailored to specific applications. But given Python's ready-made fit for Big Data, Big Data's importance, and investments in Python to make it even better for Big Data projects, it seems safe to project a healthy future for Python.&nbsp;</p>
<p>With or without juvenile and demeaning dongle comments.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em>.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/28/what-the-donglegate-fixation-made-you-miss-at-pycon</link>
                <guid>http://readwrite.com/2013/03/28/what-the-donglegate-fixation-made-you-miss-at-pycon</guid>
                <category>Python</category>
                <pubDate>Thu, 28 Mar 2013 07:13:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[IBM Wants Your CEO To Embrace The Future — And It Will Do All The Hard Work]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_96379706.jpg" />
                                        <p class="p1">Executives with a capital "C" in their title generally don't have a clue on how to adjust to the social, mobile, and cloud-based business world. Now IBM wants to help these Luddite execs adapt -- by plunking them down into a new lab where it will show them what the IT tools at their disposal can actually accomplish.</p>
<p class="p1">IBM's new&nbsp;Customer Experience Lab aims to deploy&nbsp;100 hand-picked specialists — industry titans from the world of machine learning, analytics, and a slew of other "Big Data" fields — as a consulting consortium aimed at aiding C-suite executives.</p>
<p class="p1">"Take for example a CMO whose life has changed quite a bit as a result of social and mobile in the last couple years," explains Clay Williams, senior manager of "front-office innovation" at IBM. "The goal here is to bring the headlights that IBM research has to shine and show them further down the road… so that IBM can help them chart the path forward."</p>
<p class="p1">In other words, IBM wants to provide some tools, and some advice, to help top executives learn how not to screw things up in a landscape they have no hope of understanding on their own.&nbsp;After all, a CFO may be well-versed in the complexities of financial-based business models, but isn't going to have the slightest idea of how to employ data mining and machine learning to better understand a single customer's needs.</p>
<p class="p1">IBM knows that, and it insists it isn't looking to educate these individuals.&nbsp;"It's not so much teaching, but delivering these new technologies," Williams says. "It's built around us working in partnership."</p>
<h2 class="p1">"Big Data" Tools</h2>
<p class="p1">With more than $6 billion going into research and development each year, IBM is one of the very few global companies capable of offering up an expensive, need-based Big Data consulting service.</p>
<p class="p1">But what exactly will it look like? Well, the new IBM lab aims to give business leaders the opportunity to work alongside those 100 experts in order to jointly create new business strategies based on what the companies' own data tell them.</p>
<p class="p1">For interested companies, IBM will pinpoint a C-level candidate exec and help craft new business strategies for him or her. Or, if you prefer the original IBMspeak: "From nomination to partnering to understanding the client problem and finding the right research team, then what we do is look at proof of concept model," Williams says. "We see if they are appealing, rapidly prototype some of those ideas and then go to a full solution process."</p>
<p class="p1">IBM outlines three generic sorts of breakthroughs it has identified for potential clients to leverage:</p>
<ul>
<li><strong>Customer insight:</strong> Applying advanced capabilities such as machine learning and visual analytics to predict differences in individual customer behavior across multiple channels.&nbsp;</li>
<li><strong>Customer engagement:</strong> Using deep customer engagement to drive insight and continuously deliver value by personalizing engagement, versus transactional experiences.</li>
<li><strong>Employee engagement:</strong> Embedding semantic, collaborative, and multimedia technologies to foster employee engagement and insight – in person and online.</li>
</ul>
<h2 class="p1">Mobile Banking? Great, But What's Next?</h2>
<p class="p1">Williams offers an example from banking, both because that's an area where Big Data technologies can be particularly helpful, and because it's the sector in which two of the IBM lab's first clients&nbsp;—&nbsp;British mutual institution Nationwide Building Society and Mexican superbank Banorte&nbsp;—&nbsp;happen to operate.</p>
<p class="p1">"If you think about phase 1 of enabling [mobile] devices for banking, it was largely about parity with web-based banking," he says&nbsp;—&nbsp;for instance, offering the ability to transfer funds or pay bills via mobile apps. Williams says IBM aims to go beyond that. If a bank wants to develop a plan for individual, social network-based experiences down the line, IBM can bring in a machine learning expert to parse how customers are interacting with businesses across various channels and develop algorithms for predicting behavior.&nbsp;"Now they're asking the deeper question: 'What's going to happen next?'"</p>
<p class="p1">And that's the question IBM's customer-experience lab wants its would-be clients to be asking. It's supposedly the key with which international-scale business can craft strategies that fit the needs of individual customers -- though exactly how you get there from here still remains a bit fuzzy. One thing is for sure: IBM is convinced that it depends on crunching mounds of data. Oh, and on paying those consulting fees.</p>
<em>Image courtesy of&nbsp;<a href="http://www.shutterstock.com/gallery-264046p1.html?cr=00&amp;pl=edit-00">Tomasz Bidermann</a> / <a href="http://www.shutterstock.com/?cr=00&amp;pl=edit-00">Shutterstock.com</a>.</em>
                    ]]></description>
                <link>http://readwrite.com/2013/03/14/ibm-wants-your-ceo-to-embrace-the-future-dont-worry-it-will-do-all-the-hard-work</link>
                <guid>http://readwrite.com/2013/03/14/ibm-wants-your-ceo-to-embrace-the-future-dont-worry-it-will-do-all-the-hard-work</guid>
                <category>IBM</category>
                <pubDate>Thu, 14 Mar 2013 03:00:00 -0700</pubDate>
                <author>Nick Statt</author>
            </item>
                    <item>
                <title><![CDATA[Attacking Big Data Old-School Style - With VMware's SQLFire]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_6397828.jpg" />
                                        <p class="p1"><a href="http://www.vmware.com/" target="_blank"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/vmware_300x60_contributed.jpg" style="" />
			</span>
</a></p>
<p class="p1">As more and more information floods into the Internet, organizing and making sense of this <a href="http://readwrite.com/2012/03/05/big-data">Big Data</a> becomes more important and more difficult.</p>
<p class="p1">New database methods are emerging to help process unstructured data, but IT developers and database deployers also have to figure out how to deal with the world of legacy technology.</p>
<p class="p1">For the last 40 years, relational database programs (usually powered by <a href="http://en.wikipedia.org/wiki/SQL">SQL</a>-based management systems) have been the backbone of supplying businesses with organized rows and columns of data. The problem is that these legacy systems may not be able to work together to give businesses the information they need when they need it. Older programs may also have trouble processing data requests over long distances.</p>
<h2 class="p2">A New Way Of Thinking About Databases</h2>
<p class="p1">A new way of thinking is needed. Over the past decade the push for “not only SQL” or <a href="http://en.wikipedia.org/wiki/NoSQL">NoSQL</a> database software has provided a pathway for businesses to connect bits and pieces of data from a variety of sources at very rapid speeds across different geographies.</p>
<p class="p3"><strong>(See also <a href="http://readwrite.com/2013/02/20/whats-next-for-taming-big-data">What's Next For Taming Big Data?</a>)</strong></p>
<p class="p1">Some businesses are spreading out the workloads using NoSQL databases within cloud computing-based networks. Others are approaching the problem still using traditional SQL relational database software - and that’s perfectly OK.</p>
<p class="p1">Previous articles in this series (<a href="http://readwrite.com/series/taming-big-data/">Taming Big Data</a>) discuss the benefits of a NoSQL database tool like <a href="http://vmware.com/go/gemfire">VMware’s vFabric GemFire</a>. But SQL database software retains a well-established community of tens of thousands of developers and integrators who may be reluctant to move beyond the SQL they know and love. What’s a company to do?</p>
<p class="p3"><strong>(See also <a href="http://readwrite.com/2013/02/27/cloud-based-gemfire-makes-it-easier-to-work-with-big-data">VMware's Cloud-Based GemFire Makes It Easier To Work With Big Data</a>.)</strong></p>
<p class="p1">For SQL diehards, there is <a href="http://www.vmware.com/go/sqlfcomm">VMware’s vFabric SQLFire</a>. SQLFire is a distributed SQL database typically used for online transactions. The software is more modern than most traditional relational database management systems.</p>
<h2 class="p2">What Can SQLFire Do For Me?</h2>
<p class="p1">SQLFire functions and performs much like <a href="http://readwrite.com/2013/02/27/cloud-based-gemfire-makes-it-easier-to-work-with-big-data">GemFire</a> under the hood. SQLFire uses GemFire's data grid engine, which lets both programs capture data and then replicate and partition the information “in-memory” on the server. But instead of requiring developers to learn GemFire commands and controls, SQLFire offers a user interface and programming framework that will be familiar to anyone used to working in a SQL interface and with SQL tools.</p>
<p class="p1">Backups are enabled through virtual copies on other connected servers, although data can be stored long-term on disks as needed.</p>
<p class="p1">Unlike other embedded databases, SQLFire allows several servers to store replicated and partitioned tables, persist data to disk, communicate directly with other servers and participate in distributed queries.</p>
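<p class="p1">The difference between partitioned and replicated tables can be sketched in a few lines of Python. This is a simplified illustration only, not SQLFire's actual placement algorithm: the server names, the CRC-based hash and the <code>redundant_copies</code> parameter are all assumptions made for the example.</p>

```python
import zlib


def owner_index(key, n_servers):
    """Pick a primary server for a key with a hash that is stable across runs."""
    return zlib.crc32(str(key).encode("utf-8")) % n_servers


def place_partitioned_row(key, servers, redundant_copies=1):
    """Return the servers holding one row of a partitioned table.

    The row lives on one primary server plus `redundant_copies` backups,
    chosen here as the next servers in the list (an illustrative choice).
    """
    start = owner_index(key, len(servers))
    copies = min(1 + redundant_copies, len(servers))
    return [servers[(start + i) % len(servers)] for i in range(copies)]


def place_replicated_row(servers):
    """A replicated table keeps a full copy of every row on every server."""
    return list(servers)


# Hypothetical three-node cluster.
servers = ["server-a", "server-b", "server-c"]

for account_id in (1001, 1002, 1003):
    print(account_id, "->", place_partitioned_row(account_id, servers))

print("lookup-table row ->", place_replicated_row(servers))
```

<p class="p1">In this sketch, a large table of transactions would be partitioned so that each server holds only a slice of it, while a small reference table would be replicated so that any server can join against it locally.</p>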
<p class="p1">For traditional IT developers and database deployers, the SQLFire interface makes it easier to write applications and take advantage of GemFire’s underlying NoSQL technology. Developers and integrators who know SQL well will have an easy time adapting SQLFire to new projects.</p>
<p class="p1">SQLFire is perfect for classic Web transactions, especially where there is a need for fast speeds and a requirement to dig deep into clusters of data.</p>
<h2 class="p2">Business Case For SQLFire</h2>
<p class="p1">In addition to making SQL developers feel comfortable, SQLFire can work across multiple networks and geographies. This comes in handy when enterprises need information at the moment it becomes available on multiple continents.</p>
<p class="p1">For example, a large regional bank in the Northeastern United States collects large amounts of data that helps it maintain its regional and branch offices. The bank also monitors customer transactions at tellers and various ATMs.</p>
<p class="p1">Bank management was interested in measuring the different types of transactions being handled at each type of station, what types of accounts they were accessing and the various times of day the transactions took place.</p>
<p class="p1">Historically, the bank could attach an individual database to each branch, but in today's global environment the company decided it needed to measure all of these data points at the same time for each office. The company tested vFabric SQLFire against its own systems and found the existing server took 20 minutes to complete the queries while the SQLFire server completed its task in less than a minute.</p>
<h2 class="p2">Deploying SQLFire In The Enterprise</h2>
<p class="p1">In the enterprise, SQLFire is generally found on inexpensive computer servers in database clusters. A typical use case would find SQLFire helping eliminate potential data bottlenecks in new mobile and Web environments. Another common deployment option for SQLFire is to integrate it with existing traditional databases or analytics programs.</p>
<p class="p1">The software can also be accessed through an API from Java applications, including those built with the <a href="http://www.springsource.org/">Spring</a> framework. SQLFire is also compatible with Java Database Connectivity (JDBC) and ADO.NET.</p>
<p class="p1">As companies look for new ways to make data accessible and provide a consistent view of that information, it's important to have tools that suit the needs of all kinds of developers and IT managers.</p>
<p class="p1">VMware’s <a href="http://www.vmware.com/products/application-platform/vfabric-gemfire/overview.html">GemFire</a> and <a href="http://www.vmware.com/products/application-platform/vfabric-sqlfire/overview.html">SQLFire</a> software are designed to address just those needs - allowing companies to move beyond concerns over speed and scale and tackle Big Data applications head on.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/12/attacking-big-data-old-school-style-with-vmware-sqlfire</link>
                <guid>http://readwrite.com/2013/03/12/attacking-big-data-old-school-style-with-vmware-sqlfire</guid>
                <category>Taming Big Data</category>
                <pubDate>Tue, 12 Mar 2013 10:33:00 -0700</pubDate>
                <author></author>
            </item>
                    <item>
                <title><![CDATA[Proprietary Hadoop Is A Losing Strategy]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/Hadoop%20elephant.jpg" />
                                        <p>Hadoop, nearly synonymous with Big Data, has many failings. But open source is not one of them. In fact, Hadoop's open-source license remains one of its biggest draws, giving enterprises plenty of reasons to persevere in using it despite its shortcomings. It's therefore hard to see how EMC's new <a href="http://www.emc.com/about/news/press/2013/20130225-04.htm">Pivotal HD</a>, essentially a proprietary distribution of Hadoop, can hope to succeed.</p>
<p>Not that everyone agrees with this statement.</p>
<p>Dan Woods,&nbsp;CTO and editor of CITO Research and a contributor to&nbsp;<em>Forbes</em>, argues that embedding Hadoop into EMC Greenplum's massively parallel processing (MPP) database (HAWQ) offers CIOs and CTOs the simplicity they need to be successful with Hadoop. He has a point: Hadoop&nbsp;<em>is</em> complex and somewhat hard to use, which is why Cloudera CEO Mike Olson has <a href="http://www.theregister.co.uk/2012/06/14/hadoop_still_too_complex_for_enterprise_customers/">argued</a> that most of the world will experience the power of Hadoop through applications, nearly all of which will be proprietary, I might add.</p>
<p>But Olson's argument differs from Woods' argument in at least one major way: Pivotal HD is enterprise infrastructure, not an application, and enterprise infrastructure is increasingly open source.</p>
<p>There are plenty of reasons for this, but RethinkDB's <a href="http://nosql.mypopescu.com/post/42017797886/how-to-plan-for-big-data-waterfall-vs-agile">Alex Popescu nails</a> one critical factor:</p>
<blockquote>
<p>Hadoop is so successful despite its complexity [because i]t allows experimenting and trying out new ideas, while continuing to accumulate and storing your data. It removes the pressure from the developers. That’s agility. It’s highly appreciated.</p>
</blockquote>
<p>In other words, a big reason for Hadoop's success is its open-source license, which permits a hefty amount of experimentation without having to get an enterprise license from EMC, Oracle, or any of the other incumbent infrastructure vendors. &nbsp;</p>
<p>EMC's Scott Yara tries to deflect criticism of its proprietary foray into Hadoop by declaring "We're all in on Hadoop, period," but as 451 Research analyst <a href="http://blogs.the451group.com/information_management/2013/03/11/all-in-on-hadoop/">Matt Aslett counters</a>, "I have no doubt that EMC Greenplum is 'all in' on Pivotal HD, but that’s not the same thing at all."</p>
<p>Take that freedom to experiment away by building a <a href="http://www.cio.com/article/729451/EMC_Greenplum_Tackles_Big_Data_With_Hadoop_Distribution">proprietary Hadoop distribution</a>, and EMC has basically erased the very thing that made Hadoop workloads proliferate in the first place. EMC also cuts itself out of the standard adoption cycle for Hadoop, as Redmonk analyst <a href="http://redmonk.com/sogrady/2013/03/06/pivotal-hd/#ixzz2NFKCsdmZ">Stephen O'Grady suggests</a>: "Certainly there will be customers whose needs will dictate the adoption of a unique solution like Pivotal HD, but how many will that be relative to the segment whose adoption cycle begins with the download of one of the free Hadoop distributions?"</p>
<p>Today, Hadoop is one of the industry's hottest job trends. Even in absolute job numbers, it's about to pass EMC-related job posts:</p>
<div style="width: 540px;"><a title="Hadoop,emc Job Trends" href="http://www.indeed.com/jobtrends?q=Hadoop%2Cemc"> <img src="http://www.indeed.com/trendgraph/jobgraph.png?q=Hadoop%2Cemc" alt="Hadoop,emc Job Trends graph" width="540" height="300" border="0" /> </a>
<table style="font-size: 80%;" width="100%" border="0" cellspacing="0" cellpadding="6">
<tbody>
<tr>
<td><a href="http://www.indeed.com/jobtrends?q=Hadoop%2Cemc">Hadoop,emc Job Trends</a></td>
<td align="right"><a href="http://www.indeed.com/jobs?q=Hadoop">Hadoop jobs</a> - <a href="http://www.indeed.com/jobs?q=EMC">EMC jobs</a></td>
</tr>
</tbody>
</table>
</div>
<p>Enterprises aren't hiring for EMC's brand of Hadoop. They're hiring for the open source Hadoop. This matters.</p>
<p>Perhaps EMC feels that Hadoop's brand is big enough now that enterprises essentially understand it and are ready to move on from experimentation to full-scale adoption. In this EMC is likely to be disappointed. According to recent <a href="http://strataconf.com/strata2013/public/schedule/detail/27767">IBM survey data</a>, only 6% of enterprises have two or more Big Data projects underway (likely, though not explicitly, involving Hadoop in some way), and a mere 22% are running pilots to test the efficacy of their Big Data strategies. Everyone else is in full-on planning mode.</p>
<p>By creating a proprietary Hadoop distribution, EMC just dramatically limited its access to the 94% that are still in Big Data education and trial mode. Yes, it has a gargantuan sales force. No, they're simply not going to be able to reach would-be customers as efficiently as an open-source distribution model does.</p>
<p>But maybe EMC hasn't gone proprietary to more effectively monetize Hadoop interest, and instead sincerely believes, like Woods ("<a href="http://www.forbes.com/sites/danwoods/2013/02/27/why-sql-matters-the-limits-of-open-source-and-other-lessons-of-emc-greenplums-pivotal-hd/">open source development has its limits</a>"), that complex infrastructure problems are a poor match for open source. History has not been kind to such thinking, as Aslett sarcastically implies:</p>
<blockquote class="twitter-tweet">
<p>"Enterprise products" always prevail over open source. <a title="http://onforb.es/VgQ0Cq" href="http://t.co/zvCLZAMbtR">onforb.es/VgQ0Cq</a> That's why Linux has been such an abject failure versus Unix.</p>
— Matt Aslett (@maslett) <a href="https://twitter.com/maslett/status/307422911340355584">March 1, 2013</a></blockquote>
<p>EMC has seemingly bottomless resources to throw at Hadoop, and every incentive to do so. It's a smart, highly successful company and no doubt will prove successful with Pivotal HD. However, I can't see it ever dominating an open-source infrastructure market with a proprietary distribution. <a href="http://readwrite.com/2012/12/31/tech-jobs-in-2013-open-source-open-data">Open source is the foundation for today's most interesting markets</a>, from Big Data to mobile to cloud computing. It's unlikely that EMC will somehow stem this tide with a proprietary product, no matter its short-term performance or functionality advantages.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/12/proprietary-hadoop-is-a-losing-strategy</link>
                <guid>http://readwrite.com/2013/03/12/proprietary-hadoop-is-a-losing-strategy</guid>
                <category>Big data</category>
                <pubDate>Tue, 12 Mar 2013 09:30:00 -0700</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[New eBay Metrics Help Save Millions in Data Center Costs]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/dse%20dash%20%281%29.jpeg" />
                                        <p class="p1">As part of its <a href="http://dse.ebay.com/">Digital Service Efficiency</a> effort to reduce power consumption in its data centers, <a href="http://www.ebay.com/">eBay</a> slightly changed some of its software code so that the code would require less memory. Less memory meant more operations could be performed on the same server over a given amount of time.</p>
<p class="p1">The end result? eBay cut its power consumption by about a megawatt and took 400 servers out of its data centers, saving some $2 million in equipment costs.</p>
<p class="p1">“The ripple effect was gratifying,” said Dean Nelson, eBay's vice president of Global Foundation Services, during a break at the annual conference of <a href="http://www.thegreengrid.org/">The Green Grid</a>. “We just changed the application a bit to save power.”</p>
<h2 class="p2">What Is Digital Service Efficiency (DSE)?</h2>
<p class="p1">Digital Service Efficiency (DSE) is a metric that the auction giant hopes to popularize in the industry. In a nutshell, DSE divides the work accomplished by the power consumed. In eBay’s case, it divides the number of transactions and/or listings by the energy consumed. Energy used in searches is amortized across the entire operation. It is similar to the PUE (Power Usage Effectiveness) rating developed by The Green Grid a few years ago, but it arguably is more targeted toward measuring power consumption and actual operations.</p>
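<p class="p1">As a rough sketch of the ratio described above, the Python below divides work done by energy used. The function name and the input figures are hypothetical and chosen only to show the arithmetic; eBay's own methodology (for example, how it amortizes search energy) is not modeled here.</p>

```python
def digital_service_efficiency(transactions, energy_kwh):
    """Work accomplished divided by energy consumed (transactions per kWh)."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return transactions / energy_kwh


# Hypothetical inputs: the same workload served with 10% less energy.
before = digital_service_efficiency(transactions=9_000_000, energy_kwh=200)
after = digital_service_efficiency(transactions=9_000_000, energy_kwh=180)

print(f"before: {before:,.0f} transactions per kWh")
print(f"after:  {after:,.0f} transactions per kWh")
```

<p class="p1">Because the metric is a simple ratio, it improves either when more transactions are served for the same energy or when the same transactions are served with less energy, which is the effect the memory optimization had.</p>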
<p class="p1">eBay’s figures don’t include all the crucial energy data — like how much gas gets consumed shipping a 1973-era Mattel Electronic Football game from Salt City, Mo., to a collector in San Jose, Calif. — but the numbers are still compelling. If anything, they underscore how data centers represent a more efficient way to conduct commerce than driving to the mall:</p>
<ul>
<li><a href="http://dse.ebay.com/">eBay conducts 45,914 transactions per kilowatt hour</a>. A medium-sized window-based <a href="http://michaelbluejay.com/electricity/cost.html">air conditioner can consume a kilowatt hour in an hour</a>.</li>
<li>The company generates $337 million per megawatt hour.</li>
<li>eBay has 52,075 servers that serve 112 million active users.</li>
<li>It gets $116,716 in revenue per server.</li>
<li>eBay racked up 7.3 trillion “transactions”, i.e. URL requests to buy or sell something, in 2012. That’s more than 1,000 transactions for every person on the planet.</li>
</ul>
<p class="p1">The numbers come from eBay, so one can take them with a grain of salt, but the overall picture is pretty clear. The company serves a lot of customers fairly efficiently. At a minimum, the company is certainly doing its homework to make it as efficient as possible.</p>
<h2 class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/DSC_4682%20-%20HR.jpg" style="" />
				<span class="embedded-Media-image-caption">The Dell MDC server racks before the cooling unit is attached.</span>
		</span>
</h2>
<h2 class="p1">How Can DSE Help Save Money In The Datacenter?</h2>
<p class="p1">eBay, for instance, has reduced the number of server configurations it will deploy down to two. (Previously, it had 200 to 300 server configurations, with 15 of them accounting for 80% of the total population.) eBay now has a High Performance Computing (HPC) server designed to handle transactions. The HPC servers contain 72GB of memory and 4 hard drives. They are tuned, says Nelson, for rapid processing. A single rack can hold 96 of the servers, bringing the total RAM per rack to 6,912GB.</p>
<p class="p1">Complementing the HPC servers are eBay’s Big Data servers, of which 48 can fit into a rack. Each Big Data server comes with a dozen 2TB drives. The Big Data servers can fit 1.2 petabytes of storage capacity per rack. The next Big Data servers may come with 3TB or 4TB drives, which would boost the total storage capacity in a rack to between two and three petabytes. Two petabytes can hold the same amount of information contained in all of the <a href="http://highscalability.com/blog/2012/9/11/how-big-is-a-petabyte-exabyte-zettabyte-or-a-yottabyte.html">academic research libraries in the U.S.</a> (Unlike Google or Facebook, however, eBay is not designing its own equipment in datacenters. Instead, it will buy from computer vendors.)</p>
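<p class="p1">The per-rack figures follow from simple multiplication, sketched below in Python. The function names are made up for this example, and two assumptions are baked in: that future racks keep the same 48 servers with a dozen drives each, and that capacities use decimal units (1 petabyte = 1,000 terabytes).</p>

```python
def rack_ram_gb(servers_per_rack, ram_gb_per_server):
    """Total memory in one rack, in gigabytes."""
    return servers_per_rack * ram_gb_per_server


def rack_storage_pb(servers_per_rack, drives_per_server, drive_tb):
    """Raw storage in one rack, in petabytes (decimal: 1 PB = 1,000 TB)."""
    return servers_per_rack * drives_per_server * drive_tb / 1000


# HPC rack: 96 servers with 72GB of memory each.
print("HPC rack RAM (GB):", rack_ram_gb(96, 72))

# Big Data rack: 48 servers with 12 drives each, at three drive sizes.
for drive_tb in (2, 3, 4):
    capacity = rack_storage_pb(48, 12, drive_tb)
    print(f"Big Data rack with {drive_tb}TB drives: {capacity:.2f} PB")
```

<p class="p1">These are raw capacities; usable space after replication or file system overhead would be lower.</p>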
<p class="p1">It has also created a showcase datacenter in Phoenix, dubbed Project Mercury, that utilizes modular containers to isolate equipment and pack it more densely along with liquid cooling. (If you can lower air conditioning bills in Arizona, you can lower them anywhere!)</p>
<p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/IMG_2628_0.JPG" style="" />
				<span class="embedded-Media-image-caption">An HP POD being installed next to another POD</span>
		</span>
</p>
<p class="p1">The challenge now lies in balancing efficiency with necessary growth. Storage capacity, in particular, is set to explode.</p>
<p class="p1">“It is the storage, stupid,” Nelson said. “The volume of growth of storage is insane. We added 100 petabytes of capacity in the last 12 months.”</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/11/new-ebay-metrics-help-save-millions-in-data-center-costs</link>
                <guid>http://readwrite.com/2013/03/11/new-ebay-metrics-help-save-millions-in-data-center-costs</guid>
                <category>Data Centers</category>
                <pubDate>Mon, 11 Mar 2013 04:04:00 -0700</pubDate>
                <author>Michael Kanellos</author>
            </item>
                    <item>
                <title><![CDATA[12 Things You (Probably) Didn't Know About Online Security]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/ESET-cobb.JPG" />
                                        <p class="p1">At the <a href="http://www.rsaconference.com/">RSA Conference</a> in San Francisco last week, I got the chance to sit down with <a href="http://www.welivesecurity.com/author/scobb/" target="_blank">Stephen Cobb, a distinguished security researcher for the IT security company ESET</a>. We talked about a lot of things, including Android security issues and how walled gardens have their uses.</p>
<p class="p2"><strong>(See also <a href="http://readwrite.com/2013/03/04/in-the-security-world-android-is-the-new-windows">In The Security World, Android Is The New Windows</a>.)</strong></p>
<p class="p1">It was a great conversation, touching on a wide variety of fascinating aspects of online and mobile security, and I wanted to share as many of them as possible.</p>
<p class="p1">This list seemed like the best way to do that. And while not every one of the dirty-dozen points presented here may surprise you, I can pretty much guarantee that few people will already know - or agree with -&nbsp;<em>everything</em> on the list:</p>
<p class="p1"><strong>1. Big Data is not new to the anti-virus industry.</strong> Turns out the anti-virus companies have been doing traffic analysis, incident sharing and code sharing for decades, Cobb claims. They just didn't call it Big Data until the term became fashionable.</p>
<p class="p1"><strong>2. Anti-virus companies have been practicing co-opetition since the 1980s</strong>, when they realized there was no percentage in one company being able to stop one virus while you needed another company to stop a different virus. They quietly began sharing virus signatures and other information, Cobb says.</p>
<p class="p1"><strong>3. All the major Web browsers share information on malware sites and other threats</strong>. Chrome, Internet Explorer, Firefox and the others all share which URLs to flag, for example. That's why when <a href="http://money.cnn.com/2013/02/22/technology/security/nbc-com-hacked-malware/">NBC.com was hacked recently</a> and started spewing malware, everybody was able to block it almost immediately.</p>
<p class="p1"><strong>4. One of the hardest parts of securing Big Data is knowing <em>where</em> the data is actually stored.</strong> In the old days, when data was collected and stored, it didn't really move much. Now, in the cloud, Cobb says we don't really know where data is stored. Malware creators are intent on exploiting that, but what form that will take remains to be seen.</p>
<p class="p1"><strong>5. One reason more high-value targets haven't been hacked is that there is still so much low-hanging fruit</strong> for the bad guys to go after. According to Cobb, so far, there hasn't been much need to try and crack the hardest targets.</p>
<p class="p1"><strong>6. Most attacks take the form of malware or hacking.</strong> Of the hacking attacks, Cobb says, 80% go after passwords that are either non-existent, guessed or stolen.</p>
<p class="p1"><strong>7. Anti-virus hasn't been about matching virus signatures for years.</strong> Some people say the anti-virus model doesn't work because so much new malware is coming out all the time that anti-virus solutions can't possibly keep up. But Cobb protests that most anti-virus software is continually detecting previously unseen malware.</p>
<p class="p1"><strong>8. People who know what they're doing on the Internet might be able to get by with no anti-virus software.</strong> But Cobb says people are fooling themselves when they claim: "I don't run anti-virus software and I've never been hacked." "Are you really OK telling everyone you know - your mom, for instance - not to run anti-virus software?" he asks.</p>
<p class="p1"><strong>9. There's still an incredible amount of spam out there.</strong> You don't see it, but it's still there. Blocking it takes a huge amount of datacenter power, but that filtering is built into network security appliances, so you don't have to deal with it.</p>
<p class="p1"><strong>10. The overall trend is for increasing levels of security to be compressed into the core</strong>, to become part of a standard install. That's happened to anti-spam, to firewalls and it's happening to anti-virus, too.</p>
<p class="p1"><strong>11. It's a lot harder to write 64-bit malware than it is to write 32-bit malware. </strong>And that could help lower the number of attacks on 64-bit systems.</p>
<p class="p1"><strong>12. In many ways, hacking behavior seems to have gotten <em>better</em> over the years</strong> - at least in the United States, Cobb says. But we are now increasingly exposed to other, more dangerous places. The globalization of the Net has caught up with us even as the value of hacking has gone way up. Today, hackers aren't just messing with us, Cobb notes, they're stealing from us. And that's a big new incentive.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/08/12-things-you-probably-didnt-know-about-online-security</link>
                <guid>http://readwrite.com/2013/03/08/12-things-you-probably-didnt-know-about-online-security</guid>
                <category>Security</category>
                <pubDate>Fri, 08 Mar 2013 05:01:00 -0800</pubDate>
                <author>Fredric Paul</author>
            </item>
                    <item>
                <title><![CDATA[How Big Data Can Boost Weather Forecasting]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_403008.jpg" />
                                        <p class="p1"><em>Guest author Steve Hamm is a strategist, writer and videographer in IBM's corporate communications department.</em></p>
<p class="p1">Last September, when Typhoon Sanba smashed into the Korean peninsula, it packed winds so strong that they sent rocks flying through the air like missiles and caused massive power outages. “Hwangsa” storms, carrying dense clouds of yellow dust from China’s Gobi Desert that are sometimes loaded with heavy metals and carcinogens, sweep across the peninsula from West to East.</p>
<h2 class="p2"><strong>9.3 Petabytes Of Storage For The KMA</strong></h2>
<p class="p1">Menaced by such destructive weather phenomena, South Korea is upgrading its national weather information system with the goal of better understanding weather patterns and more accurately predicting the location and ferocity of weather events. The upgrade being installed by the <a href="http://web.kma.go.kr/eng/index.jsp">Korea Meteorological Administration</a> increases the agency’s data storage capacity by nearly 1,000% to 9.3 petabytes, making it Korea’s most capable storage system.</p>
<p class="p1">The KMA project dramatically illustrates today’s big data phenomenon and its impact on weather forecasting.&nbsp;Thanks to the rapid spread of sensors and satellites, and to the increase in computer number-crunching speeds, it’s possible to forecast weather changes more accurately and with improved detail&nbsp;– potentially saving thousands of lives and safeguarding property.&nbsp;</p>
<p class="p1">Increasing evidence of climate change worldwide is prompting governments and scientists to take action to protect people and property from its effects. But to take effective action, they need to understand a lot more about the weather – everything from what’s going to happen tomorrow to what’s coming next year. For instance, leaders of the city of Hoboken, N.J., in the United States, which flooded badly last fall during Hurricane Sandy, are considering <a href="http://www.npr.org/2013/02/25/172858141/hoboken-mayor-proposes-universal-solution-to-flooding">building a wall around Hoboken to keep the tidal Hudson River at bay</a>. The problem is, if they don’t build high enough the wall could end up turning the city into a giant bathtub rather than keeping rising waters out.</p>
<h2 class="p2"><strong>Listen To Deep Thunder</strong></h2>
<p class="p1"><span class="s1" data-mce-mark="1"><a href="http://asmarterplanet.com/blog/2012/07/18315.html">IBM Research scientists</a></span> are working to bring the most sophisticated data analytics to bear on weather forecasting. Their long-term weather analysis project, called <a href="http://arstechnica.com/business/2012/03/how-ibms-deep-thunder-delivers-hyper-local-forecasts-3-12-days-out/">Deep Thunder</a>, combines data with sophisticated mathematical algorithms and computing power.</p>
<p class="p1">The scientists established a test bed in the New York City metropolitan area, where they set up a three-dimensional grid of thousands of blocks. That makes it possible for them to run calculations that produce very precise weather forecasts for a particular locale. Using this capability, the team was able to predict with remarkable accuracy the snowfall totals in New York City during the mammoth snow storm that blanketed the northeastern United States in February – and also to predict accurately when the snowfall would start and stop.</p>
<h2 class="p2"><strong>Blame It On Rio</strong></h2>
<p class="p1">The IBM Research team is putting their algorithms to work on behalf of cities around the world. For instance, Rio de Janeiro, because of its climate and terrain, has recurring flooding and landslide problems in many hilly neighborhoods. The researchers used data describing the physics of the atmosphere to create a mathematical model of how storms are likely to unfold in Rio. With it, they can predict up to 40 hours ahead of time how much rain will fall in a particular location — with 90% accuracy.</p>
<p class="p1">In recent months, the Deep Thunder team, led by Lloyd Treinish, has developed new techniques for ingesting many more measurements from weather sensors. The team is also extending its technology to new applications, including agriculture and wind farming.</p>
<p class="p1">For detailed and super-accurate weather information to have maximum impact, it has to be accessible by a large number of people. That’s why IBM has created <a href="http://idealab.talkingpointsmemo.com/2012/03/ibm-showcases-deep-thunder-weather-forecasting-ipad-app.php">iPad and cloud applications that deliver the power of Deep Thunder</a> to people’s hands wherever they may be. Hopefully, by the time Rio hosts the summer Olympics in 2016, practically everybody who attends will be able to get their hands on Deep Thunder data so they know exactly what to expect when they venture out to the various game venues.<br /><br /><em>Note: This post originally appeared on <a href="http://asmarterplanet.com/blog/2013/02/23603.html" target="_blank">IBM's Smarter Planet blog</a>.&nbsp;IBM provided the storage hardware and software for the KMA project.&nbsp;</em><br /><br /><em>Image courtesy of <a href="http://www.shutterstock.com" target="_blank">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/28/how-big-data-can-boost-weather-forecasting</link>
                <guid>http://readwrite.com/2013/02/28/how-big-data-can-boost-weather-forecasting</guid>
                <category>Big data</category>
                <pubDate>Thu, 28 Feb 2013 13:53:24 -0800</pubDate>
                <author>Steve Hamm</author>
            </item>
                    <item>
                <title><![CDATA[VMware's Cloud-Based GemFire Makes It Easier To Work With Big Data]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/VMware_Gemfire.png" />
                                        <p class="p1"><a href="http://www.vmware.com/" target="_blank"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/vmware_300x60_contributed.jpg" style="" />
			</span>
</a></p>
<p class="p1">While companies of all sizes are struggling with the growth of information overload often referred to as “Big Data,” some IT developers and database deployers are approaching the challenge with a cloud-based service designed to make accessing mass amounts of data faster.</p>
<p class="p1">In short, they turn to <a href="http://www.vmware.com/products/application-platform/vfabric-gemfire">VMware’s vFabric GemFire</a>.</p>
<p class="p1">GemFire is a distributed in-memory data grid that handles data distribution, data replication, partitioning (sharding), caching and data management, so that information is available at the exact moment it is needed.</p>
<p class="p1"><strong>(See also <a href="http://readwrite.com/2013/02/20/whats-next-for-taming-big-data" target="_blank">What's Next For Taming Big Data</a>.)</strong></p>
<p class="p1">While the ability to move data from server to server and replicate it to more than one location has proven invaluable over the last 10 years, today's critical challenge is how companies can manage this data properly.</p>
<p class="p1">Over the past decade, <a href="http://www.vmware.com/files/pdf/vmware-vfabric-gemfire-distributed-main-memory-platform-WP-EN.pdf">GemFire</a> has helped companies:</p>
<ul class="ul1">
<li class="li2">Maintain simultaneous data connections over long distances.</li>
<li class="li2">Protect their data from natural and man-made disasters.</li>
<li class="li2">Maintain data reliability and availability, even when server hardware periodically fails.</li>
</ul>
<p class="p1">The software is able to achieve these goals by creating an object-oriented "data fabric" across a server cluster. It accesses copies of data that are stored in various locations as needed. To ensure compatibility with the latest <a href="http://readwrite.com/2012/04/06/8-reasons-why-cloud-computing">cloud configurations</a>, the management platform can spread the data across many virtual machines and GemFire servers to manage application objects.</p>
<p class="p1">But what does that mean in the real world? To find out, it helps to look at how <a href="http://www.vmware.com/go/gemfcomm">vFabric GemFire</a> is already working in key industrial applications, how it can be developed for new projects and how it can be deployed in a business network.</p>
<h2 class="p3">Passing Military Grade</h2>
<p class="p1">Keeping connected across town or around the globe is never more important than when national security is on the line. So when the U.S. Defense Information Systems Agency (DISA) needed to deal with up-to-the-minute information and awareness of military actions wherever they occur, the <a href="http://www.vmware.com/files/pdf/solutions/vFabric-GemFire-fo-Defense-and-Government-Agencies.pdf">agency chose vFabric GemFire</a>.</p>
<p class="p1">GemFire provided not only speed and the ability to easily scale projects up and down, but also a management tool that orchestrates data delivery from the back-end data stores to the consuming applications.</p>
<p class="p1">Since 2007, DISA has used GemFire for managing massive amounts of data for the various government agencies it supports, including U.S. military commands, joint task forces and the Pentagon. And because GemFire allows for a consistent view of data across all geographies and in different clusters, the military has reliable event notification, continuous querying, parallel execution, high throughput, low latency, high scalability, continuous availability and WAN distribution.</p>
<h2 class="p3">South America Calling</h2>
<p class="p1">GemFire’s expertise at the middle data tier delivers reliability and critical data redundancy that keeps the information up to date even if one part of the network goes offline.</p>
<p class="p1">Take the case of a large telecommunications company in South America that sells prepaid phone cards via kiosks. The telecom uses GemFire to enable the sale and provisioning of pre-paid cards even when disconnected from the network. Because the country’s infrastructure is not 100% reliable, sometimes network data is not updated for several hours at a time and customers might not be able to use their cards. To overcome this obstacle, the telecom uses GemFire's distributed databases to maintain up-to-the-minute information.</p>
<p class="p1">vFabric GemFire was the optimal choice for managing a distributed database in this environment because it automatically recognizes systems and moves data around so that it remains accessible even on unreliable networks.</p>
<p class="p1">As VMware product line marketing manager Blake Connell put it, “vFabric GemFire automatically spreads the data over a wide network and accommodates network disruptions.”</p>
<p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/VMW_10Q4_DGRM_vFabric_GemFire_Architecture_R4_800x600.jpg" style="" />
			</span>
</p>
<h2 class="p3">Capturing GemFire In The Enterprise</h2>
<p class="p1"><a href="http://www.vmware.com/files/pdf/techpaper/vmw-vfabric-gemFire-best-practices-guide.pdf">vFabric GemFire</a> is best suited for new Big Data projects that require NoSQL - or distributed unstructured data - models.</p>
<p class="p1">GemFire is well-designed for latency-sensitive applications such as virtualized environments that may require interrupt-moderation or interrupt-throttling - industry terms that IT developers and database deployers use when building a system that potentially doesn't take well to lags in data flow or processing.</p>
<p class="p1">Because GemFire is designed for data distribution, data replication, caching and data management, it has special requirements. For example, GemFire suggests enabling hyperthreading and keeping at least 50% of the server’s memory space available.</p>
<p class="p1">Configuring GemFire servers and regions is optimally done with the <a href="http://www.springsource.org/">Spring</a> object-oriented programming framework. This allows developers to centralize application service configuration instead of having to deal with Spring context configuration <em>plus</em> a separate cache.xml file.</p>
<p class="p1">For those working with structured data and who are knowledgeable in SQL, VMware offers a related product called&nbsp;<a href="http://vmware.com/go/sqlfire">SQLFire</a>. SQLFire is&nbsp;a distributed SQL data-management platform. SQLFire will look familiar to SQL developers thanks to a similar interface and programming framework, and it allows the management of "not only SQL" databases much the way GemFire does.</p>
<p class="p1">Look for more information on the benefits of SQLFire in an upcoming ReadWrite post.</p>
<p class="p1"><a style="text-decoration: underline;" href="http://www.vmware.com/" target="_blank"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/vmware_300x60_contributed.jpg" style="" />
			</span>
</a></p>
<p class="p1">&nbsp;</p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/27/cloud-based-gemfire-makes-it-easier-to-work-with-big-data</link>
                <guid>http://readwrite.com/2013/02/27/cloud-based-gemfire-makes-it-easier-to-work-with-big-data</guid>
                <category>Taming Big Data</category>
                <pubDate>Wed, 27 Feb 2013 10:30:00 -0800</pubDate>
                <author></author>
            </item>
                    <item>
                <title><![CDATA[Data: Now A Differentiator, Soon A Commodity?]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_48640189.jpg" />
                                        <p>Tim O'Reilly once presciently <a href="http://oreilly.com/pub/a/web2/archive/what-is-web-20.html?page=3">described</a> data as "the new Intel Inside," the primary source of competitive differentiation in a world where technology has largely been commoditized. While he referenced Google and other web giants, today mainstream enterprises have embraced Big Data as they seek to stand out. But a danger lurks.</p>
<p>The more companies embrace data to differentiate, the less it does so. O'Reilly thought data might be humbled by free data movements, much as proprietary software was hit by open-source software, but the culprit may actually be something more overtly benign: data-friendly applications.</p>
<h2>Intel's Branding Coup</h2>
<p>There was a time when consumers didn't care what chips ran their computers, a fact painfully reflected in Intel's stock price. But in 1991, Intel launched its famous "Intel Inside" branding campaign, and <a href="https://www.google.com/finance?chdnp=1&amp;chdd=1&amp;chds=1&amp;chdv=1&amp;chvs=maximized&amp;chdeh=0&amp;chfdeh=0&amp;chdet=1361912400000&amp;chddm=484840&amp;chls=IntervalBasedLine&amp;q=NASDAQ%3AINTC&amp;ntsp=0&amp;fct=big&amp;ei=--4sUfCqK8W_rQGSWw">its stock took off</a>. What had been considered a commodity in 1990 suddenly became premium in 1991: the Girbaud jeans of their time. <em>(Editor's note: Yes, we have pictures of Matt wearing Girbaud. No, they're not pretty.)</em></p>
<p>But O'Reilly wasn't arguing that data is simply a marketing slogan that can trick people into paying a premium for an otherwise commodity product. Instead, he reasoned that the few who manage to harness specialized databases are best positioned to charge for access to their data: "In the internet era, one can already see a number of cases where control over the database has led to market control and outsized financial returns." This control has enabled such firms to amass computing resources that, in turn, generate even more data (with subsequent lock-in).</p>
<p>But what happens when data goes mainstream?</p>
<h2>Big Data Inside</h2>
<p>This, after all, is what is happening within the enterprise. While we're still years away from Big Data becoming omnipresent, companies like Cloudera and EMC believe in a future when every enterprise mines vast treasure troves of data to glean insight and competitive advantage. For now, however, tools like Hadoop remain complex for most enterprises, and the science of data analysis has many enterprises scrambling for data scientist panaceas.</p>
<p>Big Data, however, promises to become easier, as Workday co-founder and Cloudera board member Aneel Bhusri reports:</p>
<blockquote class="twitter-tweet">
<p>.@<a href="https://twitter.com/mikeolson">mikeolson</a> makes a great point: <a href="https://twitter.com/search/%23hadoop">#hadoop</a> value will be delivered through cloud apps vendors.ISV opportunity huge for <a href="https://twitter.com/search/%23hadoop">#hadoop</a> and @<a href="https://twitter.com/cloudera">cloudera</a></p>
— aneel bhusri (@aneelb) <a href="https://twitter.com/aneelb/status/212736562742571009">June 13, 2012</a></blockquote>
<p>Bhusri is likely right. But if so - if Big Data becomes democratized (read: commoditized) through applications - how does it continue to set a data-driven company apart from its data-driven competitors?&nbsp;</p>
<h2>A Nicholas Carr Haunting</h2>
<p>This line of questioning will sound familiar to those who have read Nicholas Carr's seminal "<a href="http://www.roughtype.com/?p=644">Does IT Matter?</a>" As he wrote in 2007:</p>
<blockquote>Behind the change in thinking lies a simple assumption: that as IT’s potency and ubiquity have increased, so too has its strategic value. It’s a reasonable assumption, even an intuitive one. But it’s mistaken. What makes a resource truly strategic – what gives it the capacity to be the basis for a sustained competitive advantage – is not ubiquity but scarcity. You only gain an edge over rivals by having or doing something that they can’t have or do. By now, the core functions of IT – data storage, data processing, and data transport – have become available and affordable to all. Their very power and presence have begun to transform them from potentially strategic resources into commodity factors of production. They are becoming costs of doing business that must be paid by all but provide distinction to none.</blockquote>
<p>While a gaggle of enterprise IT vendors rushed to insist that IT does, in fact, matter, Carr's primary point - that the more the benefits of IT are distributed the less differentiating they become for any particular firm - seems to be confirmed by the effect of SaaS, among other things. IT has been simplified through SaaS and other trends, but it hasn't become more differentiating. If anything, it has become less so.</p>
<p>Is data any different?</p>
<h2>"Big" Data Gets It Wrong</h2>
<p>The answer is a qualified "maybe." Any particular technology trend loses its competitive bite when the mainstream adopts it, but this doesn't mean that there isn't value in harnessing that technology. Data is no different.</p>
<p>As Redmonk analyst <a href="https://twitter.com/monkchips/status/304664602959036417">James Governor postulates</a>, "The advantage is in <em>how</em> you use the tech, not the tech itself." Just as owning Salesforce.com or the latest HP server won't differentiate your business, neither will owning massive quantities of data. New <a href="http://pinterest.com/pin/233272455671103828/">survey data from Infochimps</a> confirms this: the top two reasons for failure in Big Data analytics projects are a lack of expertise to connect the dots between data and a lack of business context for one's data.</p>
<p>The world has become fixated on the "big" in Big Data, but volume of data is not very interesting. In the (near) future, everyone will have data, and plenty of it. But asking the right questions at the right time, not merely asking "bigger" questions, will continue to <a href="http://readwrite.com/2013/02/21/tesla-and-the-fallacy-of-data-driven-decisions">drive serious competitive differentiation</a>.</p>
<p>Big Data, in other words, is just the ante to get in the game. Going forward, real differentiation will inure to those businesses that know which data to use and how and when to query it.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em>.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/27/data-once-a-differentiator-now-a-commodity</link>
                <guid>http://readwrite.com/2013/02/27/data-once-a-differentiator-now-a-commodity</guid>
                <category>Big data</category>
                <pubDate>Wed, 27 Feb 2013 06:44:00 -0800</pubDate>
                <author>Matt Asay</author>
            </item>
                    <item>
                <title><![CDATA[How The Mobile Enterprise Puts Business Leaders On The Hot Seat]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/Screen%20Shot%202013-02-26%20at%203.34.52%20PM.png" />
                                        <p class="p1"><em>Guest author Robert LeBlanc is Senior Vice President of IBM Software Group.</em></p>
<p class="p1">The era of the mobile enterprise has officially arrived. Half of American workers are now using smart devices for work as well as personal usage. The use of those devices is now at a critical mass and it's just the beginning.</p>
<p class="p1">Yet&nbsp;<a href="http://www.gartner.com/">Gartner</a>, a leading information technology research and advisory company, says few organizations plan and manage mobility with a truly strategic or proactive approach. They’re mostly reactive and tactical.</p>
<h2 class="p2">Mobility Isn't About The Device</h2>
<p class="p1">For enterprises, mobility shouldn’t be about the device. Instead, it needs to be about figuring out what an organization can do differently and better now that its employees and customers use mobile technologies so frequently at work and in their private lives, and can access processes and data anywhere and anytime.&nbsp;</p>
<p class="p1">Simply put, mobility changes everything for the enterprise. And that puts business leaders on the hot seat, forcing them to grapple with one of the biggest challenges of business today. It's a question of when, not if, mobile technology will impact the business.</p>
<p class="p1">A mobile enterprise is an organization built on a foundation of technologies and business processes that enables people to connect, share information and participate in the business processes no matter where they are. This facility allows businesses, employees and customers to better understand the world around them so they can make smarter and quicker decisions. And just as importantly, it helps them interact more effectively with all of their constituents.</p>
<h2 class="p3">How a Mobile Enterprise Works</h2>
<p><iframe src="http://www.youtube.com/embed/MGgPUQklrxI?rel=0" frameborder="0" width="420" height="315"></iframe></p>
<p class="p1">Mobility forces leaders to rethink how they operate their businesses, how they deal with employees and customers and how they manage their information and their technology. By its very nature, mobility makes it more difficult and challenging for IT leaders to control people and information. So they have to achieve a judicious balance between the need to loosen their hold on many aspects of their businesses and the need to assure the security and integrity of business processes and information.</p>
<p class="p1">What are these challenges?</p>
<p class="p1"><strong>How to operate the business?</strong>&nbsp;Many fundamental business processes were established in an era when companies tightly controlled every aspect of their operations. Mobility disrupts those linear flows of work and information. Now, business leaders have to restructure their business processes to take into account new kinds of interactions with customers, employees and business partners, and new sources of information. Example: How can a B-to-C company run an outstanding marketing program without taking into account the locations of customers and their real-time communications via social media?</p>
<p class="p1"><strong>How to interact with clients?</strong>&nbsp;Today, most businesses recognize the importance of their clients, but mobility and the emergence of big data add new elements to this calculus. CEOs understand that their most important assets are not just their employees but also the vast storehouses of information they possess and the day-to-day interaction with their clients. Leaders have to deal with the fact that, because of the mobility revolution, many of their most important clients have choices in who they interact with, when they interact and the type of interaction. It is imperative they take advantage of this new paradigm, embrace it, innovate around it and improve their client experiences.</p>
<p class="p1"><strong>How to manage employees?</strong>&nbsp;For all the talk about flat organizations and employee empowerment, many organizations still operate under the command-and-control management model. Today, thanks to mobility technologies, employees have the means to gather information and make decisions on the spot, and, increasingly, they’ll want to act. Leaders should empower them to exercise their judgment and creativity. Mobility is a great enabler. At the same time, though, organizations need to establish policies, practices and training programs that protect the company and its customers from undue risks.</p>
<p class="p1"><strong>How to manage information?</strong>&nbsp;Many companies keep their information in silos aligned with particular business units and functions. Most of what they gather sits in databases in rows and columns. But the coming era of big data means a tremendous amount of information of different types is now available—including unstructured data from sensors, video and Web pages. This information must be shared across the enterprise, and, naturally, it will be pushed and pulled via mobile technologies. Companies have to manage their information so it is easily accessible for those who need it and, at the same time, protected from unauthorized access.</p>
<p class="p1"><strong>How to manage technology?</strong>&nbsp;For established businesses, most of their technology was installed before mobility became such a big factor. It makes no sense to rip and replace it. Instead, companies should add on capabilities that make it easy for the data and business processes managed in legacy computing systems to be available via mobile devices. New business processes and software applications should be developed with a “mobile first” mindset. That way, accessibility and security will be designed in from the start. Mobility should always be evaluated in the context of the other major technology shifts that are around it today, namely, cloud computing, data analytics and social business. These new capabilities are all game-changers individually, but together they can transform a business, making it more efficient, dynamic and productive.</p>
<p class="p1">Embracing change and the impacts of technology like mobility is a great opportunity for most businesses. Change is inevitable. Those that harness it and exploit it correctly will be leaders. But technology for technology's sake is not the full equation. The combination of technology along with new business processes, insights derived from data analytics and the evolving interaction among people is the game changer.</p>
<p class="p1"><em>Note: This post originally appeared on <a href="http://asmarterplanet.com/blog/2013/02/the-mobile-enterprise-puts-business-leaders-on-the-hot-seat.html" target="_blank">IBM's Building A Smarter Planet blog</a>. IBM has recently announced <a href="http://www-03.ibm.com/press/us/en/pressrelease/40403.wss" target="_blank">a new generation of mobile enterprise technologies</a> that are based on the point of view in this post.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/27/how-the-mobile-enterprise-puts-business-leaders-on-the-hot-seat</link>
                <guid>http://readwrite.com/2013/02/27/how-the-mobile-enterprise-puts-business-leaders-on-the-hot-seat</guid>
                <category>enterprise IT</category>
                <pubDate>Wed, 27 Feb 2013 04:00:00 -0800</pubDate>
                <author>Robert LeBlanc</author>
            </item>
                    <item>
                <title><![CDATA[Microsoft Completes Journey To Big Data Through Hadoop]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_bigdata.jpg" />
                                        <p>There's no beating around this bush. Today Hortonworks announced a new beta version of its Hadoop Data Platform that will <a href="http://hortonworks.com/about-us/news/hortonworks-brings-apache-hadoop-to-windows/" target="_blank">run on Microsoft Windows Server</a>, a move that shows Microsoft's own Big Data efforts will forever be connected to open source innovation.&nbsp;This is a highly significant – even expected – move in the big data sector, but also a very strange one.</p>
<p><a href="http://en.wikipedia.org/wiki/Apache_Hadoop" target="_blank">Hadoop</a>, of course, is an open-source software architecture that supports distributed computation jobs on huge data sets&nbsp;– in other words, classic Big Data work.&nbsp;Hortonworks, meanwhile, is one of the bigger Hadoop vendors in the market, even if that's more in terms of innovation than sales, where it trails Cloudera. Hortonworks founder and architect Arun Murthy is one of the original Hadoop coders who came out of Yahoo back in the day, and he also serves as the VP of the open source Apache Hadoop project at the Apache Software Foundation.</p>
<p>Which all means that any major platform move like this is sure to impact the rest of Hadoop development and, by extension, the rapidly growing Hadoop ecosystem that's driving much of the big data sector.</p>
<h2>Why Windows?</h2>
<p>Until today's announcement, Hadoop of any flavor typically ran on a Linux-based machine (physical or virtual). This made a lot of sense, since one of the big advantages of Hadoop is the capability to expand its data warehousing over any number of clustered computers. When those clustered machines are running Linux, it's all but frictionless to add more, both in terms of licensing cost (which is free) and configuration (which is easy).</p>
<p>But when the underlying operating system is Windows Server, licensing&nbsp;– i.e., explicitly not free&nbsp;–&nbsp;would seem likely to create a lot more friction when someone tries to build a Hadoop cluster. Wouldn't using Windows Server as the OS for a Hadoop system be too expensive?</p>
<p>David McJannet, VP of marketing at Hortonworks, doesn't seem to think so. McJannet's concern was that too many Windows-based shops out there were shying away from Hadoop because they didn't want to deal with adding Linux clusters and the related hassle of managing them. So assuaging those concerns was one big reason Microsoft has been working with&nbsp;Hortonworks over the past 18 months.</p>
<p>The sheer number of Windows installations was also a major issue. McJannet said that a "majority of servers" were running Windows in the enterprise now. In its press release, Hortonworks cited IDC data thusly: "According to IDC, Windows Server owned 73 percent of the market in 2012 (IDC, <a style="line-height: 1.538em;" title="http://www.idc.com/getdoc.jsp?containerId=234339#.UStraKX7gqZ" href="http://www.idc.com/getdoc.jsp?containerId=234339#.UStraKX7gqZ">Worldwide and Regional Server 2012–2016 Forecast</a>, Doc # 234339, May 2012)."</p>
<p>It is not clear just what server class this 73 percent represents, since the report itself costs $4,500, and is thus a little hard to access. File servers? Application servers? It's sure not web servers, where <a title="http://news.netcraft.com/archives/2013/02/01/february-2013-web-server-survey.html" href="http://news.netcraft.com/archives/2013/02/01/february-2013-web-server-survey.html">according to Web analytics from Netcraft</a>, Microsoft currently has a 16.93% market share, dwarfed by Apache's 55.26%.</p>
<p>McJannet also said Hadoop on Windows would make data exploration easier. Using SQL-based queries that can now directly integrate with the Hadoop Distributed File System (HDFS), products like SQL Server and Excel can tap straight into Hadoop-stored data, enabling end-users to more easily navigate vast stores of data in Hadoop clusters.</p>
<h2>Embracing Open Source</h2>
<p>This is not Hortonworks' first foray into Windows land. Late last year, it released the Windows Azure HDInsight product&nbsp;–&nbsp;essentially Hadoop for the Azure cloud platform.</p>
<p>As odd as it may seem to see Hadoop on Windows Server, the move makes a lot of sense from Microsoft's side. The company has needed a Big Data entry ever since it decided to drop its own Dryad data warehousing framework back in 2011. Some observers have expected this day ever since a year ago, when <a title="http://www.itworld.com/big-datahadoop/261056/microsoft-destined-follow-big-data" href="http://www.itworld.com/big-datahadoop/261056/microsoft-destined-follow-big-data">Microsoft announced it would build in tools within SQL Server to connect to Hadoop</a>.</p>
<p>McJannet emphasized that to date, Microsoft was playing well with others within the open source development model that Hadoop uses, so much of its innovation will cycle back to the rest of the Hadoop community.</p>
<p>If so, you can expect to see more Hadoop vendors announce their own connections to Windows in the near future.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a><br /></em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/25/microsoft-completes-its-journey-to-hadoop</link>
                <guid>http://readwrite.com/2013/02/25/microsoft-completes-its-journey-to-hadoop</guid>
                <category>Big data</category>
                <pubDate>Mon, 25 Feb 2013 06:29:00 -0800</pubDate>
                <author>Brian Proffitt</author>
            </item>
                    <item>
                <title><![CDATA[Tesla And The Fallacy Of Data-Driven Decisions]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_125574617.jpg" />
                                        <p>We like to pride ourselves on being increasingly data-driven. In fact, we've created a giant new industry frenetically panning for Big Data gold. A healthy $4.5 billion market in 2010, according to <a href="http://www.idc.com/" target="_blank">IDC</a>, Big Data is set to explode to $23.8 billion in 2016, fueled by our need to be more data-driven in everything from how we do business to how we eat.</p>
<p>I suspect, however, that we're fooling ourselves, as the recent Tesla debacle suggests. As much as we'd like to smugly pat ourselves on the back for being data-driven, the truth is that data is always messy, and never really tells any particular story.</p>
<p><strong>(See also <a href="http://readwrite.com/2013/02/19/would-you-buy-a-tesla-model-s" target="_blank">Would You Buy A Tesla Model S?</a>)</strong></p>
<h2>Bigger Data&nbsp;≠ Bigger Answers</h2>
<p><em>New York Times</em> columnist <a href="http://www.nytimes.com/2013/02/19/opinion/brooks-what-data-cant-do.html?_r=0">David Brooks nails this</a> in an op-ed piece, wherein he argues that Big Data, while very useful for guiding our intuitions, gets some things very wrong. Like the value of social connections. Or the context for answering a question. In fact, he speculates, Big Data might actually obscure Big Answers by complicating decisions and making it even harder to determine which statistically significant correlations between data are informative and which are simply spurious.</p>
<p>Such thinking won't be surprising to anyone who has read&nbsp;<a href="http://www.amazon.com/Black-Swan-Nassim-Nicholas-Taleb/dp/1400063515">Nassim Taleb's book&nbsp;</a><a href="http://www.amazon.com/Black-Swan-Nassim-Nicholas-Taleb/dp/1400063515"><em>The Black Swan</em></a>, which posits that the more data we analyze, the more likely our conclusions are to be wrong. Taleb writes:</p>
<blockquote>In business and economic decision-making, data causes severe side effects - data is now plentiful thanks to connectivity; and the share of spuriousness in the data increases as one gets more immersed into it. A not well-discussed property of data: it is toxic in large quantities - even in moderate quantities.</blockquote>
<p>In other words, the more data you collect, the harder it can become to interpret that data. And even if you can interpret your data correctly, are you actually going to listen to that interpretation?</p>
<p>Which brings us to Tesla.&nbsp;</p>
<h2>Tesla and "Truth"</h2>
<p>In case you've been hiding under a rock, a&nbsp;<em>New York Times</em> reporter, John Broder, wrote an unflattering review of Tesla's new Model S. Tesla founder and CEO Elon Musk got the knives out and&nbsp;<a href="http://www.teslamotors.com/blog/most-peculiar-test-drive">slammed the reporter using a pile of data</a> (from the reporter's test drive, which is a little bit creepy). Broder <a href="http://wheels.blogs.nytimes.com/2013/02/14/that-tesla-data-what-it-says-and-what-it-doesnt/">responded</a>&nbsp;with his own view of the data, and finally <a href="http://publiceditor.blogs.nytimes.com/2013/02/18/problems-with-precision-and-judgment-but-not-integrity-in-tesla-test/" target="_blank">Margaret Sullivan, public editor of the&nbsp;<em>Times</em>, waded in</a>. Her conclusion?</p>
<blockquote>People will go on contesting these points – and insisting that they know what they prove — and that’s understandable. In the matter of the Tesla Model S and its now infamous test drive, there is still plenty to argue about and few conclusions that are unassailable.</blockquote>
<p>But wait! What about all that data Musk collected? Doesn't it <em>prove</em> his point? Or what about Broder's own data? Doesn't it <em>prove</em> his? In both cases the answer is "Yes," leaving would-be Tesla buyers like ReadWrite's <a href="http://readwrite.com/2013/02/19/would-you-buy-a-tesla-model-s">Dan Lyons stymied</a> as to what they should do. Which is why being "data-driven" is the <em>start</em> of a solution, not the end.&nbsp;</p>
<h2>The Human Side of Big Data</h2>
<p>As <a href="http://www.nytimes.com/2005/01/16/books/review/16COVERBR.html">David Brooks notes</a>, reviewing Malcolm Gladwell's book,&nbsp;<em><a href="http://www.gladwell.com/blink">Blink</a></em>, "We have the capacity to sift huge amounts of information, blend data, isolate telling details and come to astonishingly rapid conclusions, even in the first two seconds of seeing something." This is not to suggest that we shouldn't collect data, but that we perhaps need to be smarter about how we analyze it, and how much we trust it.</p>
<p>As I've argued&nbsp;(see <a href="http://readwrite.com/2013/02/11/big-data-and-the-landfills-of-our-digital-lives" target="_blank">Big Data And The Landfills Of The Digital Enterprise</a>), I don't think this is a matter of hiring expensive data scientists to interpret our data. Rather, I imagine it's a matter of guiding our decisions - even those split-second "hunches" that Gladwell talks about in <em>Blink</em> - through data, without becoming consumed with data. Data kicks off the right questions; data doesn't resolve disputes.</p>
<p>Just ask Musk and Broder: both absolutely convinced they're right, and both with ample data on their respective sides to prove it.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em>.</p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/21/tesla-and-the-fallacy-of-data-driven-decisions</link>
                <guid>http://readwrite.com/2013/02/21/tesla-and-the-fallacy-of-data-driven-decisions</guid>
                <category>Big data</category>
                <pubDate>Thu, 21 Feb 2013 06:37:18 -0800</pubDate>
                <author>Matt Asay</author>
            </item>
            </channel>
</rss>

