<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
        <channel>
        <title>Taming Big Data - ReadWrite</title>
        <link>http://readwrite.com</link>
        <description />
        <language>en</language>
        <copyright>Copyright 2012 SAY Media, Inc.</copyright>
        <managingEditor>readwriteweb@gmail.com</managingEditor>
        <docs>http://blogs.law.harvard.edu/tech/rss</docs> 
        <lastBuildDate>Wed, 27 Mar 2013 06:30:00 -0700</lastBuildDate>
        <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://rww.superfeedr.com/" />

                    <item>
                <title><![CDATA[Microsoft's Data Explorer: Picking Up Where Bing Leaves Off]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_dataexplore.jpg" />
                                        <p>Interacting with Big Data is daunting enough that, for most people, a search engine query is about as far as one is willing to go. But for those willing to get their hands dirty, Microsoft is quietly working towards fully integrating public data sources into Excel, eventually baking it into a future version.</p>
<p>This month, Microsoft shipped a <a href="http://www.microsoft.com/en-us/bi/Products/Office.aspx" target="_blank">"preview version" of Data Explorer</a>, a tool to integrate all sorts of data sources within Excel. Microsoft's vision is "self-service business intelligence," a fancy name for you and me accumulating data and performing our own analysis on it.</p>
<p>Over time, according to&nbsp;Herain Oberoi, a director in Microsoft's business intelligence division, the goal will be to fully integrate Data Explorer into Excel. A year ago, Data Explorer was a lab project. "When things go from a lab to a preview, it's a sign that it has legs," he said.</p>
<p>It's a sign, Oberoi added, that Microsoft intends to ship the product as a long-term offering, "and in this case it would be Excel."</p>
<p>So why is this important?</p>
<p>In some cases, the questions we have require data - a lot of data. "How likely is it that I will find a job in Austin, as opposed to San Francisco?" is a question that boils down to, at its most basic, a single comparison: the unemployment rates of the two cities. We've also been trained by search engines not to even hope for additional data that might make our answer more valuable: if I'm a nurse, for example, I might like to know how many hospitals, hospices and clinics are in each town, the total number of beds, and even data for each city such as housing prices and the cost of living. You might even wonder where in each city a nurse, with a typical salary, could find the most house for the money.</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/Data%20explorer%20zoom.png" style="" />
			</span>
</p>
<p>Some of these answers are available. Cities, states and the federal government compile statistics on unemployment, for example, and this <a href="http://www.bls.gov/oes/current/oes291111.htm">U.S. Department of Labor page</a>&nbsp;presents wage and employment data for nurses. Real-estate sites compile their own databases, but can also tap into public records and data sources.</p>
<p>That's where Data Explorer comes in. Within Excel 2013, downloading the Data Explorer tool allows users to tap into relational, structured and semi-structured data from OData, Hadoop and Azure Marketplace, among other sources. These sources are terrific for corporate data analysis, but perhaps a bit out of reach for consumers.</p>
<p>But it also allows Excel to pull data directly from the Web, including public Web pages like Wikipedia - you can even pull data from Facebook. (Microsoft provides a simple, easy-to-follow tutorial on its Web site on <a href="http://office.microsoft.com/en-us/excel-help/microsoft-data-explorer-preview-for-excel-101-HA103993784.aspx" target="_blank">how to add a Wikipedia page</a> covering the Euro soccer championship, and extract data from it.) One federally maintained site that compiles all sorts of statistics is <a href="http://www.data.gov/" target="_blank">data.gov</a>, which was specifically designed to give the public access to high-quality, machine-readable datasets. Excel 2013 can handle millions of rows of data, using the new xVelocity in-memory engine.</p>
<p>Even better, if the maintainer of the data source updates the data, then the spreadsheet can be updated with a single click. Excel 2013 also contains nifty features like <a href="http://www.youtube.com/watch?v=ate6GDd1NSk" target="_blank">Flash Fill</a>, which automatically formats the data if it notices a pattern within the entries. Location data can be plotted against maps, supplied by Bing Maps, of course.</p>
<p>At this point, Oberoi said, Microsoft feels pretty comfortable with identifying and facilitating the collection of data from public data sources, as well as "shaping" it, where text needs to be changed to numerical notations, columns need to be merged, and so on. It's the third goal, to take the data, shape it, visualize it, and share it out, where Microsoft needs to continue its work. When that's done, he said, Data Explorer should be fully integrated into Excel.</p>
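<p>To make "shaping" concrete, here is a minimal Python sketch of the two operations Oberoi mentions: changing text to numbers and merging columns. It is not Data Explorer itself, and the place names, column names and figures in the sample are all invented for illustration.</p>

```python
import csv
import io

# Hypothetical sample data: the places, column names and values are invented.
RAW = """city,state,unemployment
Exampleton,EX,5.0%
Sampleburg,SB,6.5%
"""


def shape(raw_csv):
    """Merge city and state into one column and turn '5.0%' text into a float."""
    shaped = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        shaped.append({
            "place": f"{row['city']}, {row['state']}",
            "unemployment_pct": float(row["unemployment"].rstrip("%")),
        })
    return shaped


rows = shape(RAW)
for row in rows:
    print(row["place"], row["unemployment_pct"])
```

<p>Once the percentages are real numbers rather than text, they can be sorted, charted or compared directly, which is the point of shaping the data before visualizing it.</p>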
<p>One of the issues that Microsoft is facing, however, is the continued improvement in natural language search to simply answer those questions. A few years ago, Google said that it would <a href="http://googleblog.blogspot.com/2009/04/adding-search-power-to-public-data.html" target="_blank">integrate and compare public data</a>, part of a response to the launch of Wolfram Alpha at the time. Wolfram's not there yet, though: asking it to compare the <a href="http://www.wolframalpha.com/input/?i=what%20is%20the%20employment%20rate%20in%20Austin%20versus%20San%20Francisco&amp;t=crmtb01" target="_blank">unemployment rate of Austin and San Francisco</a> is within its grasp, but a more nuanced question, such as the scenario above, relies on at least three factors: the availability of data, its ability to parse the query via natural language, and its ability to construct a meaningful solution. (Somewhat surprisingly, Bing presented a more comprehensive picture of the economies of both regions - not because of any inherent advantage in the search engine, but because the ongoing Silicon Valley-Austin employment spat justified the creation of a <a href="http://www.austinvssanfrancisco.com/economy/" target="_blank">Web site</a> comparing the two.)</p>
<p>Generally, the term "database" is enough to scare off the average Joe. What Data Explorer could be, in a polished, final form, is a tool to allow Excel users to begin constructing their own advanced queries when a search engine can't do the job.</p>
<p><em>Lead image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2013/03/27/microsofts-data-explorer-picking-up-where-bing-leaves-off</link>
                <guid>http://readwrite.com/2013/03/27/microsofts-data-explorer-picking-up-where-bing-leaves-off</guid>
                <category>Microsoft</category>
                <pubDate>Wed, 27 Mar 2013 06:30:00 -0700</pubDate>
                <author>Mark Hachman</author>
            </item>
                    <item>
                <title><![CDATA[VMware's Cloud-Based GemFire Makes It Easier To Work With Big Data]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/VMware_Gemfire.png" />
                                        <p class="p1"><a href="http://www.vmware.com/" target="_blank"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/vmware_300x60_contributed.jpg" style="" />
			</span>
</a></p>
<p class="p1">While companies of all sizes are struggling with the information overload often referred to as “Big Data,” some IT developers and database deployers are approaching the challenge with a cloud-based service designed to make accessing massive amounts of data faster.</p>
<p class="p1">In short, they turn to <a href="http://www.vmware.com/products/application-platform/vfabric-gemfire">VMware’s vFabric GemFire</a>.</p>
<p class="p1">GemFire is a distributed in-memory data grid product that handles data distribution, data replication and partitioning (sharding), caching and data management, delivering information at the exact moment it is needed.</p>
<p class="p1"><strong>(See also <a href="http://readwrite.com/2013/02/20/whats-next-for-taming-big-data" target="_blank">What's Next For Taming Big Data</a>.)</strong></p>
<p class="p1">While the ability to move data from server to server and replicate it to more than one location has proven invaluable over the last 10 years, today's critical challenge is how companies can manage this data properly.</p>
<p class="p1">Over the past decade, <a href="http://www.vmware.com/files/pdf/vmware-vfabric-gemfire-distributed-main-memory-platform-WP-EN.pdf">GemFire</a> has helped companies:</p>
<ul class="ul1">
<li class="li2">Maintain simultaneous data connections over long distances.</li>
<li class="li2">Protect their data from natural and man-made disasters.</li>
<li class="li2">Maintain data reliability and availability, even when server hardware periodically fails.</li>
</ul>
<p class="p1">The software is able to achieve these goals by creating an object-oriented "data fabric" across a server cluster. It accesses copies of data that are stored in various locations as needed. To ensure compatibility with the latest <a href="http://readwrite.com/2012/04/06/8-reasons-why-cloud-computing">cloud configurations</a>, the management platform can spread the data across many virtual machines and GemFire servers to manage application objects.</p>
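<p class="p1">The idea of spreading copies of data across a cluster can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not GemFire's API: the <code>DataFabric</code> class, the node names and the hash-based placement rule are all hypothetical, chosen only to show partitioning plus replication.</p>

```python
import hashlib


class DataFabric:
    """Toy in-memory grid: each key lives on a primary node plus replica nodes."""

    def __init__(self, node_names, copies=2):
        self.order = list(node_names)
        self.nodes = {name: {} for name in self.order}  # node name -> local store
        self.copies = copies
        self.down = set()                               # nodes marked as failed

    def owners(self, key):
        # Hash the key to pick a primary, then use the next nodes as replicas.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        start = int.from_bytes(digest[:8], "big") % len(self.order)
        return [self.order[(start + i) % len(self.order)]
                for i in range(self.copies)]

    def put(self, key, value):
        # Write the value to every owner that is currently up.
        for name in self.owners(key):
            if name not in self.down:
                self.nodes[name][key] = value

    def get(self, key):
        # Read from the first owner that is still up and holds the key.
        for name in self.owners(key):
            if name not in self.down and key in self.nodes[name]:
                return self.nodes[name][key]
        raise KeyError(key)

    def fail(self, name):
        self.down.add(name)


fabric = DataFabric(["node-a", "node-b", "node-c"], copies=2)
fabric.put("order:42", {"status": "shipped"})
primary = fabric.owners("order:42")[0]
fabric.fail(primary)               # simulate a hardware failure on the primary
print(fabric.get("order:42"))      # the replica still answers
```

<p class="p1">Because every entry is written to two of the three nodes, losing any single node leaves a readable copy. A production data grid also has to rebalance and restore redundancy after a failure, which this sketch leaves out.</p>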
<p class="p1">But what does that mean in the real world? To find out, it helps to look at how <a href="http://www.vmware.com/go/gemfcomm">vFabric GemFire</a> is already working in key industrial applications, how it can be developed for new projects and how it can be deployed in a business network.</p>
<h2 class="p3">Passing Military Grade</h2>
<p class="p1">Keeping connected across town or around the globe is never more important than when national security is on the line. So when the U.S. Defense Information Systems Agency (DISA) needed to deal with up-to-the-minute information and awareness of military actions wherever they occur, the <a href="http://www.vmware.com/files/pdf/solutions/vFabric-GemFire-fo-Defense-and-Government-Agencies.pdf">agency chose vFabric GemFire</a>.</p>
<p class="p1">GemFire provided not only speed and the ability to easily increase and decrease the size of projects, but also a management tool that orchestrates data delivery from the back-end data stores to the consuming applications.</p>
<p class="p1">Since 2007, DISA has used GemFire for managing massive amounts of data for the various government agencies it supports, including U.S. military commands, joint task forces and the Pentagon. And because GemFire allows for a consistent view of data across all geographies and in different clusters, the military has reliable event notification, continuous querying, parallel execution, high throughput, low latency, high scalability, continuous availability and WAN distribution.</p>
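<p class="p1">Continuous querying and event notification are easier to picture with a small example. The Python sketch below is a hypothetical illustration, not GemFire's interface: the <code>Region</code> class and its method names are assumptions made for this example. A client registers a query once, and every later update that matches it triggers a callback.</p>

```python
class Region:
    """Toy key-value region that notifies listeners when matching entries arrive."""

    def __init__(self):
        self._data = {}
        self._queries = []    # (predicate, callback) pairs

    def register_query(self, predicate, callback):
        self._queries.append((predicate, callback))

    def put(self, key, value):
        self._data[key] = value
        # Evaluate every registered query against the new value as it arrives.
        for predicate, callback in self._queries:
            if predicate(value):
                callback(key, value)


alerts = []
region = Region()
region.register_query(
    lambda event: event["priority"] == "high",   # the standing query
    lambda key, event: alerts.append(key),       # what to do on a match
)
region.put("event-1", {"priority": "low"})
region.put("event-2", {"priority": "high"})
print(alerts)
```

<p class="p1">The client never polls for changes. The store pushes matching updates to it, which is what distinguishes a continuous query from a one-off lookup.</p>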
<h2 class="p3">South America Calling</h2>
<p class="p1">GemFire’s expertise at the middle data tier delivers reliability and critical data redundancy that keeps the information up to date even if one part of the network goes offline.</p>
<p class="p1">Take the case of a large telecommunications company in South America that sells prepaid phone cards via kiosks. The telecom uses GemFire to enable the sale and provisioning of prepaid cards even when disconnected from the network. Because the country’s infrastructure is not 100% reliable, sometimes network data is not updated for several hours at a time and customers might not be able to use their cards. To overcome this obstacle, the telecom uses GemFire's distributed databases to maintain up-to-the-minute information.</p>
<p class="p1">vFabric GemFire was the optimal choice for managing a distributed database in this environment because it automatically recognizes systems and moves data around so that it remains accessible even on unreliable networks.</p>
<p class="p1">As VMware product line marketing manager Blake Connell put it, “vFabric GemFire automatically spreads the data over a wide network and accommodates network disruptions.”</p>
<p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/VMW_10Q4_DGRM_vFabric_GemFire_Architecture_R4_800x600.jpg" style="" />
			</span>
</p>
<h2 class="p3">Capturing GemFire In The Enterprise</h2>
<p class="p1"><a href="http://www.vmware.com/files/pdf/techpaper/vmw-vfabric-gemFire-best-practices-guide.pdf">vFabric GemFire</a> is best suited for new Big Data projects that require NoSQL - or distributed unstructured data - models.</p>
<p class="p1">GemFire is well-designed for latency-sensitive applications such as virtualized environments that may require interrupt-moderation or interrupt-throttling - industry terms that IT developers and database deployers use when building a system that potentially doesn't take well to lags in data flow or processing.</p>
<p class="p1">Because GemFire is designed for data distribution, data replication, caching and data management, it has special requirements. For example, VMware suggests enabling hyperthreading and keeping at least 50% of the server’s memory space available.</p>
<p class="p1">Configuring GemFire servers and regions is optimally done with the <a href="http://www.springsource.org/">Spring</a> object-oriented programming framework. This allows developers to centralize application service configuration instead of having to deal with Spring context configuration <em>plus</em> a separate cache.xml file.</p>
<p class="p1">For those who work with structured data and are knowledgeable in SQL, VMware offers a related product called&nbsp;<a href="http://vmware.com/go/sqlfire">SQLFire</a>. SQLFire is&nbsp;a distributed SQL data-management platform. SQLFire will look familiar to SQL developers thanks to a similar interface and programming framework, and it allows the management of "not only SQL" databases much the way GemFire does.</p>
<p class="p1">Look for more information on the benefits of SQLFire in an upcoming ReadWrite post.</p>
<p class="p1">&nbsp;</p>
                    ]]></description>
                <link>http://readwrite.com/2013/02/27/cloud-based-gemfire-makes-it-easier-to-work-with-big-data</link>
                <guid>http://readwrite.com/2013/02/27/cloud-based-gemfire-makes-it-easier-to-work-with-big-data</guid>
                <category>Taming Big Data</category>
                <pubDate>Wed, 27 Feb 2013 10:30:00 -0800</pubDate>
                <author></author>
            </item>
            </channel>
</rss>

