<?xml version="1.0" encoding="UTF-8" ?>
<rss xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">
        <channel>
        <title>Data Services - ReadWrite</title>
        <link>http://readwrite.com</link>
        <description />
        <language>en</language>
        <copyright>Copyright 2012 SAY Media, Inc.</copyright>
        <managingEditor>readwriteweb@gmail.com</managingEditor>
        <docs>http://blogs.law.harvard.edu/tech/rss</docs> 
        <lastBuildDate>Thu, 29 Nov 2012 10:48:00 -0800</lastBuildDate>
        <atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="hub" href="http://rww.superfeedr.com/" />

                    <item>
                <title><![CDATA[Amazon's Redshift Accelerates Data Warehouse As A Service ]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/fields/shutterstock_redshift.jpg" />
                                        <p>Amazon continues to play the role of disrupter in the data marketplace, announcing a new data warehousing service that could blow the doors off existing data warehousing vendors in terms of price. The question is, will lower prices be enough to change the game?</p>
<p>The announcement of the new <a title="http://aws.amazon.com/redshift" href="http://aws.amazon.com/redshift">Amazon Redshift</a> service at yesterday's <a href="https://reinvent.awsevents.com/" target="_blank">Amazon Web Services re:Invent conference</a> was one of those "known unknowns" former Secretary of Defense Donald Rumsfeld used to go on about. We knew AWS would want to lead big with something at its first-ever live conference, being held in Las Vegas this week; we just didn't know what it would be. Now that the cat's out of the bag, many analysts seem pretty excited about the prospect. But Redshift also has some red flags.</p>
<h2>Redshift 101</h2>
<p>Let's take a look and see what's under the Redshift hood.</p>
<p>Amazon.com CTO <a title="http://www.allthingsdistributed.com/2012/11/amazon-redshift.html" href="http://www.allthingsdistributed.com/2012/11/amazon-redshift.html">Werner Vogels lauds Amazon Redshift</a> as "a fast and powerful, fully managed, petabyte-scale data warehouse service in the cloud."</p>
<p>Kinda buzzwordy, but the key terms in that statement are "petabyte-scale" - which means this service is going to be easy to grow into if your data needs <a title="http://readwrite.com/2012/11/23/peta-exa-yotta-and-beyond-big-data-reaches-cosmic-proportions-infographic" href="http://readwrite.com/2012/11/23/peta-exa-yotta-and-beyond-big-data-reaches-cosmic-proportions-infographic">ever get that insanely high</a> - and "service in the cloud" - a statement that means this is a hosted service on AWS's public cloud infrastructure - with all of the risks and rewards that come with that situation.</p>
<p>Vogels gets a little more specific later in his blog:</p>
<blockquote>"Redshift has a massively parallel processing (MPP) architecture, which enables it to distribute and parallelize queries across multiple low cost nodes. The nodes themselves are designed specifically for data warehousing workloads. They contain large amounts of locally attached storage on multiple spindles and are connected by a minimally oversubscribed 10 Gigabit Ethernet network. This configuration maximizes the amount of throughput between your storage and your CPUs while also ensuring that data transfer between nodes remains extremely fast."</blockquote>
<p>The MPP architecture is very important, because it gives some insights into Redshift's origins. Redshift is a columnar-based relational database that seems to be based on the open source PostgreSQL database - a hot commodity in the open source world that's been making inroads against the venerable MySQL database partly because PostgreSQL handles parallelism so well.</p>
<p>All the bits Vogels mentions about the minimally oversubscribed network connections are critical, too, because if his claims are right, this means that Redshift will be fast. The architecture of this new service is also important, because it means that unlike Hadoop, where data just sits cheaply waiting to be batch processed, data stored in Redshift can be worked on fast - fast enough for even transactional work.</p>
<p>Latency will be one of only a few areas any competing vendor will be able to go after - because the competition certainly can't touch AWS on price.</p>
<h2>Redshift's Pricing Shift</h2>
<p>One of the big parts of the Redshift announcement message yesterday was very much about price: just buying on-demand data capacity costs $3,723 per terabyte (TB) annually, which sounds like a lot except when you know how much a traditional on-site data warehousing solution can run. In his re:Invent keynote yesterday, senior vice president of AWS Andy Jassy claimed such solutions can run $19,000 to $25,000/TB a year. So right off the bat, if Redshift is indeed offering comparable service, customers will save 80-85% off their data warehousing bill.</p>
<p>But wait, there's more. If customers reserve three years of service, the price drops to a jaw-dropping $999/TB annual fee. That's a 95-96% reduction in potential costs for data storage.</p>
<p>In this case, disruptive may not have been hyperbole. It may have been an understatement.</p>
<h2>Redshift's Potential Issues</h2>
<p>On paper, this sounds pretty good, but there are some potential issues that should be raised. For one, this is a public cloud service, which means your data will be out past your corporate firewall and in some ways sitting outside of your control. If one of <a title="http://readwrite.com/2012/07/05/internet-outage-last-weekend-was-preventable" href="http://readwrite.com/2012/07/05/internet-outage-last-weekend-was-preventable">Amazon's data centers has a hiccup</a>, you could be out of luck.</p>
<p>The public status also means you'd better have bandwidth costs, security and infrastructure figured out, because somehow your company is going to have to get its data out and back to that cloud in a timely and safe manner.</p>
<p>One wildcard with this new Redshift service is how easy it will be to build apps or convert existing apps to work with it. Amazon's APIs are open, but only to the point that you can point your software to them. Once you invest in Amazon's APIs, it will be more painful to pull out to another cloud-based service should you decide to do so down the road.</p>
<p>If you have been keeping your data and applications local, shifting to Redshift could also mean shifting your applications to some other part of the AWS ecosystem as well, just to keep the latency times and bandwidth costs reasonable. In some ways, Redshift may be the AWS equivalent of putting the milk in the back of the grocery store.</p>
<p>If it is at all reasonable in its service, though, Redshift's pricing will definitely put pressure on the data warehousing vendors to lower their prices to compete - good news for anyone looking at data warehousing.</p>
<p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a>.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2012/11/29/redshift-accelerates-data-warehouse-as-a-service</link>
                <guid>http://readwrite.com/2012/11/29/redshift-accelerates-data-warehouse-as-a-service</guid>
                <category>Amazon</category>
                <pubDate>Thu, 29 Nov 2012 10:48:00 -0800</pubDate>
                <author>Brian Proffitt</author>
            </item>
                    <item>
                <title><![CDATA[How Tech Protects The President: Audio Analysis]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/files/fields/gun.jpg" />
                                        <p>New audio technology can pinpoint the location and precise time of a gunshot -- even the type of gun that fired it -- up to 2,000 yards away.&nbsp;</p>
<p><a href="http://www.shotspotter.com/" target="_blank">Shotspotter</a> is that technology, and the third and final installment in this series on the technology protecting the President.&nbsp;</p>
<p>SST Inc., the Mountain View, Calif., company behind the product, installed its sensors&nbsp;on buildings around&nbsp;<a href="http://www.nationaljournal.com/daily/cops-cameras-and-copters-keep-eyes-on-charlotte-20120902" target="_blank">the Democratic National Convention</a> in Charlotte, N.C. No incidents were reported, but that just may mean the system is working. Proponents of the product claim that it wards off as much crime as it detects. &nbsp;&nbsp;</p>
<p>Here's how it works: Shotspotter antennae and sensors activate when the sound of a gunshot is detected. They triangulate the noise, measuring its intensity, to find the source, and then alert police.&nbsp;The system can even determine what kind of gun was used and the precise time it was fired. Wow.&nbsp;</p>
<p>More than 70 departments use the system already.&nbsp;In most cases, officers say that just having the system quickly deters criminals from even committing a gun crime, because of the high rate of detection.</p>
<p>Basically it makes crooks think twice before taking a shot. While it's hard to track these stats nationwide, some numbers we did find seem to support the system's effectiveness.&nbsp;In Rochester, N.Y., gunfire was reduced 43% after Shotspotter went live, and Minneapolis saw a 30% decrease in gunshots reported within the first 30 days of introducing Shotspotter.</p>
<p>These systems are at work in more than 60 U.S. cities and cost between $40,000 and $60,000 per square mile to set up and run.&nbsp;</p>
<p>&nbsp;</p>
<p><em>Photo by </em><a href="http://www.flickr.com/photos/theknowlesgallery/" target="_blank"><em>The Knowles Gallery</em></a></p>
                    ]]></description>
                <link>http://readwrite.com/2012/10/16/how-tech-protects-the-president-audio-analysis</link>
                <guid>http://readwrite.com/2012/10/16/how-tech-protects-the-president-audio-analysis</guid>
                <category>Data Services</category>
                <pubDate>Tue, 16 Oct 2012 06:47:51 -0700</pubDate>
                <author>Adam Popescu</author>
            </item>
                    <item>
                <title><![CDATA[Tech That Protects The President: Image Analysis ]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/files/fields/3206119933_b09f769d7a_z.jpg" />
                                        <p>"Believe nothing you hear, and only one half that you see," Edgar Allan Poe wrote in his 1845 short story "The System of Doctor Tarr and Professor Fether." Almost 170 years later,&nbsp;imaging software with artificial intelligence capabilities&nbsp;is making it easier to believe <em>all</em> of what a camera sees. <em>(Part 2 of a 3-part series on technology employed by White House security forces.)</em></p>
<p>AISight is a program created by Houston-based <a href="http://www.brslabs.com/aisight" target="_blank">BRS Labs</a>, which uses <a href="http://www.brslabs.com/what-is-behavioral-analytics" target="_blank">behavioral recognition</a> techniques to observe, learn and respond to video input. Once the software is implemented, it spends its time training and defining baseline behavior and patterns of people and places, understanding what's common and uncommon in a field of view and a period of time, down to days of the week and hours of the day.&nbsp;Once it has defined normal, it goes to work recording 5- to 10-second clips and looking for uncommon behavior, such as someone standing outside of an ATM at 4 a.m. on a Sunday, as opposed to normal stop-and-go ATM traffic throughout the week. When it sees abnormal behavior,&nbsp;it sends out an alert.</p>
<h3>See also: <a href="http://www.readwriteweb.com/archives/how-tech-protects-the-president-data-mining.php">Tech That Protects The President, Part 1: Data Mining</a></h3>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">The technology was used during the Republican National Convention in Tampa, Florida, in August, and has been implemented in several other cities throughout the U.S. In Tampa, the system was installed months before the event actually happened so AISight could learn about local patterns of behavior. Of course, when the event happened, far more people showed up, but the system was prepared to recognize baseline patterns and deviations from them.&nbsp;</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">"Our software learns," explained&nbsp;David Gerulski, vice president of marketing for BRS.&nbsp;"It's got AI intelligence and it learns like a human does."</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">The system captured notable events in Tampa, Gerulski acknowledged,&nbsp;but he would not reveal the details. AISight was analyzing the output of cameras positioned throughout Tampa. And it sent more false alarms than alerts to actual incidents at the convention. But with thousands of people present and new permutations at work there, that's not too surprising, Gerulski said: Better safe than sorry.</p>
<h2 style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">Real-Time Response</h2>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">One of the technology's biggest benefits is that it enables law-enforcement forces to respond in real time, Gerulski says. Without AISight or similar technology, security officials find out about an incident only after the fact, and usually after a tired pair of eyes scans hours of tape.&nbsp;</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">&nbsp;<span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/offical%2520screenshot%25203.jpg" style="" />
			</span>
 </p>
<p>Basically the software "expands the eyes looking at all those cameras," Gerulski says, changing the surveillance model without altering the infrastructure. "You could have 1,000&nbsp;screens, no human could look at all the camera views. When something odd happens at the view, we can pop that out."&nbsp;Human attention is no longer needed to constantly monitor cameras and can be deployed elsewhere. Officers in the field can receive mobile alerts and video clips while an event is in progress.</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">AISight, which relies on&nbsp;<a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;p=1&amp;u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&amp;r=1&amp;f=G&amp;l=50&amp;co1=AND&amp;d=PTXT&amp;s1=8131012.PN.&amp;OS=PN/8131012&amp;RS=PN/8131012%20%20" target="_blank">patented behavioral analytics</a>,&nbsp;can be implemented on top of existing camera systems. No new hardware need be installed.&nbsp;However, this kind of capability doesn't come cheap. The system costs hundreds of thousands of dollars, into the millions depending on the number of cameras the software is watching.</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">BRS wouldn't&nbsp;confirm or deny whether its software is currently being used by the White House.</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">For more details about AISight's behavioral recognition system, take a look at this video:</p>
<p><iframe src="http://www.youtube.com/embed/C9KJuzXD1-4" frameborder="0" width="560" height="315"></iframe></p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;"><a href="http://www.readwriteweb.com/archives/how-tech-protects-the-president-data-mining.php"><em>Part 1: Data Mining</em></a></p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;"><em>Part 2: Image Analysis</em></p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;"><em>Photos by <a href="http://www.flickr.com/photos/myfoxmemphis/" target="_blank">myfoxmemphis</a>&nbsp;and <a href="http://www.flickr.com/photos/dno1967b/" target="_blank">Daniel Oines</a></em></p>
                    ]]></description>
                <link>http://readwrite.com/2012/10/15/tech-that-protects-the-president-image-analysis</link>
                <guid>http://readwrite.com/2012/10/15/tech-that-protects-the-president-image-analysis</guid>
                <category>Data Services</category>
                <pubDate>Mon, 15 Oct 2012 04:00:00 -0700</pubDate>
                <author>Adam Popescu</author>
            </item>
                    <item>
                <title><![CDATA[Tech That Protects The President, Part 1: Data Mining]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/files/fields/tech-that-protects-the-president-data-mining-top.png" />
                                        <p>President Obama's appearance at the Democratic National Convention in September took place amid a rat's nest of perils. But the local Charlotte, North Carolina, police weren't entirely on their own. They were aided by a sophisticated data mining system that helped them identify threats and react to them quickly. <em>(Part 1 of a 3-part series about the technology behind presidential security.)</em></p>
<p>The Charlotte-Mecklenburg police used software from IxReveal to monitor the Internet for associations between Obama, the DNC, and potential threats.&nbsp;The company's program, known as <a href="http://www.ureveal.com/">uReveal</a>, combs news articles, status updates, blog posts and discussion forum comments. But it&nbsp;doesn't simply search for keywords. It works on concepts defined by the user and uses natural language processing to analyze plain English based on meaning and context, taking into account slang and sentiment. If it detects something amiss, the system sends real-time alerts.</p>
<p>"We are able to read and alert almost as fast as [information] comes on the Web, as opposed to other systems where it takes hours," said Bickford, vice president of operations of IxReveal.</p>
<p>In the past, this kind of task would have required large numbers of people searching and then reading huge volumes of information and manually highlighting relevant references.&nbsp;"Normally you have to take information like an email and shove it in to a database," Bickford explained. "Someone has to physically&nbsp;read it or do a keyword search."</p>
<p>uReveal, on the other hand, lets machines do the reading, tracking, and analysis. "If you apply our patented technology and natural language processing capability, you can actually monitor that information for specific keywords and phrases based on meaning and context," he says.&nbsp;The software can differentiate between a Volkswagen bug, a computer bug and an insect bug, Bickford explained - or, more to the point, between a reference to fire from a gun barrel and one to fire in a fireplace.</p>
<p>Bickford says the days of people slaving over sifting through piles of data, or&nbsp;<a href="http://en.wikipedia.org/wiki/Extract,_transform,_load" target="_blank">ETL</a>&nbsp;(extract, transform and load) data processing capabilities are over. "It's just not supportable."</p>
<h2>Finding Patterns &amp; Discovering Clusters</h2>
<p>Once the system understands what people are saying, it performs behavioral analytics to detect patterns in interactions among people and identify clusters of people who follow particular patterns. "You look at how the community is changing," Bickford explained, "like flash mobs forming."</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">The system doesn't profile individuals based on demographics or&nbsp;personal identifiers. "We don't need name, race, or gender to link info together, which is what used to be done. The analysis is based on meaning and context."</p>
<h2 style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">Managing A Proliferation Of Data</h2>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">The U.S. president isn't the only one who benefits from uReveal. More than thirty police departments and multiple <a href="http://www.ureveal.com/Government" target="_blank">government</a> agencies use it to track behavior, detect fraud, and monitor regulatory compliance. In the private sector, major clients include the Bill and Melinda Gates Foundation; the service monitors threats to both the Gateses and their foundation.</p>
<p>And Bickford has a broader range of organizations in his sights. Systems like uReveal are essential to the task of gathering intelligence - not just law enforcement but business intelligence - in an age of proliferating data. "Organizations, whether government or business, will deal with [increasing amounts of] information in the future," Bickford pointed out. "We're trying to make analytics and intelligence a commodity, so everyone can get what they want. Our tech has taken a huge step in that direction."</p>
<p>The software costs as little as $5,000 for a small sheriff's department and as much as half a million dollars for a government agency.&nbsp;</p>
<p style="margin-top: 1em; margin-right: 0px; margin-bottom: 1em; margin-left: 0px;">Check out the video below to learn more about how uReveal works.&nbsp;</p>
<p><iframe src="http://www.youtube.com/embed/tczGNs-OqnI" frameborder="0" width="560" height="315"></iframe></p>
<p><em>Next: image analysis.</em>&nbsp;</p>
<p>&nbsp;</p>
<p><em>Photo by <a href="http://www.flickr.com/photos/jurvetson/" target="_blank">Steve Jurvetson</a></em></p>
<p>&nbsp;</p>
                    ]]></description>
                <link>http://readwrite.com/2012/10/12/how-tech-protects-the-president-data-mining</link>
                <guid>http://readwrite.com/2012/10/12/how-tech-protects-the-president-data-mining</guid>
                <category>Data Services</category>
                <pubDate>Fri, 12 Oct 2012 05:00:00 -0700</pubDate>
                <author>Adam Popescu</author>
            </item>
                    <item>
                <title><![CDATA[What Do IT Outsourcing Companies Know About Innovation?]]></title>
                <description><![CDATA[
                                        <p class="p1"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/Gautam%2520Shroff.jpg" style="" />
			</span>
Most observers think of the big multinational technology outsourcing firms - especially the ones based in India - as a reliable source of relatively inexpensive technology expertise for routine IT projects. Not surprisingly, those firms desperately want to move up the food chain and become known for innovation as well.</p>
<p class="p1">A conversation with&nbsp;Dr. Gautam Shroff, Vice President at Tata Consultancy Services (TCS)&nbsp;and head of the TCS Technology Innovation Lab in Delhi, reveals that they’re making progress, but that they still have a ways to go.</p>
<p class="p1"><a href="http://www.tata.com/"><span class="s1">Tata</span></a>, of course, is a huge collection of more than 100 companies across 6 continents, including everything from car makers to consultants, chemicals and consumer products. And this year, TCS&nbsp;<a href="http://www.tata.com/media/releases/inside.aspx?artid=JtyWIz2J2/Q="><span class="s1">opened a facility in Silicon Valley</span></a>.</p>
<h2 class="p1">Innovation is complicated</h2>
<p class="p1">Shroff acknowledged that firms like TCS are not known for innovation, but said the picture was more complicated than that. There are two sides to innovation, Shroff said: creating ideas, and getting those ideas out into the world to build solutions from them. TCS, he said, has done a lot of work on the latter side.</p>
<p class="p1">The company has been investing in research since 1981, he said, and now has the largest academic computer science effort in India. Those efforts have contributed to significant businesses, but mostly for Tata itself. Shroff said that in the 1990s, TCS research created software development tools that led to the company’s entire financial product business, as well as the only end-to-end cloud business in India, with hundreds of small and midsize business customers.</p>
<p class="p1">Most of Tata’s R&amp;D isn’t productized, though, Shroff said. Instead of producing “great science,” it’s used for practical, incremental business improvements within Tata’s activities for its customers. “We also innovate for our customers where we have replicable [innovations],” Shroff said. “We just don’t call them products.”</p>
<p class="p1">That’s useful, certainly, but not exactly what most observers think of when they hear the word “innovation.” So, how exactly is Tata moving toward more innovation in its offerings?</p>
<h2 class="p1">Trying to get smart about business intelligence</h2>
<p class="p1">The key areas Tata is focusing on include social media, cloud computing, mobility and big data. And for Shroff, those all come together in business intelligence, which he sees reaching an important inflection point that requires major changes - fusing deep analysis of big data from both inside and outside the enterprise, and looking for new patterns and correlations.</p>
<p class="p1">As an example, he cited companies that are monitoring Twitter streams to identify “adverse events” that might not reach news outlets but could still impact business operations. “If that matters to you, it’s better to know now, so you can alert people in the field on how it’s likely to affect their business,” Shroff explained.</p>
<p class="p1">“People are looking at it with great curiosity in the business world,” Shroff said, “exploring how more data can improve the business. What was traditionally a niche market can now be a force … something the CEO needs to know about. And we are right in the middle of that.”&nbsp;While Shroff wouldn’t name individual BI customers, he said Tata is working with firms in the retail, consumer packaged goods and financial services markets on the consumer and supply sides of their businesses.</p>
<p class="p1">Finally, Shroff said opening a facility in Silicon Valley has given Tata access to a new talent pool. “We’re now able to get people who work in startups who don’t want to leave the valley.” The company also uses the outpost to work with academics at Stanford University and UC Berkeley and partner with startups that are developing useful technologies.</p>
                    ]]></description>
                <link>http://readwrite.com/2012/05/08/what-do-it-outsourcing-companies-know-about-innovation</link>
                <guid>http://readwrite.com/2012/05/08/what-do-it-outsourcing-companies-know-about-innovation</guid>
                <category>Big data</category>
                <pubDate>Tue, 08 May 2012 15:00:00 -0700</pubDate>
                <author>Fredric Paul</author>
            </item>
                    <item>
                <title><![CDATA[Improvements in New York Times' Fech Makes It Easier to Follow the Money]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/files/fields/money-610-1.png" />
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/fields/money-610-1.png" style="" />
			</span>
Having data available electronically is not the same thing as the data being <em>useful</em>. Campaign finance disclosures provided electronically by the Federal Election Commission (FEC) are a good example of that. The <em>New York Times</em>'s <a href="http://open.blogs.nytimes.com/2011/08/29/introducing-fech/">Fech</a> (not "fetch") is a RubyGem - a packaged Ruby library - designed to help journalists and public interest organizations access and make sense of FEC filings.</p>
<p>Here's the <em>NY Times'</em> description of Fech from its first release last year:</p>
<blockquote>
<p>Journalists who work with these filings need to extract their data from complex text files that can reach hundreds of megabytes. Turning a new set into usable data involves using the F.E.C.'s data dictionaries to match all the fields to their positions in the data. But the available fields have changed over time, and subsequent versions don't always match up. For example, finding a committee's total operating expenses in version 7 means knowing to look in column 52 of the “F3P” line. It used to be found at column 50 in version 6, and at column 44 in version 5. To make this process faster, my co-intern Evan Carmi and I created a library to do that matching automatically.</p>
<p>Fech (think “F.E.C.h,” say “fetch”), is a Ruby gem that abstracts away any need to map data points to their meanings by hand. When you give Fech a filing, it checks to see which version of the F.E.C.'s software generated it. Then, when you ask for a field like “total operating expenses,” Fech knows how to retrieve the proper value, no matter where in the filing that particular software version stores it.</p>
</blockquote>
<p>Derek Willis of the <em>NY Times</em> announced the <a href="http://open.blogs.nytimes.com/2012/04/11/announcing-fech-1-0/">1.0 release of Fech</a> last month. This release covers "<a href="http://nytimes.github.com/Fech/#row_types">all of the current form types that candidates and committees submit</a>." Perhaps most importantly, this release <a href="http://nytimes.github.com/Fech/#row_types">allows comparing two filings against one another</a>.</p>
<h2>Why Fech Matters</h2>
<p>Fech is already being used by the NYT for its reporting and <a href="http://elections.nytimes.com/2012/campaign-finance/independent-expenditures/totals">interactive visualizations</a> of campaign spending. But that's just one editorial team. Putting this tool in the hands of any developer or reporter who wants to work with the data opens a lot more possibilities.</p>
<p>For example, there's <a href="http://www.propublica.org/">ProPublica</a>, which is using Fech and the <em>NY Times</em>' APIs for its <a href="http://www.propublica.org/article/campaign-spending-shows-political-ties-self-dealing">reporting</a> and <a href="http://www.propublica.org/special/a-tangled-web">interactive graphics</a>. ProPublica is able to show not just what campaigns are spending, but how much and with whom. (So far the biggest winner is Mentzer Media Services, an ad agency that specializes in GOP campaigns - including the Swift Boaters. Fech doesn't automatically point that out, of course, but it helps journalists uncover it.)</p>
<p>Data without context is useless. By helping developers and journalists work with the filings in a more structured way, Fech helps newsrooms (or any other group) put the data in context to find the story behind the data. It's a long way from being <em>simple</em> to use, but it represents a significant improvement over the raw data. It's Apache-licensed, so it might find its way into all kinds of data analysis tools over time.</p>
<p>With Fech maturing well before the elections this fall, it could help all kinds of organizations follow the money trails much more efficiently. Here's hoping that happens.</p>
                    ]]></description>
                <link>http://readwrite.com/2012/05/07/improvements-in-new-york-times-fech-makes-it-easier-to-follow-the-money</link>
                <guid>http://readwrite.com/2012/05/07/improvements-in-new-york-times-fech-makes-it-easier-to-follow-the-money</guid>
                <category>Analysis</category>
                <pubDate>Mon, 07 May 2012 17:15:47 -0700</pubDate>
                <author>Joe Brockmeier</author>
            </item>
                    <item>
                <title><![CDATA[Do Personal Analytics Make Google Less Creepy?]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/lead-images/goodtoknow150.jpg" style="" />
			</span>
We're beginning to wake up to what free Web services do with our data. Google has been the main driver of this story in 2012, launching its <a href="http://www.readwriteweb.com/archives/google_issues_new_privacy_policy_for_one_unified_g.php">new privacy policy</a> that binds up each user's data for ad targeting across all Google services. People <a href="http://www.readwriteweb.com/archives/tech_world_overreacts_to_googles_new_privacy_polic.php">react viscerally</a> against this. But what's really in our data that's so valuable?</p>

<p>Unquestionably, there are abuses of user data that <a href="http://www.readwriteweb.com/archives/path_is_a_free_app_and_it_will_spy_on_us.php">go too far</a>. But the truly troubling stories have a halo effect. Early adopter culture is hardening against the idea of any kind of data collection about users. But cultural norms are always changing. Isn't it possible that there are some kinds of data collection that <a href="http://www.readwriteweb.com/archives/the_case_for_google.php">could be valuable to users</a>? </p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/googaccountpiechart.jpg" style="" />
			</span>
Google itself has begun trying to change the norms around this. It created a new opt-in <a href="http://www.readwriteweb.com/archives/google_now_offers_a_monthly_account_activity_repor.php">monthly account activity report</a> that provides Google users with some basic analytics about their Googling habits.</p>

<p>It's nothing earth-shaking, just stats about searches and email. But the point is, those kinds of insights are interesting. Google is trying to demonstrate that it can use its data-gathering powers for good.</p>

<p>Yesterday, Gmail engineer Saurabh Gupta <a href="http://gmailblog.blogspot.com/2012/04/know-your-gmail-stats-using-gmail-meter.html">blogged</a> about a tool called <a href="http://code.google.com/googleapps/appsscript/articles/gmail-stats.html">Gmail Meter</a>. It gives you much more detailed analytics about your Gmail, including inbound and outbound traffic over time, average response times, word counts, thread lengths and more.</p>

<iframe width="610" height="343" src="http://www.youtube.com/embed/ZooybMt9sRQ" frameborder="0" allowfullscreen></iframe>

<p>It's made by <a href="https://plus.google.com/u/0/116263732197316259248/about">Romain Vialard</a>, a top contributor to <a href="http://code.google.com/googleapps/appsscript/service_gmail.html">Google Apps Script</a>. He's not a Google employee; he's just a developer and a user motivated by the richness of all the data available from Gmail.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/googleprivacy.jpg" style="" />
			</span>
</p>

<p>Do efforts like this demonstrate the value of personal analytics, or do they just distract from important privacy concerns? According to the real <a href="https://geoloqi.com/blog/2012/03/data-portraits-powered-by-3-5-years-of-data-and-2-5-million-gps-points/">"quantified self"</a> enthusiasts, the key is owning the data.</p>

<p><big><strong>The Quantified Self</strong></big></p>

<p>Your average Google user isn't in the same position as someone like Stephen Wolfram, who has been <a href="http://www.readwriteweb.com/archives/5_things_i_learned_about_the_future_from_stephen_w.php">logging his every keystroke for decades</a> and using his own tools to analyze the data. He can record and analyze any kind of sensitive data he wants without worrying about crossing the privacy line.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/parecki_portland_small2.jpg" style="" />
			</span>
Other quantified-selfers like the creators of <a href="http://geoloqi.com">Geoloqi</a> have started to build demonstrations of the <a href="http://www.readwriteweb.com/archives/pronouncing_the_death_of_the_check-in.php">value that personal tracking can provide</a> as a service. Today's Web services create a fixed form for users to fill out, and that template stands in for who the users are. What if our profiles were made entirely of the data we collected about ourselves?</p>

<p>"You are the button," Geoloqi cofounder Amber Case says. That is, the fully quantified self has all the power. We know personal data is valuable because companies like Google all want a piece. But if we own vastly more data about ourselves than we're willing to give out, we have the advantage. Companies will have to compete on privacy in order to serve us.</p>

<p>Google's new effort to be the Web service provider to our online &#252;ber-self exposes millions of people to the power of personal data collection. But it's too risky for users to give Google the level of trust required to take this trend much further.</p>

<p>Fortunately, the <a href="http://www.readwriteweb.com/archives/pronouncing_the_death_of_the_check-in.php">use cases</a> demonstrated by Wolfram and Geoloqi are much more exciting than gazing at your search history. Hopefully, they'll catch on so fast that users wise up about this issue before we give all our data away.</p>

                    ]]></description>
                <link>http://readwrite.com/2012/04/20/do_personal_analytics_make_google_less_creepy</link>
                <guid>http://readwrite.com/2012/04/20/do_personal_analytics_make_google_less_creepy</guid>
                <category>Data Services</category>
                <pubDate>Fri, 20 Apr 2012 06:28:00 -0700</pubDate>
                <author>Jon Mitchell</author>
            </item>
                    <item>
                <title><![CDATA[The Benefits and Pitfalls of ESPN's new Developer Center]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/espn_dev_center_150.jpg" style="" />
			</span>
What do you do when you have a treasure trove of valuable data that developers would love to get their hands on? Release an API and let them create applications for you. That is precisely what sports network ESPN did today by <a href="http://developer.espn.com/">announcing its Developer Center</a> replete with multiple APIs for programmers. Developers can tap into ESPN's reservoir of data on athletes, teams, media, stats and research to create sports apps with rich data for fans across the world.</p>

<p>This is not a classic free API platform though. ESPN is owned by ABC, which is owned by Disney. Disney is not known for sharing nicely. Developers can use ESPN's headlines API for free but otherwise have to form a brand partnership with the sports giant. Yet, considering ESPN's prowess and well of rich data, that may not be a bad thing. </p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/espn_dev_center.jpg" style="" />
			</span>
</p>
<h2>ESPN's APIs</h2>

<p>There are six APIs in ESPN's new developer center. The only one that does not require a premium partnership with the network is the headlines API that allows developers to grab content from the company covering sports and athletes. The other five can only be used by ESPN "premium partners" or the network itself.</p>

<p>There are a lot of things to like in those five APIs. One of the great strengths of ESPN is the ability to track sports information in real time and store it in a database. For instance, if you have ever followed a game from your mobile device or computer, you know that ESPN's "GameCenter" application is one of the marvels of modern sports reporting. Individual leagues such as the MLB, NHL, NFL and NBA offer similar services to track their own games, but the difference is that ESPN tracks every game, everywhere. Looking for up-to-the-second updates on yesterday's Creighton vs. Illinois State game? GameCenter has them in almost real time. The other large sports networks (such as Yahoo and CBS Sports) can do this as well, but when it comes to data and developers, ESPN is winning the race. </p>

<p>Here are the benefits of the six APIs with descriptions from ESPN:</p>

<blockquote><ul>
	<li><strong>Athletes</strong>: "Allows you to get rosters of players for various sports, as well as biographical and statistical data for individual athletes." </li>
	<li><strong>Research Notes</strong>: "Allows you to tap into ESPN's vast knowledgebase of exclusive sports data tidbits compiled by our Stats and Information Group. Research Notes are available by sport, athlete, team, and even game."</li>
	<li><strong>Standings</strong>: "Enables you to get the latest standings for a particular sport by division, conference, or overall. Data is also available by year and by season type (preseason, regular season, playoffs)."</li>
	<li><strong>Headlines</strong>: "Allows you to interact with ESPN's various news stories. ESPN publishes hundreds of unique pieces of text content each day, covering dozens of sports and hundreds of athletes and teams."</li>
	<li><strong>Scores &amp; Schedules</strong>: "Provides game/match information, including start times, venue, competitors, score, and stats across every major sport."</li>
	<li><strong>Teams</strong>: "Enables you to get information, including roster, stats, and more, for individual teams. You can also fetch teams by conference or division."</li>
</ul></blockquote>

<h2>What Can Be Built?</h2>

<p>ESPN showcases a couple of stalwart applications that use its APIs, such as Flipboard and Pulse. It also shows the Brigham Young Cougars app, which leverages the scores API to keep track of all things BYU. Foursquare uses the schedule and research APIs to let users check into sporting events. But these APIs have many more useful applications than news readers and team-specific apps.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/espn_flipboard.jpg" style="" />
			</span>
</p>

<p>For instance, fantasy sports gurus could tap into ESPN's data to bring real-time insight and analysis to their apps. Trivia apps can get almost any sports answer in a matter of seconds. Niche sports sites can provide rich data on matchups, schedules and locations. The Colonial Athletic Association (CAA), for example, is having its basketball championship tonight. Tapping into ESPN data can give readers better context on a league that is otherwise overlooked by those not associated with it. </p>

<p>ESPN also has an advanced statistics group that crunches numbers and gives the biggest stat heads all the information and context they can consume. As a baseball fan who analyzes advanced stats, I would find it helpful if a developer took the raw data out of ESPN and created visualizations and dynamic charts. There are other sources on the Web where this information is available, but to my knowledge none of them release it as an easy-to-use API. </p>

<h2>Working With ESPN</h2>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/byu_espn.jpg" style="" />
			</span>
This is where it gets tricky. ESPN wants you to have this data ... it just does not want you to profit from it. That is reasonable considering it is ESPN's data. Opening up this data as a platform is a way for the company to brand itself across multiple applications. It is important to <a href="http://developer.espn.com/terms">read the fine print.</a></p>

<p>Developers cannot advertise in an application that uses ESPN APIs except through specific network-approved ads. Presumably ESPN would split revenues from those apps with developers, but the terms of service make no specific mention of sharing. </p>

<blockquote>"ESPN may make available to you a Tool or other API that permits you to include ESPN-approved advertising in your Apps. Unless otherwise set forth in the Information Form or on a separate "Advertising Addendum," you may not include any advertising or sponsorship in your App (unless included in the Content made available by ESPN)."</blockquote>

<p>ESPN is not Facebook. When Facebook opened its platform in 2007, it was a revelation in how a platform can open itself to be built upon and let everybody grow and prosper. Facebook created the blueprint for what it takes to create a successful developer program and platform. As far as open platforms go, ESPN's developer program is one of the most closed systems of rich data that we have seen. Essentially the network is saying, "here is our data but play by our rules and make sure we get all the credit for everything you do." While the API may be an enlightened idea from a sports network, its terms are not. </p>
                    ]]></description>
                <link>http://readwrite.com/2012/03/05/the_benefits_and_pitfalls_of_espns_new_developer_c</link>
                <guid>http://readwrite.com/2012/03/05/the_benefits_and_pitfalls_of_espns_new_developer_c</guid>
                <category>Data Services</category>
                <pubDate>Mon, 05 Mar 2012 01:00:00 -0800</pubDate>
                <author>Dan Rowinski</author>
            </item>
                    <item>
                <title><![CDATA[How We're Going to Fix Online Identity and Reputation]]></title>
                <description><![CDATA[
                                        <img src="http://readwrite.com/files/styles/800_450sc/public/files/files/images/hypothesisworkshop1.jpg" />
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/hypothesisworkshop1.jpg" style="" />
			</span>
Last week, I had the great fortune of attending the <a href="http://hypothes.is/repworkshop.html">Hypothes.is Reputation Workshop</a>. Hypothes.is aims to build nothing less than a <a href="http://www.readwriteweb.com/archives/hypothesis_a_peer-review_layer_for_the_internet.php">peer review layer for the whole Internet</a>. It's a mind-boggling idea when you let it sink in. The technical challenges are formidable, and the cultural ones are even bigger. Nevertheless, the excitement around the project is intense and contagious.</p>

<p>It's a project that has drawn in the likes of <a href="https://en.wikipedia.org/wiki/User:DarTar/Hypothes.is_Reputation_Workshop#Wikipedia_as_a_proof-of-concept_use_case">Wikimedia</a>, the <a href="http://www.archive.org/">Internet Archive</a> and the <a href="http://eff.org">Electronic Frontier Foundation</a>. The stewards of the free Web want this problem solved. To get the ball rolling on figuring this out, <a href="http://hypothes.is/repworkshop.html">Hypothes.is</a> invited a colorful panel of experts - and... me - to a three-day think tank on San Francisco Bay to identify the challenges, parse them out, and prototype solutions. And guess what? We pulled it off.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/hypothesisworkshop2.jpg" style="" />
			</span>
</p>

<p><big><strong>Why We Need Hypothes.is</strong></big></p>

<p>There are two fundamental, related cultural problems with the Web that Hypothes.is wants to address: <a href="http://www.readwriteweb.com/archives/a_proposal_to_fix_online_identity.php"><strong>identity</strong></a> and <strong>reputation</strong>. Reputation is the main problem, but you can't approach it without fixing identity. A reputation doesn't refer to anything without a consistent identity behind it.</p>

<p>The goal is to build a system of reputation for, ideally, all the content on the Web. No filter exists today for us to assess what information is <em>good</em> and what information is <em>bad</em>.</p>

<p>Right now, the Web is vulnerable to gaming. Google's search ranking algorithm is a fine piece of early reputation technology, and <a href="http://www.readwriteweb.com/archives/interview_changing_engines_mid-flight_qa_with_goog.php">it's constantly improving</a>, but it's exploitable by playing with back-links and keywords.</p>

<p>And now that the social filters of Facebook, Twitter and Google+ have begun to dominate the time of Web users, the Web runs the risk of becoming an out-and-out popularity contest. If we could filter the Web by reputation, we could turn it into a meritocracy.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/hypothesisworkshop3.jpg" style="" />
			</span>
</p>

<p><big><strong>Identity &amp; Reputation</strong></big></p>

<p>Today, identity and reputation are fragmented by different, often competing online services. We have Facebook identities, Google identities and email identities. We have OpenIDs, but we might have a bunch of those. Some of these identities may be linked, but the links are weak.</p>

<p>In each of these networks, we also have reputations, however basic they may be. Our contributions are liked and +1d by friends and ranked by an algorithm. These services might provide APIs to be interoperable with other sites and applications, but that just <a href="http://www.readwriteweb.com/archives/a_proposal_to_fix_online_identity.php">extends the domain of the dominant platform</a>, Facebook or Google. It's not a multi-faceted identity. It's the same monolithic identity extended everywhere.</p>

<p>Plus, identity providers like Facebook and Google have interests that run counter to ours. <a href="http://www.readwriteweb.com/archives/4chans_chris_poole_facebook_google_are_doing_it_wr.php">Real people are multi-faceted</a>. We want to be able to express different aspects of ourselves in different contexts. But Facebook, Google et al have built businesses upon <a href="http://www.readwriteweb.com/archives/4chans_chris_poole_facebook_google_are_doing_it_wr.php">consistent, unchanging, public identities</a> for all of us, despite <a href="http://www.readwriteweb.com/archives/google_plus_tells_pseudonym_lovers_to_shove_it.php">nasty, sometimes dangerous consequences</a>.</p>

<p>People with enough privileges don't have to worry about their public identities and reputations, but marginalized or vulnerable people around the world face real danger for speaking out online. They still need the ability to participate fully. That's why a truly Web-wide reputation system cannot be subject to any company's "real names policy."</p>

<p>We learned in the workshop that the best kind of online identity is one that is <a href="https://en.wikipedia.org/wiki/User:DarTar/Hypothes.is_Reputation_Workshop#Identity_management">pseudonymous but expensive</a>. It's easy to get one pseudonym, but it's very difficult to change or create new ones. A pseudonym could also be privately verified with a government-issued ID or some other standard, so the user remains pseudonymous to the world, but the reputation system knows who it is.</p>

<p>It could also plug into existing identity services. There's no reason this identity couldn't be connected with Google or Facebook if the user so desired. Those are existing, thriving communities, and they'll need annotation, too. They just can't be the <em>sole provider</em> of identity.</p>

<p>Hypothes.is needs relatable identities to build a reputation system. It will have its own reputation algorithms, mechanisms and <a href="https://en.wikipedia.org/wiki/User:DarTar/Hypothes.is_Reputation_Workshop#Moderation_strategies">moderation strategies</a>. It will implement its trusted users' contributions as a layer of reputation that can apply to all the content on the Web. Hypothes.is users will be like <a href="https://en.wikipedia.org/wiki/User:DarTar/Hypothes.is_Reputation_Workshop#Wikipedia_as_a_proof-of-concept_use_case">Wikipedia editors, but for everything</a>.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/hypothesisworkshop4.jpg" style="" />
			</span>
</p>

<p><big><strong>What Might Hypothes.is Look Like?</strong></big></p>

<p>Hypothes.is wants to build a layer of <a href="https://en.wikipedia.org/wiki/User:DarTar/Hypothes.is_Reputation_Workshop#Annotation_and_versioning"><strong>annotations</strong></a> attached to a system of reputation. In order to do that, it also needs to create a community around the right set of tools.</p>

<p>In the workshop, we imagined a few different interfaces. One crucial interface decision will be the way annotations are displayed on top of the content as a user browses the Web. It could be "heat maps," where areas of the document with lots of annotation are color-coded to indicate activity, quality or both. It could be a sidebar full of various annotations, even multimedia ones.</p>

<p>Once it's in place, a user signed in to Hypothes.is will be able to judge a document's reputation on sight or even filter a long list of documents. Participating sites will be able to stand behind their visible reputations. The Web will be less sketchy and seedy in places where clarity and transparency are needed.</p>

<p>And it won't be imposed by some central authority, but rather by the work of dedicated annotators from all around the Web. Call them journalists, call them editors, call them curators, call them whatever you want. The advantages of this system over one based on blogging, hype and personality should be obvious. We could have a standard for assessing the quality of Web content, and that will help us assign real value to it.</p>

<p>If you're curious about the particular, potential designs we discussed, you should visit <a href="https://en.wikipedia.org/wiki/User:DarTar/Hypothes.is_Reputation_Workshop#Identity_management">Dario Taraborelli's amazing notes on Wikipedia</a>. He goes into depth about the possible solutions we considered. There's no product news to report yet. I'm just here to update you on the state of the conversation. It's thriving, it's exciting, and it's necessary.</p>

<p>If you want to follow along, visit <a href="http://hypothes.is">Hypothes.is</a> and follow <a href="https://twitter.com/#!/hypothes_is">@hypothes_is</a> on Twitter. As Hypothes.is makes news, we'll keep bringing it to you here on ReadWriteWeb.</p>

<p>And if you have ideas you want to contribute to this project, the Hypothes.is team has set up an email address for you to submit them: <strong>ideas {at} hypothes.is</strong></p>

<p><em>Photos by Lisa Heft courtesy of <a href="http://hypothes.is">Hypothes.is</a></em></p>

<p><em>Disclosure: My really good friend <a href="https://twitter.com/#!/tilgovi">Randall Leeds</a> is building Hypothes.is as the technical co-founder. This happened</em> after <em>RWW <a href="http://www.readwriteweb.com/archives/hypothesis_a_peer-review_layer_for_the_internet.php">started covering</a> founder Dan Whaley's efforts to create an annotation system for the whole Web. It's a complete coincidence. Still, I admit that this may have an impact on my reporting. But hey, at least my fund isn't investing in it.</em></p>

                    ]]></description>
                <link>http://readwrite.com/2012/03/02/hypothesis</link>
                <guid>http://readwrite.com/2012/03/02/hypothesis</guid>
                <category>Data Services</category>
                <pubDate>Fri, 02 Mar 2012 03:34:00 -0800</pubDate>
                <author>Jon Mitchell</author>
            </item>
                    <item>
                <title><![CDATA[Delicious Founder Creates New People Search Engine, Skills.to]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/tastylabslogo.jpg" style="" />
			</span>
Joshua Schachter and his team of star developers at TastyLabs have begun work on a second project, an endorsement and people search engine called <a href="http://Skills.to">Skills.to</a>.  The site lets you endorse people for their skills in various fields, see what the people you know have been endorsed for and search for people with particular skills.  </p>

<p>The site is just beginning. "We have a lot to do, lots of ideas here and lots of places we can go next," Schachter told me by Twitter DM today.  What's the core idea behind the site? "Search engine for people by property of the person," he says. "Portable reputation someday."  There's certainly something refreshingly Delicious-like about it, the way you can navigate around the site by clicking any link and navigating by a few simple properties.</p>
<p>Things like this have been tried before, from <a href="http://wefollow.com/">WeFollow</a> to <a href="http://endor.se/">Endor.se</a> to other related efforts (disclosure: I may just be building <a rel="nofollow" href="http://plexusengine.com">something related</a> myself).  </p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/SkillsScreen.jpg" style="" />
			</span>
</p>

<p>The <a href="http://TastyLabs.com">TastyLabs</a> team, which is <a href="http://www.readwriteweb.com/archives/from_the_creator_of_delicious_jig.php">full of rock stars beyond just Schachter</a>, first built a social-help site called <a href="http://Jig.com">Jig</a> last summer.  That site works well and is fun to use, but it's not clear how much traction it's seen yet.  That service <a href="http://blog.jig.com/2012/02/10/announcing-the-jig-iphone-app/">launched an iPhone app</a> earlier this month, a welcome move since Jig is particularly conducive to mobile use.</p>

<p>Schachter is best known for building the archetypal social bookmarking site Delicious. He sold it to Yahoo, which didn't know how to love it.  The site has since been sold again to a team led by the founders of YouTube, who may be even worse at loving it.  Delicious offered something simple on the surface - the ability to save links you wanted to read later - but surfaced far more interesting information <a href="http://www.readwriteweb.com/archives/rip_delicious_you_were_so_beautiful_to_me.php">when analyzed in aggregate</a>.</p>

<p>That potential was never really realized but it's the same kind of thinking behind Jig, and I presume behind Skills.to.  These are services that offer a clear and simple value proposition to the end user, but that can offer even more derivative value once patterns of use are analyzed and used as a platform to reform the user experience.</p>

<p>Lots of people have tried to create a discovery-through-endorsement website, but I'd be willing to bet that the TastyLabs team is going to bring some extra special insight and creativity to this seemingly simple space.  </p>

<p>The portable reputation angle that Schachter mentions could be the first example of that dynamic: imagine taking your Skills.to endorsements with you to sites around the web.  That could prove useful in all kinds of circumstances - from establishing credibility to targeting content to powering recommended social and content connections.</p>

<p><em>Disclosure #2: Upon announcing internally that I was going to write about this, RWW Community Manager Robyn Tippins also disclosed that she has done some marketing consulting for TastyLabs.  Lucky them, their team of smart people goes on and on.</em></p>
                    ]]></description>
                <link>http://readwrite.com/2012/02/29/delicious_founder_creates_new_people_search_engine</link>
                <guid>http://readwrite.com/2012/02/29/delicious_founder_creates_new_people_search_engine</guid>
                <category>Data Services</category>
                <pubDate>Wed, 29 Feb 2012 10:09:38 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[Can OpenGeocoder Fill the Platform Gap Left by Google Maps?]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/OpenGeocoderlogo.jpg" style="" />
			</span>
How do machines understand what place you're talking about when you say the name of a city, a street or a neighborhood?  With geocoding technology, that's how.    Every location-based service available uses a geocoder to translate the name of a place into a location on a map.  But there isn't a really good, big, stable, public domain geocoder available on the market.</p>

<p>Steve Coast, the man who led the creation of <a href="http://openstreetmap.org">Open Street Map</a>, has launched a new project to create what he believes is just what the world of location-based services needs in order to grow to meet its potential.  It's called <a href="http://www.opengeocoder.net/">OpenGeocoder</a> and it's not like other systems that translate and normalize data.  </p>
<p>Google Maps says you can only use its geocoder to display data on maps, but sometimes developers want to use geo data for other purposes, like content filtering.  Yahoo has great geocoding technology but no one trusts it will be around for long.  Open Street Map (OSM) is under a particular Creative Commons license and "exists for the ideological minority," said Coast himself in a tweet this week. And so Coast, who now works at Microsoft, has decided to solve the problem himself. </p>

<p>This has been tried before (see, for example, <a href="http://highearthorbit.com/geocommons-open-sourced-geocoder/">GeoCommons</a>), but the OpenGeocoder approach is different. It is, as one geo hacker put it, "either madness or genius."</p>

<p>The way OpenGeocoder works is that users can search for any place they like, by any name they like.  If the site knows where that place is, it will be shown on a big Bing map.  If it doesn't, then the user is encouraged to draw that place on the map themselves and save it to the global database being built by OpenGeocoder.  </p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/OpenGeocoderpic.jpg" style="" />
			</span>
<br />
<em>Above: The river of my childhood, which I just added to the map.</em></p>

<p>Every single different way a place can be described must be drawn on the map or added as a synonym, before OpenGeocoder will understand what that string of letters and numbers means with reference to place.  Anyone can redraw a place on the map, too.</p>
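<p>To make that concrete, here is a minimal Python sketch of exact-match lookup with synonyms. It is not OpenGeocoder's actual code or API: the class, the method names and the bounding-box coordinates are all hypothetical, made up purely for illustration.</p>

```python
class ExactMatchGeocoder:
    """Toy geocoder: exact string lookup, no parsing or normalization."""

    def __init__(self):
        self.places = {}    # place string mapped to (south, west, north, east)
        self.synonyms = {}  # alternate string mapped to a defined place string
        self.failed = []    # queries nobody has defined yet, saved for fixing

    def add_place(self, name, bbox):
        self.places[name] = bbox
        if name in self.failed:
            self.failed.remove(name)

    def add_synonym(self, alias, name):
        if name not in self.places:
            raise KeyError(name)
        self.synonyms[alias] = name
        if alias in self.failed:
            self.failed.remove(alias)

    def geocode(self, query):
        # Resolve a synonym if one exists, then look the string up as-is.
        name = self.synonyms.get(query, query)
        bbox = self.places.get(name)
        if bbox is None and query not in self.failed:
            self.failed.append(query)
        return bbox


geocoder = ExactMatchGeocoder()
# Illustrative coordinates only, not a surveyed boundary.
geocoder.add_place("Cully, Portland, Oregon", (45.55, -122.62, 45.58, -122.58))

print(geocoder.geocode("Cully,Portland,Oregon"))  # None: no spaces, no match
geocoder.add_synonym("Cully,Portland,Oregon", "Cully, Portland, Oregon")
print(geocoder.geocode("Cully,Portland,Oregon"))  # now returns the bounding box
```

<p>Because nothing is parsed or normalized, every spelling is its own key. That keeps the code tiny, at the cost of needing a person to register each variant.</p>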

<p>Then developers of location-based services can hit a JSON API or download a dump of all the place names and locations for use in understanding place searches in their own apps. It appears that just under 1,000 places have been added so far.  It will take a serious barn-raising to build out a map of the world this way.  It wouldn't be the first time something a little like this has been done, though.</p>

<p>"If only it was that simple :(" said map-loving investor <a href="http://about.me/stevenfeldman">Steven Feldman</a> on Twitter. "Maybe it is?"</p>

<p>The approach is focused largely on simplicity.  Coast said in <a href="http://stevecoast.com/2012/02/22/opengeocoder/">his blog post</a> announcing the project:</p>
<blockquote><p>"OpenGeocoder starts with a blank database. Any geocodes that fail are saved so that anybody can fix them. Dumps of the data are available.</p>

<p>"There is much to add. Behind the scenes any data changes are wikified but not all of that functionality is exposed. It lacks the ability to point out which strings are not geocodable (things like "a") and much more. But it's a decent start at what a modern, crowd-sourced, geocoder might look like."</p></blockquote>

<p>Testing the site, I grew frustrated quickly.  I searched for the neighborhood I live in: Cully in Portland, Oregon.  There was no entry for it, so I added one.  But there are no street names on the map so I got lost.  I had to open a Google Map in the next tab and switch back and forth between them in order to find my neighborhood on the OpenGeocoder map.  Then, the neighborhood isn't a perfect rectangle, so drawing the bounding box felt frustratingly inexact.  I did it anyway, saved, then tried recalling my search.  I found that Cully,Portland,Oregon (without spaces) was undefined, even though I'd just defined Cully, Portland, Oregon with spaces.  I pulled up the defined area, then searched for the undefined string, then hit the save button, and the bounding box snapped back to the default size, requiring me to redraw it again, on a map with no street names.  Later, I learned how to find the synonym adding tool to solve that problem.</p>

<p>In other words, the user experience is a challenge.  That's the case with Wikipedia too, and OpenGeocoder just launched, but I expect it will need some meaningful UX tweaks before it can get a lot of traction.</p>

<p>I hope it does.</p>

<p>That's just my experience so far, though.  Not everyone feels that way.  GIS geek <a href="https://twitter.com/#!/huitheure">Paul Wither</a> calls it "addictive."  </p>

<p>There are certainly high hopes for the project, too.</p>

<p>"I'm <a href="http://petewarden.typepad.com/searchbrowser/2011/10/what-can-you-use-for-geocoding-instead-of-google-maps.html">obsessed with the need for an open-source geocoder</a>, and this is a fascinating take on the problem," says data hacker <a href="http://petewarden.typepad.com">Pete Warden</a> about OpenGeocoder. "By doing a simple string match, rather than trying to decompose and normalize the words, a lot of the complexity is removed. This is either madness or genius, but I'm hoping the latter. The tradeoff will be completely worthwhile if it makes it more likely that people will contribute."</p>

<p>Coast will certainly be able to gather the attention of the geo community for the project.  As we wrote <a href="http://www.readwriteweb.com/archives/steve_coast_joins_bing.php">when he joined the Bing team 18 months ago</a>:<br />
<blockquote>Coast is a giant figure in the mapping world. In 2009, readers of leading geo publication Directions Magazine voted him the 2nd most influential person in the geospatial world, ahead of the Google Maps leadership and behind only Jack Dangermond, the dynamic founder of 41-year old $2 billion GIS company ESRI. Coast will turn 30 years old next month.</blockquote></p>

<p>The more I play with OpenGeocoder, the more it grows on me.  I hope Coast and others are able to put in the time it will take to make it as great as it could be.</p>
                    ]]></description>
                <link>http://readwrite.com/2012/02/26/can_opengeocoder_fill_the_platform_gap_left_by_goo</link>
                <guid>http://readwrite.com/2012/02/26/can_opengeocoder_fill_the_platform_gap_left_by_goo</guid>
                <category>Data Services</category>
                <pubDate>Sun, 26 Feb 2012 06:32:59 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[How Two Startups Use Games to Beat the Developer Crunch]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/Basketball.jpg" style="" />
			</span>
<strong><em>"You can't judge if someone is one of the best programmers in the country in 1 minute, but it turns out you can in 5 minutes."</em></strong></p>

<p>Good software developers are hard to find.  Startups are all about finding creative solutions to common problems - so why not this one too?  </p>

<p>Two startups that have found creative and interesting ways to solve their developer shortage problems are travel photo network <a href="http://Jetpac.com">Jetpac</a> and app search startup <a href="http://Quixey.com">Quixey</a>.  Both used contests and games to overcome their challenges and get access to the high-level coding talent they needed.  Their efforts may illustrate a part of what people call the gamification of work that's expected to be a big part of the future. </p>
<h2>How Jetpac Built a Photo Quality Algorithm for $5k in 3 Weeks</h2>

<p><a href="http://jetpac.com">Jetpac</a> is a young San Francisco startup that asks you to log in with your Facebook account, then it searches through all the photos your friends have uploaded.  It looks for photos with the names of places in their captions, then builds a personalized travel photo magazine out of your friends' pictures.  </p>

<p>One member of the founding team is leading data hacker <a href="http://petewarden.typepad.com/searchbrowser/2012/02/why-facebooks-data-will-change-our-world.html">Pete Warden</a>.  (Disclosure, Warden told me this story while I was staying at his house on a trip to SF, but it's such a cool story I've been telling it ever since - and it works well with the Quixey story too.)  </p>

<p>Warden says that when the team was first showing off its service in demos, far too many of the photos that came up were terrible.  They were blurry, boring, bad photos.  It was easy for a human being to look at these photos and know they should be excluded from the collections displayed.  </p>

<p>Could a machine be taught to look at new photos and determine whether they were high or low quality?  Warden suspected that it was possible, but recognized the limitations of his own knowledge.   He didn't have the machine learning skills to build something himself, much less at the pace the company needed a solution.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/jetpacscreen.jpg" style="" />
			</span>
</p>

<p>Here's what they did: They looked at 30,000 photos with their human brains and quickly judged whether each was a good or bad photo for a travel magazine experience.  </p>

<p>Then they visited the website <a href="http://Kaggle.com">Kaggle</a>, where data science challenges get turned into contests with prizes that anyone in the world can win.  The Jetpac team took all the metadata they had about these 30,000 photos, including the dimensions, and they substituted standardized numbers for words that appeared more than once.  They uploaded all that data onto Kaggle but they only included the corresponding human judgement of whether a photo was good or bad for 10,000 of the photos.</p>

<p>The challenge they set up was this: given the human judgements for 10k of the photos, could Kaggle participants write code that analyzed the patterns in the metadata well enough to accurately guess whether humans would call the other 20k photos good or bad, based only on the metadata available about them?  </p>

<p>The startup put up a tiny $5k bounty, one of the smallest Kaggle had ever hosted, and applied a deadline in 3 weeks.</p>

<p>People loved it.  All kinds of computer scientists moonlighting as Kaggle competitors jumped into the fray and wrote algorithms they thought could predict photo quality.  They drafted something up, uploaded their "guesses" for the other 20k photos to Kaggle's server, then were told what percentage they got right - how often they accurately predicted a person would deem a photo good. Then they changed their code and tried to improve their results.</p>

<p>212 teams, consisting of 418 people, competed for 3 weeks.  The contest leaderboard showed the top ten teams all had more than 85% accuracy.  </p>

<p>All the algorithms found that there were some words in photo captions that make them far more likely to be connected to a good travel photo than a bad one.  Among the best words: Peru, Cambodia, Michigan, tombs, trails and boats.  What photo captions are the most likely to signify a bad photo for a travel magazine?  San Jose, mommy, graduation and CEO, Warden says.</p>

<div class="pullquote">All the algorithms found that there were some words in photo captions that make them far more likely to be connected to a good travel photo than a bad one.  

<p>Among the best words: Peru, Cambodia, Michigan, tombs, trails and boats.  What photo captions are the most likely to signify a bad photo for a travel magazine?  San Jose, mommy, graduation and CEO, Warden says.</div><br />
Bo Yang, a USC PhD whose team had just narrowly lost out on winning the Netflix prize, squeaked out a small improvement in his photo quality algorithm to take the top prize on the very last day.  Yang was interviewed by the Kaggle team <a href="http://blog.kaggle.com/2011/11/23/picture-perfect-bo-yang-on-winning-the-photo-quality-prediction-competition/">here</a>.</p>

<p>Part of the Kaggle terms of service is that contest sponsors must have non-exclusive IP rights to the work, so the Jetpac team was able to put code from the contestants directly into their app.  </p>

<p>Jetpac's Warden says of the experience as a startup,<br />
<blockquote>"The two biggest enemies of a startup are lack of money and lack of time.  Packaging the data didn't take as long as we thought and after we uploaded it to the site, all of the details of dealing with the contestants were automated.  So it saved us a massive amount of time compared to finding, hiring and explaining our problem to an outside contractor.</p>

<p>"And we would never have gotten anywhere near the quality from the circle of people we know.  The short term nature of the project wouldn't have made it attractive as a project for most - just the overhead of setting up a contract and that sort of stuff.  The caliber of people participating in these contests is amazing.  They aren't starving college students, many are highly skilled professionals who make a lot more money than I do, in their day jobs.  They do this for fun."</blockquote></p>

<p>Jetpac had to think through how to set up the contest, but the Kaggle team helped them, too.  It's hard to imagine a way that such a complex problem could get so much brain power thrown at it so fast and so inexpensively.  Warden says the end results have been great.</p>

<h2>How Quixey Finds Great Developers with $100, 60 Second Challenges</h2>

<p><a href="http://Quixey.com">Quixey</a>, a Silicon Valley app search engine (it's cool, try it - I <a href="http://www.quixey.com/app/50390602/nutshell-summarizer">found this on it</a>), faces the same struggle to find developers that so many startups do.  They have high-profile VC backing (Eric Schmidt of Google, among others) and had been paying $20k per developer hire to traditional recruiters.</p>

<p>Liron Shapira, co-founder and CTO of Quixey, says the company came up with a very elegant solution.  Called the Quixey Challenge, it's a simple contest.  If you can find and fix a bug in the code for an algorithm you're given, in under 60 seconds, the company PayPals you $100.  </p>

<p>In order to qualify for the monthly contest, you've got to succeed at least 3 times in challenge rounds over the weeks prior to the big event.  If you qualify, then the company calls you on Skype and administers the challenge face-to-face.  It only lasts 60 seconds.  If, in preparation, you succeed 5 times - then the system automatically contacts you to see if you might be interested in working for Quixey.</p>

<p>Shapira says that 38 prizes were awarded in the December challenge, and it resulted in 3 full time hires and 2 intern hires.  Winners also receive Quixey Challenge hoodies, which Shapira says can be seen floating around the elite student body of Carnegie Mellon University.</p>

<p>"We've had about 5k users sign up and practice and we've reached out to 500 or something," Shapira told me.  "Those are incredibly valuable leads to have."<br />
<blockquote>"We just hired a guy named Marshall who doesn't have a college degree and lives in Grand Rapids Michigan.  He wouldn't come in from a Silicon Valley recruiter, but he reads Hacker News and he nailed the interview.  </p>

<p>"You can't judge if someone is one of the best programmers in the country in 1 minute, but it turns out you can in 5 minutes.  You only need 3 practices to qualify for the challenge but people take 10.  A low percentage like 1 in 15 or 20 users will be good enough to get contacted, so we are able to filter people out with high accuracy.</p>

<p>"We wasted so much time figuring out people's skills before. Many times we'll do the challenge or interviews and it will take 15 minutes.  The fact that some people can do it under 1 minute and others, also working in Silicon Valley, take 15 minutes, is evidence of the <a href="http://www.quora.com/10X-Engineers">10x engineer idea</a>.  Debugging is something you do every day at work, if you can get more than half of the bug fixes that we put in front of you, fast, then you are probably very good and we want to talk to you."</blockquote></p>

<p>Quixey says it is looking to outsource its process to other startups sometime in the future.</p>

<p><br />
<div class="pullquote">See also: <a href="http://www.readwriteweb.com/cloud/2012/02/the-end-of-the-resume-oracles.php">The End of the Résumé: Oracle's Big Plans for Taleo</a></div>It's not just these two companies that are using contests and games to get software development done.  The US Government has <a href="http://challenge.gov/">Challenge.gov</a> and <a href="http://seatgeek.com/blog/hiring/hiring-challenges-shouldnt-be-limited-to-developers">SeatGeek has gamified</a> not just their developer hiring but also their communications hires.  Then there's the <a href="http://www.newcommbiz.com/could-gamification-replace-management/">gamification of everyday employee management</a>.</p>

<p>Examples are just beginning to emerge, but they do seem to point towards some relief in the face of a very difficult talent shortage challenge.  "We think we're still on the leading edge of this trend and it's going to get bigger," says Quixey's Liron Shapira.</p>

<p>"We didn't know if people would be able to produce a good result out of this," says Jetpac's Warden about his startup's experiment with gamification of development, "but we were amazed by how effective the solutions they came up with were."</p>

<p>"The most important choice you make as a data scientist is deciding what problems you're not going to solve," Warden says.  That equation changes when you've got access to compelling ways to use other people's skills to solve those problems.</p>

<p><em>Basketball hoop photo from MinimalistPhotography101.com</em></p>
                    ]]></description>
                <link>http://readwrite.com/2012/02/25/how_two_startups_used_games_to_beat_the_developer</link>
                <guid>http://readwrite.com/2012/02/25/how_two_startups_used_games_to_beat_the_developer</guid>
                <category>Analysis</category>
                <pubDate>Sat, 25 Feb 2012 08:27:17 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[Pixar Engineers Leave to Build Real World Living Toys]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/ToyTalklogo.jpg" style="" />
			</span>
<strong>Teddy Ruxpin, meet Siri.</strong></p>

<p>Imagine a children's toy designed by the people behind the <em>Toy Story</em> and <em>Finding Nemo</em> movies but connected to the web and chock full of artificial intelligence.  Then add in visual tracking, speech recognition and massive network scalability.  It appears that's what San Francisco startup <a href="http://toytalk.com">ToyTalk</a> is building, based on conversations and information available online. </p>

<p>The company is putting together a powerful team of technologists and creatives from Pixar and SRI (makers of Siri) and is being relatively open about what it's up to. But it has received no press coverage anywhere as far as I can tell.  That's going to change once word gets out about who they are and what they're doing.  The possibilities in both entertainment and education are amazing.</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/OrenJacob.jpg" style="" />
			</span>
<br />
<em>ToyTalk CEO Oren Jacob, photo by <a href="http://chuckfoxen.tumblr.com/">Chuck Foxen</a>.</em></p>

<p>Nielsen <a href="http://blog.nielsen.com/nielsenwire/online_mobile/american-families-see-tablets-as-playmate-teacher-and-babysitter/">announced new numbers yesterday</a>, showing that tablet computers are increasingly being used by children.  70% of US households with both tablets and children under 12 now report that their children use the family tablet computer, up 9% over Q3 of last year.</p>

<p>Imagine the youngest of children using Web-connected toys carrying character-driven chatterbot artificial intelligence programs.  If done well, the possibilities for child development, education, language learning and more are awe-inspiring to consider.   What are the problems that need to be solved?  Lovability, connectivity and sufficiently intelligent interactivity.  It's that last one that seems the hardest, the least solved.  Perhaps if ToyTalk can pull it off, the company can resolve one of the world's most damaging resource shortages, the shortage of engaging time and energy for childhood development.</p>

<h2>The Brains</h2>

<p>ToyTalk CEO Oren Jacob worked at Pixar for 20 years, where he served as Chief Technology Officer.  Then he was the Entrepreneur-in-Residence at August Capital.  Now he's assembling a company that includes other ex-Pixar people, a heavy-duty engineer from Internet mega-pipe Akamai and a computer scientist from SRI, the research firm that created the now Apple-owned mobile personal assistant Siri. (Jacob once made a documentary film about competitive grocery bagging; more on him <a href="http://www.pixartalk.com/pixarians/oren-jacob/">here</a>.)</p>

<p>ToyTalk's Creative Director Bobby Podesta worked on Pixar movies like <em>A Bug's Life, Toy Story 2, Monsters Inc., Finding Nemo, The Incredibles</em> and was a Directing Animator on <em>Cars</em>.  Podesta is hiring a mobile UI developer and a creative writer who can build out character dialogue.</p>

<p>Martin Reddy, the ToyTalk CTO, is a Computer Science PhD with more than 40 published papers and 5 years of experience building geospatial visualization technology at the Artificial Intelligence Center at SRI International, the organization that built Siri.</p>

<p>Now imagine massive data input and output from these toys.  James Chalfant, ToyTalk's Director of Scalability, helped build Akamai, a massive Content Delivery Network that serves up 30% of all the web-based content consumed in the world.</p>

<p>Michael Chann built Pixar's animation technology and is now a visual tracking software specialist at ToyTalk.  Brian Langner is a Carnegie Mellon PhD and now ToyTalk's "Senior Speech Scientist" specializing in human computer spoken word interaction.  Byrne Reese was the Product Manager for Movable Type, one of the world's first major blogging platforms and is now Head of Customer Development at ToyTalk.  Renee Adams, head of operations at ToyTalk, spent years working on logistics and retail operations at Apple.</p>

<p>Got that?  We're talking about children's toys that track human movement and can interact with spoken words, built by an AI scientist from where Siri was born, connected to the web and mobile by an engineer with a world-beating scalability background, promoted by an early advocate of blog publishing software that changed the world and designed by people behind the most popular children's movies in history. </p>

<p>That sounds incredible.  And maybe a little bit frightening.</p>

<p>Could these be the toys that teach your children multiple languages, that help provide some interactivity to neglected children, that save the next generation from passive consumption of non-interactive broadcast media?  </p>

<p>Or will they fall into the Uncanny Valley, seem creepy to adults but desensitize children to the true humanity of living people, ushering in a generation of humans so comfortable with robots that the robots proliferate and ultimately... Well, you can imagine.  Perhaps it's <em>post-humanity</em> that will feel like Elvis's swinging hips for our generation, so wrong to us but a much-loved part of the future for our children.</p>

<p>Those are the questions I'll be asking when more information comes out about ToyTalk.  The company hasn't yet responded to my request for an interview.  I hope they will soon.</p>
                    ]]></description>
                <link>http://readwrite.com/2012/02/17/ex-pixar_geeks_building_siri-style_line_of_toys</link>
                <guid>http://readwrite.com/2012/02/17/ex-pixar_geeks_building_siri-style_line_of_toys</guid>
                <category>Data Services</category>
                <pubDate>Fri, 17 Feb 2012 05:34:19 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[Data Privacy: What Bill Gates Said 10 Years Ago]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/DataPrivacyDayLogo.jpg" style="" />
			</span>
Today is <a href="http://www.staysafeonline.org/dpd/about">International Data Privacy Day</a>, an event backed by companies like Intel, eBay, Facebook and Microsoft, and dedicated to educating data owners about best practices in protecting the privacy of consumer data.</p>

<p>The need to keep people from being exploited on account of violations of their privacy is clear, well-known, intuitive and amply articulated by highly capable people.  The up-side of <em>making use of</em> people's data is far less so.  The two concerns are closely tied together.  That's something Bill Gates is likely very aware of, if his comments 10 years ago are any indication.</p>
<p>The forthcoming era of computing is all about data.  Inasmuch as that data is associated with people, it's essential that data owners feel secure in the belief that they can make use of their data in computing without concern it will be misused.  </p>

<p>Bill Gates got this about the last era of computing, the first instances of e-commerce and the web.  He wrote <a href="http://www.wired.com/techbiz/media/news/2002/01/49826">a famous company-wide memo</a> ten years ago this month all about the importance of a controversial hardware-based security paradigm called Trusted Computing.</p>

<blockquote>"If we don't do this, people simply won't be willing -- or able -- to take advantage of all the other great work we do. Trustworthy Computing is the highest priority for all the work we are doing. We must lead the industry to a whole new level of Trustworthiness in computing."</blockquote>

<p>Regarding Privacy in particular, the Gates memo put some things in ways we can relate to today, but other things seem antiquated.</p>

<blockquote>"Users should be in control of how their data is used. Policies for information use should be clear to the user. Users should be in control of when and if they receive information to make best use of their time. It should be easy for users to specify appropriate use of their information including controlling the use of email they send."</blockquote>

<p>Users should be in control of when and if they receive information to make best use of their time!  Can you imagine that?  Info overload as privacy violation.  It makes sense, yet it seems hopelessly antiquated too.</p>

<p>"In the past, we've made our software and services more compelling for users by adding new features and functionality, and by making our platform richly extensible," he wrote. </p>

<blockquote>"We've done a terrific job at that, but all those great features won't matter unless customers trust our software.

<p>"So now, when we face a choice between adding features and resolving security issues, we need to choose security. Our products should emphasize security right out of the box, and we must constantly refine and improve that security as threats evolve."</blockquote></p>

<p>Here's how the International Data Privacy Day organization puts it today.</p>

<blockquote>"In this networked world, in which we are thoroughly digitized, with our identities, locations, actions, purchases, associations, movements, and histories stored as so many bits and bytes, we have to ask - who is collecting all of this data - what are they doing with it  - with whom are they sharing it?  Most of all, individuals are asking 'How can I protect my information from being misused?'  These are reasonable questions to ask - we should all want to know the answers. 

<p>"Data Privacy Day promotes awareness about the many ways personal information is collected, stored, used, and shared, and education about privacy practices that will enable individuals to protect their personal information."</blockquote></p>

<p>Robert Siciliano, an Online Security Evangelist at McAfee, <a href="http://blogs.mcafee.com/consumer/data-privacy-day-2012">painted a much more negative picture in a blog post yesterday</a> - probably even about the companies participating in International Data Privacy Day.  McAfee, though, is owned by Intel, the primary sponsor of the event.  Siciliano speaks for many people when he says:</p>

<blockquote>"Lately, it seems that barely a day goes by when we don't learn about a major Internet presence taking steps to further erode users' privacy. The companies with access to our data are tracking us in ways that make Big Brother look like a sweet little baby sister.

<p>"Typically when we hear an outcry about privacy violations, these perceived violations involve some apparently omnipotent corporation recording the websites we visit, the applications we download, the social networks we join, the mobile phones we carry, the text messages we send and receive, the places we go, the people we're with, the things we like and dislike, and so on.</p>

<p>"How do they do this? By offering us free stuff to consume online and infrastructure for the online communities that tie us together. We gobble up their technologies, download their programs, use their services, and mindlessly click 'I Agree' to terms and conditions we haven't bothered to read."</blockquote></p>

<p>It's a cynical perspective that refers to all the glory of the Interwebs as simply free stuff to consume with mindless clicks.</p>

<p>I think I prefer the description Gates might have offered.  The global computer is now rich with features and opportunities, but those will be put at risk if people don't trust the network.  Please, Mr. Zuckerberg, don't spoil this opportunity. </p>
                    ]]></description>
                <link>http://readwrite.com/2012/01/28/data_privacy_what_bill_gates_said_10_years_ago</link>
                <guid>http://readwrite.com/2012/01/28/data_privacy_what_bill_gates_said_10_years_ago</guid>
                <category>Data Services</category>
                <pubDate>Sat, 28 Jan 2012 12:46:29 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[Why Facebook's Data Sharing Matters]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/thefacebooklogo.jpg" style="" />
			</span>
Facebook has cut a deal with political website <a href="http://politico.com">Politico</a> that allows the independent site machine-access to Facebook users' messages, both public and private, when a Republican Presidential candidate is mentioned by name.  The data is being collected and analyzed for sentiment by <a href="https://www.facebook.com/notes/us-politics-on-facebook/politico-facebook-team-up-to-measure-gop-candidate-buzz/10150461091205882">Facebook's data team</a>, then delivered to Politico to serve as the basis of <a href="http://www.politico.com/news/stories/0112/71345.html">data-driven political analysis and journalism</a>.</p>

<p>The move is being <a href="http://mediagazer.com/120113/p7#a120113p7">widely condemned in the press</a> as a violation of privacy, but if Facebook did this right, it could be a huge win for everyone.  Facebook could be the biggest, most dynamic census of human opinion and interaction in history.  Unfortunately, failure to talk prominently about privacy protections, failure to make this opt-in (or even opt-out!) and the inclusion of private messages are all things that put at risk any remaining shreds of trust in Facebook that could have served as the foundation of a new era of social self-awareness.</p>
<p><a href="http://www.politico.com/news/stories/0112/71345.html"><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/FBPolitico.jpg" style="" />
			</span>
</a></p>

<p>We, OK I, have long argued here at ReadWriteWeb that aggregate analysis of Facebook data is an idea with world-changing potential.  The analogy from history that I think of is real estate redlining.  Back in the middle of the last century, when US Census data and housing mortgage loan data were both made available for computer analysis and cross referencing for the first time, early data scientists were able to prove a pattern of racial discrimination by banks against people of color who wanted to buy houses in certain neighborhoods.  The data illuminated the problem and made it undeniable, thus leading to legislation to prohibit such discrimination.</p>

<p>I believe that there are probably patterns of interaction and communication of comparable historic importance that could be illuminated by effective analysis of Facebook user data.  Good news and bad news could no doubt be found there, if critical thinking eyes could take a look.</p>

<p>"Assuming you had permission, you could use a semantic tool to investigate what issues the users are discussing, what weight those issues have in relation to everything else they are saying and get some insights into the relationships between those issues," writes systemic innovation researcher <a href="https://plus.google.com/112439267620869130664/about">Haydn Shaughnessy</a> in a comment on Forbes privacy writer Kashmir Hill's <a href="http://www.forbes.com/sites/kashmirhill/2012/01/13/from-now-on-your-political-musings-on-facebook-are-being-mined">coverage</a> of the Politico deal. "As far as I can see people use sentiment analysis because it is low overhead; the quickest, cheapest way to reflect something of the viewpoints, however fallible the technique. Properly mined though you could really understand what those demographics care about." </p>

<p>Several years ago I had the privilege to sit with Mark Zuckerberg and make this argument to him, but it doesn't feel like the company has seized the world-changing opportunity in front of it.</p>

<p>Facebook does regularly analyze its own data, of course.  And sometimes it publishes what it finds.  For example, two years ago the company <a href="http://www.readwriteweb.com/archives/facebook_scientists_dissect_facebook_say_its_alive.php">cross referenced the body of its users' names</a> with US Census data that tied last names and ethnicity.  Facebook's conclusion was that the site used to be disproportionately made up of White people - but now it's as ethnically diverse as the rest of America.  Good news!  </p>

<p>But why do we only hear the good news?  That millions of people are talking about Republican Presidential candidates might be considered bad news, but the new deal remains a very limited instance of Facebook treating its user data like the platform that it could be.</p>

<p>It could be just a sign of what's to come, though.  "This is especially interesting in terms of the business relationships--who's allowed to analyze Facebook data across all users?" asks Nathan Gilliatt, principal at research firm <a href="http://socialtarget.com">Social Target</a> and co-founder of <a href="http://analyticscamp.org">AnalyticsCamp</a>.  "To my knowledge, they haven't let other companies analyze user data beyond publicly shared stuff and what people can access with their own accounts' authorization. This says to me that Facebook understands the value of that data. It will be interesting to see what else they do with it."</p>

<p>I've been told that Facebook used to let tech giant HP informally hack at their data years ago, back when the site was small and the world's tech privacy lawyers were as yet unaroused.  That kind of arrangement would have been unheard of for the past several years, though.  Two years ago, social graph hacker Pete Warden pulled down Facebook data from hundreds of millions of users, analyzing it for interesting connections before <a href="http://www.readwriteweb.com/archives/facebook_user_data_analysis.php">planning on releasing it to the academic research community</a>.  Facebook's response was assertive and <a href="http://petewarden.typepad.com/searchbrowser/2010/04/how-i-got-sued-by-facebook.html">came from the legal department</a>.  Warden decided not to give the data to researchers after all.  (Disclosure: I am writing this post from Warden's couch.)</p>

<p>"Like a lot of Facebook's studies, this collaboration with Politico is fascinating research, it's just a real shame they can't make the data publicly available, largely due to privacy concerns," bemoans Warden. "Without reproducibility, it loses a lot of its scientific impact. With a traditional opinion poll, anyone with enough money can call up a similar number of people and test a survey's conclusions.  That's not the case with Facebook data."</p>

<p>"Everyone is going 'gaga' over the potential for Facebook," says Kaliya Hamlin, Executive Director of a trade and advocacy group called the <a href="http://personaldataecosystem.org/">Personal Data Ecosystem Consortium</a>.  <br />
<blockquote>"The potential exists only because they have this massive lead (monopoly) so it seems like they should be the ones to do this.</p>

<p>"Yes we should be doing deeper sentiment analysis of people's real opinions. But in a way that they are choosing to participate - so that the entities that aggregate such information are trusted and accountable.</p>

<p>"If I had my own personal data store/service and I chose to share say my music listening habits with a ratings service like Nielsen - voluntarily join a panel. I have full trust and confidence that they are not going to turn on me and do something else with my data - it will just go in a pool.</p>

<p>"Next thing you know Facebook is going to be selling to the candidate the ability to access people who make positive or negative comments in private messages. Where does it end? How are they accountable and how do we have choice?"</blockquote></p>

<p>Not everyone is as concerned about this from a privacy perspective.  "There are many things in the online world that give me willies for Fourth-Amendment-like reasons," says Curt Monash of data analyst firm <a href="http://monash.com/">Monash Research</a>. "This isn't one of them, because the data collectors and users aren't proposing to even come close to singling out individual people for surveillance."</p>

<p>Monash's primary concern is in the quality of the data. "There's a limit as to how useful this can be," he says. "Online polls and similar popularity contests are rife with what amounts to ballot box stuffing. This will be just another example.  It is regrettable that you can now stuff an online ballot box by spamming your friends in private conversation."</p>

<p>It doesn't just have to be about messages, though.  Social connections, Likes and more all offer a lot of potential for analysis, if it's done appropriately.</p>

<p>"We need trust and accountability frameworks that work for people to allow analysis AND not allow creepiness," says Hamlin.</p>

<p>Two years ago social news site Reddit began giving its users an option to "<a href="http://www.readwriteweb.com/archives/thousands_of_reddit_users_donate_their_data_to_sci.php">donate your data to science</a>" by opting in to have activity data made available for download.  Massive programming Question and Answer site <a href="http://stackoverflow.com">StackOverflow</a> has long made available periodic dumps of its users' data for analysis.  "You never know what's going to come out of it," StackOverflow co-founder Joel Spolsky says about analysis of aggregate user data.</p>

<p>The unknown potential is indicative not just of how valuable Facebook data is, but potentially of the relationship between data and knowledge generally in the emerging data-rich world.</p>

<p>That's the thesis of author David Weinberger's new book, <a href="http://toobigtoknow.com">Too Big to Know</a>.  "It's not simply that there are too many brickfacts [datapoints] and not enough edifice-theories," he <a href="http://www.theatlantic.com/technology/archive/2012/01/to-know-but-not-understand-david-weinberger-on-science-and-big-data/250820/">writes</a>. "Rather, the creation of data galaxies has led us to science that sometimes is too rich and complex for reduction into theories. As science has gotten too big to know, we've adopted different ideas about what it means to know at all."</p>

<p>The world's largest social network, rich with far more signal than any of us could wrap our heads around, could help illuminate emergent qualities of the human experience that are only visible on the network level.</p>

<p>Please don't mess up our chance to learn those things, Mr. Zuckerberg.</p>
                    ]]></description>
                <link>http://readwrite.com/2012/01/13/why_facebooks_data_sharing_matters</link>
                <guid>http://readwrite.com/2012/01/13/why_facebooks_data_sharing_matters</guid>
                <category>Analysis</category>
                <pubDate>Fri, 13 Jan 2012 11:21:33 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[Automatic File Conversions and More with Dropbox Automator]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/dropbox150.jpg" style="" />
			</span>
Computers keep getting closer and closer to making people obsolete. The latest step towards human obsolescence? <a href="http://dropboxautomator.com/" title="Dropbox Automator">Dropbox Automator</a>, a Web-based tool for setting up actions that happen as soon as you put a file in a Dropbox folder. It&#8217;s not flawless just yet, but it might provide a useful service for many Dropbox users. </p>

<p>The service is powered by <a href="https://app.wappwolf.com/" title="Wappwolf">Wappwolf</a>, an online &#8220;<a href="https://app.wappwolf.com/Start/contact" title="Wappwolf: About Us">action store</a>&#8221; that features a set of <strong>Web actions</strong> that can process files. For example, it has ready made actions to encrypt and decrypt files, extract text from PDFs, convert documents to PDF, generate QR codes and manipulate images. </p>
<h2 id="dropboxautomator">Dropbox Automator</h2>

<p>The Dropbox Automator works by connecting to your Dropbox account and then defining actions based on which folder you place files into. For example, I connected my Dropbox account and created a folder called <strong>Appwolf</strong>. Then I defined actions to convert files placed into that folder into PDFs. </p>

<span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/like-to-connect-1.jpg" style="" />
			</span>


<p>You can also do things like upload files to Slideshare, sign PDFs, scrape PDFs to text files and even translate files automatically using Bing Translator. It looks like much of Automator&#8217;s functionality just comes from tapping into Web-based services. </p>

<p>You can also automatically upload photos to Facebook or Flickr, add a bug (stamp) to a photo, resize or rotate photos and much more.</p>

<h2 id="afewglitches">A Few Glitches</h2>

<p>I found that the service isn&#8217;t entirely glitch free. It says that it can convert HTML files to PDF, which it does&#8230; but it just converts the text to PDF, so the tags are presented in the document instead of used for formatting. It might be that you need the header information before the service (<a href="http://www.en.conv2pdf.com/" title="conf2pdf">conv2pdf</a>) properly recognizes the file as HTML instead of plain text. </p>

<p>When Dropbox Automator zips files, it uses a format that doesn&#8217;t seem to be recognized on Mac OS X as a zip file. At least not by the <strong>Archive Utility</strong> that comes with OS X Lion.</p>

<span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/folder-appwolf.jpg" style="" />
			</span>
<em>Converting Files Using Dropbox Automator</em>

<p>It does convert OpenDocument Format (ODF) files OK, when it actually converts them. Of two ODF files I placed in the Appwolf directory, only one was converted. The other was placed in the <strong>processed</strong> folder that Dropbox Automator creates, but no PDF ever materialized. </p>

<p>But it&#8217;s a brand new service and I suspect they&#8217;re still shaking the bugs out. The service, at least for now, is free. How will they make their money? It&#8217;s unclear, but some of the actions you set up for files may cost money. So it&#8217;s possible that the developers will add premium services or charge a fee to other services for connecting users. If it catches on, I do hope that <a href="http://www.readwriteweb.com/cloud/2011/12/2011-the-year-the-free-ride-di.php" title="2011: The Year the Free Ride Died">they start providing paid accounts</a> so users can support the service. </p>
                    ]]></description>
                <link>http://readwrite.com/2011/12/30/automatic_file_conversions_and_more_with_dropbox_a</link>
                <guid>http://readwrite.com/2011/12/30/automatic_file_conversions_and_more_with_dropbox_a</guid>
                <category>Data Services</category>
                <pubDate>Fri, 30 Dec 2011 06:25:00 -0800</pubDate>
                <author>Joe Brockmeier</author>
            </item>
                    <item>
                <title><![CDATA[Can Big Data Be Outsourced? Mu Sigma's $150 Million in VC Backing]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/musigmalogo.png" style="" />
			</span>
They say Big Data is going to be big business, big innovation - a big deal.  But how is it going to go down?  Applied math and decision science company <a href="http://www.mu-sigma.com/">Mu Sigma</a> announced more than $100 million in new venture backing yesterday, including from previous investors Sequoia Capital, bringing the company's total investment to $150 million.  Mu Sigma provides big data services to some of the biggest companies in the world.  </p>

<p>How do they do it?  With a combination of math, science, creative thinking and long hours of hard work.  As democratized publishing, network connected devices and the instrumentation of everyday life combine to create a great blue ocean of big data all around us, the latest Mu Sigma funding is a valuable opportunity to get a taste of how one emerging leader in that market combines technology, math and art to engage with this big trend.  Not everyone agrees that outsourcing Big Data work like this is the solution, though.</p>
<h2>Can Big Data Be Outsourced?</h2>

<p>Mu Sigma says it exists to "enable businesses to institutionalize data-driven decision making."  Its 1300 employees in Chicago and Bangalore help clients with marketing, supply chain and risk analytics.  The firm says it "is arguably the world's largest pure-play decision sciences and analytics services company."</p>

<p>Employee reviews of the company on website <a href="http://www.glassdoor.com/Reviews/Mu-Sigma-Reviews-E253258.htm">GlassDoor</a> paint a picture of hard-driving young employees working grueling hours for low pay, but learning a lot at a young company.</p>

<p>The seven-year-old firm helps clients with things like customer segmentation and purchase likelihood analysis in marketing; fraud detection and severity, and statistical analysis for FDA trials, in risk analysis; and supply chain work like trend plotting, due date quoting, expedition optimization, location allocation "decisioning", etc.  All based on data.</p>

<p>How can Mu Sigma compete in each of those tasks with other firms that specialize in one or the other?  That's unclear, but the company has developed momentum based on its broad approach.  Mu Sigma says that it's profitable, though the company declined to provide any specific financial numbers.  </p>

<p>Not everyone believes that solutions like Mu Sigma are the answer to Big Data problems and opportunities.  "I'm skeptical of the idea of end to end 'analytics outsourcing' right now," says Peter Skomoroch, of <a href="DataWrangling.com">DataWrangling.com</a>.<br />
 <blockquote>"There is value in having external experts embedded with internal teams to help with big data, but to compete companies will also need to build up in-house talent.   </p>

<p>"It is tough to find good data people, and even more difficult to find ones with business sense and domain knowledge.  Insight and creativity are not likely to be commoditized any time soon.  The competitive advantage in this space will go to companies that build up unique datasets and build teams that know how to leverage them.  Most game changing analytics is going to come from a small set of talented individuals, not an army of contractors."</blockquote></p>

<p>In-house data scientists are incredibly hard to find, though.  Cathy O'Neil, data scientist at ad startup Intent Media, <a href="http://mathbabe.org/2011/12/26/a-good-data-scientist-is-hard-to-find/">says</a> this is in part because "It is far less sexy to try to honestly find the confidence interval of a prediction than it is to model behavior."<br />
<blockquote>"Data scientists are considered magical when they forecast behavior that was hitherto unknown, and they are considered total downers when they tell their CEO, 'hey there's just not enough data to start that business you want to start,' or 'hey this data is actually really fat-tailed and our confidence intervals suck.'</p>

<p>"In other words, it's something like what the head of risk management had to face at a big bank taking risks in 2007. There's a responsibility to warn people that too much confidence in the models is bad, but then there's the political reality of the situation, where you just want to be liked and you don't actually have the power to stop the relevant decisions anyway."</blockquote></p>

<p>Perhaps given that reality, outside big data firm Mu Sigma is clearly a company with some economic wind in its sails.  Deborah Gage at the Wall St. Journal's Venture Wire provides a good look at <a href="http://blogs.wsj.com/venturecapital/2011/12/28/mu-sigma-lands-big-money-for-big-data/?mod=google_news_blog">the company's fast growth and interesting training program</a> in her coverage this morning.  </p>

<h2>Mu Sigma and Innovation</h2>

<p>Reading previous coverage of the company's work elsewhere, one name keeps coming up: Zubin Dowlaty, Vice President and head of innovation and development at Mu Sigma.</p>

<p>Dowlaty spent the 1990s doing statistical modeling at UPS.  Then he joined the publicly traded InterContinental Hotels Group, where he was first the Director of Analytics in Consumer Insight and then the VP of Decision Sciences.  He was featured prominently in a 2008 New York Times story about <a href="http://www.nytimes.com/2008/04/09/technology/techspecial/09predict.html">corporations using Prediction Markets</a> to surface cost-saving and other ideas from inside their companies.  Dowlaty was photographed for the story wearing a wizard's cap and holding a magical looking walking staff in his hands.  </p>

<p>He built an elaborate system to invite the hotel company's employees to submit and vote on ideas, to win rewards if theirs were selected and to surface, via crowdsourcing, strategic initiatives the company could act on. "We wanted to tap the creative class that may not be able to voice their ideas,"  Dowlaty told the Times.</p>

<p>Once at Mu Sigma, Dowlaty has become one of the company's most visible public figures.  His statements, as the head of innovation and development at a firm so focused on innovation, are noteworthy.</p>

<p>In a January 2011 article from <a href="http://tdwi.org/Articles/2011/01/05/Rise-of-Data-Science.aspx?Page=3">The Data Warehousing Institute</a> on the rise of the data scientist, Dowlaty articulates the role of art and of science in big data.<br />
<blockquote>"I'm not a big fan of the spaghetti method.  It makes me nervous when people run a lot of analytic techniques just to get the answer they want, instead of being objective. Doing this job properly requires the rigor of a scientist. The scientist can see things that other people cannot see."</blockquote></p>

<p>As a standalone statement, that doesn't sound particularly creative.  It is important, though.  "The 'spaghetti method,'" cautions Josh Wills, Chief Data Scientist at <a href="http://cloudera.com">Cloudera</a>, "frantically searching for a technique that gives you the answer you want (or potentially, the answer that someone higher up in the org wants), as opposed to using the scientific method. This is a big problem in the industry, and the theory is that using an external firm mitigates that habit to some extent. Being a good data scientist often means telling powerful people stuff that they don't want to hear."</p>

<p>Other statements from Dowlaty help put that sentiment about rigor in creative context.  Mu Sigma itself uses a variety of different analytic models to tackle all the problems they engage with.</p>

<p>Dowlaty <a href="http://www.revolutionanalytics.com/why-revolution-r/case-studies/Portfolio-Strategy-Helps-Mu-Sigma-Maintain-Leadership-Role-in-Competitive-Market.php">told Revolution Analytics</a>, whose R statistics software Mu Sigma makes use of:<br />
<blockquote>"We like to diversify our models...We have a portfolio of about 10 models that we'll run to assess the stability of the coefficient and the predictive capability of that particular model. By running all the models, you can see which ones are the best predictors."</blockquote></p>

<p>Revolution says of Dowlaty's use of R at Mu Sigma, "The benefit of an 'ensemble' approach is that when new analytic techniques emerge, they can be brought into the mix without causing disruption. This makes the R especially valuable to Dowlaty, since the R software library evolves continually as members of the worldwide R community contribute new packages and programs."</p>

<p>In fact, both rigor and flexibility are key to the paradigm Dowlaty advocates. "The trend is toward a multi-disciplinary approach to extracting value from data," he told The Data Warehousing Institute early this year. "It's not just about math anymore. You also need technology skills, but what ultimately separates the analyst from the scientist is the dimension of artistic creativity. It's the soft skills that make the big difference."</p>

<p>That combination of skills is what enables the firm to tackle the incredibly complex work they do.  Dhiraj Rajaram, Mu Sigma's CEO and the man who founded the company in 2004, spoke at the 2010 <a href="http://www.predictiveanalyticsworld.com/">Predictive Analytics World</a> conference on a panel with Mu Sigma customer Walmart.  </p>

<p>Walmart Financial Services, which named Mu Sigma its Supplier of the Year in 2011, works with the big data company to analyze and optimize the marketing of its financial products.</p>

<p>Decision Management analyst James Taylor <a href="http://jtonedm.com/2010/02/16/marketing-mix-modeling-at-walmart-financial-services-pawcon/">blogged the following summary</a> of the conference presentation about the collaboration between Walmart Financial Services and Mu Sigma.  This sounds like very complicated work.<br />
<blockquote>"WFS uses transaction life analysis around run rate and growth, price / mix analysis, financial returns and qualitative analysis of the creative. Marketing Mix modeling, optimization, lets them see the effect of individual campaigns (there's a lot of Walmart stuff going on in the market), account for seasonality and manage at the store level. The idea is to make sure the marketing investment is optimized, targeted and repeatable.</p>

<p>"Marketing mix optimization uses weekly sales, store traits and demographics, event information and macro-economic data to see how effective specific events were and what was the contribution to the overall effect. What was the value or contribution of each element, did they cannibalize each other, did they resonate in specific areas etc."</blockquote></p>

<p>That sounds like a potent combination of math, science and creative thinking.  It's probably more a picture of the sector than of one company alone.  Forrester analyst James Kobielus specializes in big data and says he's done one briefing with Mu Sigma but didn't detect any particular unique flavor to the firm's work relative to others in the sector.   Mu Sigma hasn't yet responded to our request for comment on this article.</p>

<p>Perhaps this company is typical of the sector and the questions to ask about it are more general.</p>

<p>"My caveat with services like MuSigma is that they can analyze your data, but they can't change your business," says Cloudera's Wills.  <br />
<blockquote>"You are free to ignore what they tell you, and it is often the case that the answers they can give you are limited by your business practices and the data that you currently collect. The advantage of having an in-house data scientist, especially one with some programming skill, is that they can develop systems that collect better data so that they can come up with better answers."</blockquote></p>
                    ]]></description>
                <link>http://readwrite.com/2011/12/29/can_big_data_be_outsourced_mu_sigmas_150_million_i</link>
                <guid>http://readwrite.com/2011/12/29/can_big_data_be_outsourced_mu_sigmas_150_million_i</guid>
                <category>Analysis</category>
                <pubDate>Thu, 29 Dec 2011 00:53:59 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[After Years of Missteps, Facebook's Timeline is an Epic Win]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/facebook_logo_square_apr10.jpg" style="" />
			</span>
Facebook's <a href="http://www.readwriteweb.com/archives/facebook_timeline.php">new Timeline profile feature</a> is great, even if it is a little strange.  It's narcissistic, but that's a big part of the fun of it, and I'm not sure that other people's timelines are nearly as interesting as mine is to me. </p>

<p>It's an incredibly feature-rich new type of social network profile. It's a re-imagination of what a profile can be.  It makes me want to use Facebook more, to share more data with Facebook so that it can be preserved and displayed so nicely, years into the future.  While other Facebook features have pushed users into posting publicly by default, or posted their activities from other places they didn't understand would become part of the public record, I think Timeline is a genuine value add to incentivize users to share more.  I think it's great.</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/FBTimeline.png" style="" />
			</span>
<br />
Data is at the heart of the Facebook Timeline, your data - about your life, about your activities as recorded on Facebook and about  your social connections.  The music you listen to, the places you go and the things you do.  Insights and experiences built on top of data are going to be a big part of the future of human/computer interactions.  Facebook Timelines are a great first look at that idea for hundreds of millions of people. They are also something that Twitter can never do, for both technical and cultural reasons.</p>

<p>It's one thing to see this data all in a News Feed, as Facebook has long shown it; it's fundamentally different to see Yourself and Others presented like a work of art in this new Timeline layout.  </p>

<p>By highlighting the content you've published that has received the most social engagement, in the form of comments and Likes, your Facebook Timeline takes its best shot at presenting your Best Self to the world.  The mundane updates are hidden in the background and the highlights of your life, if you posted about them on Facebook, are programmatically discoverable and now displayed in an attractive page layout.</p>

<p>It doesn't work perfectly (my Timeline says that I married my wife 3 times on 3 different dates), but generally speaking it works really well.  It looks great on m.facebook.com too.</p>

<p>The Facebook Timeline represents the Instrumentation of Your Life, making things measurable and then building on top of those measurements.  It's a big deal in the world of social software.</p>

<p>That Facebook launched such a sweeping new presentation of every user's data about themselves just months after getting slapped with a 20-year privacy audit requirement from the US government is bold.</p>

<h2>As Not Seen on Twitter</h2>

<p>Meanwhile, over on Twitter, that competing social network can't remember what you did two weeks ago.  It does remember, it just won't let you remember.  Historical content on Twitter is severely limited.  </p>

<p>The company has said officially that's because Twitter is all about the here and now; it's real-time.  Unofficially, though, it's said that the root of the problem was in a series of database creation decisions that were made years ago.  It would now be super expensive to change that. </p>

<p>There is something about Twitter that's more conversational, more News focused and less conducive culturally to something like Timeline.  </p>

<p>For the vast majority of its users, I'd also guess that Twitter accounts post fewer messages and get fewer responses that can be measured to determine highlights than is the case on Facebook.  </p>

<p>Facebook also has a lot of structured data in the user's profile and changes to that become events, which social activity swarms around and which then become notable points in your life.  You changed your marital status?  That's probably going to get a lot of discussion.  There is no equivalent on Twitter.  Were Twitter to highlight your biggest tweets, they would likely be the wittiest quips you've made over the years, not the real life events.</p>

<p>Twitter is working on convincing people that tweets are great for reading, that it's largely a reading experience.  Facebook, on the other hand, has always wanted you to share, share, share.  </p>

<p>Many of us are doing things outside of Facebook, though.  A lot of that is being shared back into our Newsfeed, but not all of it.  I am very impressed with what Facebook has done, but I wish there was some more effective competition out there.  There are various startups who have tried to do this, though none anywhere near as well as Facebook's hired and acquired team of world-beating design pros. </p>

<p>I joined Facebook 5 years ago this Fall, according to my Timeline.  It's cool to see all that history presented so nicely and it makes me want to put more content into Facebook so I can see it later.  I imagine that's the point. </p>
                    ]]></description>
                <link>http://readwrite.com/2011/12/16/after_years_of_missteps_facebooks_timeline_is_an_e</link>
                <guid>http://readwrite.com/2011/12/16/after_years_of_missteps_facebooks_timeline_is_an_e</guid>
                <category>Data Services</category>
                <pubDate>Fri, 16 Dec 2011 01:05:15 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[Engag.io: A Tool to Track All Your Conversations Online in One Place]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/engagiologo.png" style="" />
			</span>
Social media is supposed to be all about engagement and authenticity, but sometimes it can feel so distributed and overwhelming that conversations get lost.  A new web app called <a href="http://Engag.io">Engag.io</a> has tackled this classic problem and offers a pretty good solution that I think you'll want to check out.  It's in private alpha right now but we've got an invite code at the bottom of this post.  That someone is making an app like this gives me hope that there are still great ideas that can be built on top of the most basic building blocks of the social web.</p>

<p>Engag.io, which gets its name from being the place for your online engagement input and output, is like an inbox for all your conversations on Twitter, Facebook, Google Plus, Foursquare and blog comments.  It's an inbox with analytics.  It's built by the team behind content curation company <a href="http://www.eqentia.com/">Eqentia</a>.  Eqentia is ambitious but a little too complicated; Engag.io is very simple and the value of it will be immediately obvious to many people.</p>
<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/engagioscreen.png" style="" />
			</span>
</p>

<p>In order to get started with Engagio, you have to authenticate with different services you use around the web.  Fortunately, this has become super easy to do and very secure with just a few clicks.  The open authentication standards that have been developed over recent years make mashups like Engagio really easy to implement and that's awesome.</p>

<p>You can log in with your accounts on Twitter, Facebook, Foursquare, Google Plus, Disqus, Hacker News and Tumblr.   Then Engagio will watch for comments posted to and from you on any of those services and give you one unified inbox to track the conversations inside of.  </p>

<p>"We believe that having a universal Conversation Inbox could become a daily time saver," says the Engagio blog.  "It will save you time because you don't have to check the multiple source sites where you have placed your comments. And you can for example focus on replies first, before you get to other commenting."</p>

The ability to search your comments is really nice too.  It's already coming in handy for me.  The other day on Twitter I was talking about the concept of the Project Triangle: Fast, good, cheap - pick two.  I was saying that I've been thinking about how different companies in my life relate to that equation and author-from-the-future <a href="https://twitter.com/#!/toddsattersten">Todd Sattersten</a> says to me, "@marshallk dropping one to get the other two is a faulty construct. Vary 4th element Scope to allow all 3 #agile...My review of @kmaney Trade-off http://t.co/RrejjqJQ and check out my ebook Fixed to Flexible for more use http://t.co/6nygC7OX."

<p>That's pure gold, right there - but a few days later I'd already forgotten who said it to me, where to find it, etc.  Enter Engagio Comment Search and boom!  All my problems are solved.</p>

<p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/engagioscreen2.png" style="" />
			</span>
</p>

<p>It's a great idea and I've been returning to the site daily to try to stay engaged with people who took the time to respond to me around the web. In my experience it's mostly Twitter conversations and some Google Plus threads, but I hope that Engagio will help me be all the more...in touch with conversations in other places too.</p>

<p>The analytics part of the service could really use some UI work, but the idea is that Engagio will show you who you're interacting with the most. You might be surprised who some of your top responders are - and those are people you should probably engage with all the more, or at least know about, if you're going to be as social as you might want to be on social media.</p>

<p>The Engagio team could use someone to sit down with them and go through some real-life commenting experiences, because I think the user flow could really be improved. Site founder William Mougayar is a commenting machine; he posts comments everywhere, all the time, but I suspect his experiences are different from the way other people would want to use a service like this.</p>

<p>Super blogger and tech investor Fred Wilson, a man who gets more comments, and more intelligent ones, in response to his online activity than probably anyone else you'll ever meet, has been <a href="http://www.avc.com/a_vc/2011/12/engagio.html">a cheerleader for Engag.io</a>. Wilson says he urged Mougayar to "make it like gmail for social conversations." Gmail is deceptively simple, though, and Engagio will take more work to get close to that level of usefulness. As <a href="http://www.fakegrimlock.com/">FAKE GRIMLOCK</a> put it, "IS MVP. UGLY OK FOR NOW." A minimum viable product it is, but one that I think many people will want to see developed further.</p>

<p>It's exciting that this is a tool designed to make the living social graph more transparent and sticky. I absolutely love the idea. Several users have pointed out that a mobile interface would suit real user behavior especially well, and I agree.</p>

<p>A small number of people can jump in and kick the tires now, using the code "rwwengage" at <a href="http://Engag.io">Engag.io</a>.</p>
                    ]]></description>
                <link>http://readwrite.com/2011/12/12/engagio_a_tool_to_track_all_your_conversations_onl</link>
                <guid>http://readwrite.com/2011/12/12/engagio_a_tool_to_track_all_your_conversations_onl</guid>
                <category>Data Services</category>
                <pubDate>Mon, 12 Dec 2011 09:09:30 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
                    <item>
                <title><![CDATA[Easy-to-Use Mashup Tool ifttt Gets Betaworks Backing]]></title>
                <description><![CDATA[
                                        <p><span class="embedded-Media-image img-caption-c">
				<img src="http://readwrite.com/files/files/files/images/lead-images/ifttt150.png" style="" />
			</span>
Point-and-click web mashup startup <a href="http://IFTTT.com">ifttt</a> ("if this then that") has raised financing from cutting-edge tech incubator <a href="http://Betaworks.com">Betaworks</a>. News of the funding came to us via <a href="http://neuvc.com/labs/vcdelta/">NeuVC's bot</a> watching the firm's portfolio page, which is fitting given the nature of the startup.</p>

<p>ifttt allows anyone to set up a chain of conditional actions between a wide variety of web services, like "If I post a photo to Flickr, save it to my Dropbox." The company calls these "recipes." We wrote about the service when <a href="http://www.readwriteweb.com/archives/how_to_back_up_your_life_automatically_with_ifttt.php">it launched to the public in September</a>. Microsoft's Scott Hanselman also <a href="http://www.hanselman.com/blog/EssentialIFTTTIfThisThenThatProgrammingWorkflowsForHumansUsingTheWebsSocialGlue.aspx">wrote up a nice review of the service</a> and says "this is going to be huge." ifttt isn't just a single service, though, and it isn't even just an amalgamation of multiple services strung together; it's a great example of a whole paradigm of DIY mashups. As Blogger and WordPress were to self-publishing and YouTube was to video publishing, so ifttt could be to working with interlinked web applications for everyday people. Can this startup herald a new era of lay hackers? The UI is good; the only question is whether there's really enough demand for such a service.<br />
</p>
<p>ifttt was started by Linden Tibbets, a computer scientist formerly at design powerhouse <a href="http://www.ideo.com/">IDEO</a>; film artist Alexander Tibbets; and designer Jesse Tane, also formerly of IDEO.</p>

<p>Here's how the startup introduced itself at launch:</p>
<blockquote>"We began with the theory that as our digital tools became more domain specific and easier to use, there would be vast amounts of creative potential in how any two tools might be used in tandem. We knew that with this immense potential came a problem of equal proportions. There just aren't enough developers and designers in the world to craft all these connections. A million developers at a million laptops wouldn't even make a dent. So we set out to build an incredibly simple tool that anyone could use to define creative, event-driven tasks that fit the pattern 'if this then that.'"</blockquote>

<p>If anyone can pull something like this off, a team with experience at IDEO has a great background from which to give it a shot.</p>

<p>The most popular ifttt recipes are <a href="http://ifttt.com/recipes?sort=popular">here</a>; co-founder Linden Tibbets's are <a href="http://ifttt.com/people/linden">here</a>.</p>

<p>Will a whole lot of people be interested enough to go to the trouble of setting this up? I know I am, and I wouldn't be surprised if you are too, RWW readers, but it will be interesting to see how this becomes a business. Either way, it's great to see one of the web's most interesting investors back something so focused on generating creative use of online tools.</p>
                    ]]></description>
                <link>http://readwrite.com/2011/12/09/easy-to-use_mashup_tool_ifttt_gets_betaworks_backi</link>
                <guid>http://readwrite.com/2011/12/09/easy-to-use_mashup_tool_ifttt_gets_betaworks_backi</guid>
                <category>Data Services</category>
                <pubDate>Fri, 09 Dec 2011 03:03:13 -0800</pubDate>
                <author>Marshall Kirkpatrick</author>
            </item>
            </channel>
</rss>

