Twitter announced yesterday that for the first time, outside developers will be allowed to purchase access to 50% of all the messages that flow through its network. The price for half the firehose? $360,000 per year, payable to partner company Gnip.
The full firehose delivers 1,000 Tweets each second, Twitter’s Ryan Sarver said yesterday, too much data for most companies to handle without “dropping a lot of it on the floor.” The Twitter announcement was made at the data-centric Defrag conference outside of Denver, where IBM Chief Scientist Jeff Jonas today discussed separately a firehose of data far, far larger: geo-tagged transaction data created by mobile devices. While Twitter data gets a whole lot of hype, the most disruptive data platform for development may instead come from the wireless mobile network operators.
Jonas today recounted a number that he blogged about last summer:
Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day. Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not. Got a Blackberry? Every few minutes, it sends a heartbeat, creating a transaction whether you are using the phone or not.
For those keeping track at home, we can now do some comparison. 1,000 Tweets per second equals 86.4 million Tweets per day. That’s a big number, but 600 billion geo-spatially tagged transactions per day is a whopping 7,000 times bigger, give or take.
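The comparison above is easy to check. Here is a minimal sketch in Python, assuming Sarver's 1,000 Tweets per second holds steady around the clock and taking Jonas's 600 billion figure at face value; the variable names are ours, for illustration only.

```python
# Back-of-the-envelope check of the Twitter vs. mobile comparison.
# Both inputs are the figures quoted in the article, treated as constant rates.
SECONDS_PER_DAY = 24 * 60 * 60

tweets_per_second = 1_000                       # Sarver's figure for the full firehose
mobile_transactions_per_day = 600_000_000_000   # Jonas's estimate for the U.S.

tweets_per_day = tweets_per_second * SECONDS_PER_DAY
ratio = mobile_transactions_per_day / tweets_per_day

print(f"{tweets_per_day:,} Tweets per day")
print(f"Mobile stream is about {ratio:,.0f} times larger")
```

The exact ratio comes out a little under 7,000, which is why the figure in the text is best read as a round number.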
That mobile data enables prediction of our actions, analysis of our real-world social circles and really interesting business analysis, Jonas said today. Want to know how many people of a particular demographic group are willing to travel 20 miles to shop at a particular store, and how that has changed over time? Jonas says analysts can use that mobile use data to predict a department store’s commercial performance before quarterly earnings are reported.
De-anonymizing that data? Trivial, Jonas says.
Data as Foundation for the Future
It’s not about spying on you. Jonas says the coming era of big data will be one in which “data finds data and the relevance will find you”; where questions are answered before we think to ask them, where recommendations take precedence over search.
That’s a vision that Google shares, too. “If I look at enough of your messaging and your location, and use Artificial Intelligence,” Google CEO Eric Schmidt said this Summer, “we can predict where you are going to go.” Schmidt told the Wall St. Journal in August:
“We’re still happy to be in search, believe me, but one idea is that more and more searches are done on your behalf without you needing to type….I actually think most people don’t want Google to answer their questions. They want Google to tell them what they should be doing next.”
In other words, mobile network-level user data as a platform for development of software and services is going to dwarf the hottest real-time consumer data feed on the market today (Twitter’s firehose) in both size and sophistication.
If those mobile data points were to be sold at the same price as Twitter is now selling each Tweet, how much money would we be talking about? Because $360,000 buys only half the firehose, or about 43 million Tweets per day, the mobile stream is roughly 14,000 times the size of what Twitter is selling. 14,000 times $360,000 would be about $5 billion per year. That seems like a very reasonable sum of money to imagine all that data being worth. 600 billion mobile social data points from across the United States could be used to generate far more wealth than that, too.
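The pricing math works out as follows. This is a rough sketch in Python, assuming the $360,000 annual fee covers exactly half of a steady 1,000-Tweets-per-second firehose and that the same per-item price would apply to mobile transactions; the variable names are ours, and the result is an illustration rather than a valuation.

```python
# Scale Twitter's half-firehose price up to the size of the mobile data stream.
# Assumptions: constant rates, and the same price per data point in both streams.
SECONDS_PER_DAY = 24 * 60 * 60

full_firehose_per_day = 1_000 * SECONDS_PER_DAY      # 86,400,000 Tweets
half_firehose_per_day = full_firehose_per_day // 2   # what $360,000 a year buys
half_firehose_price_per_year = 360_000               # U.S. dollars
mobile_transactions_per_day = 600_000_000_000

# How many "half firehoses" fit into the mobile stream.
scale = mobile_transactions_per_day / half_firehose_per_day

# Integer arithmetic keeps the dollar figure exact.
implied_price_per_year = (
    mobile_transactions_per_day * half_firehose_price_per_year
    // half_firehose_per_day
)

print(f"Mobile stream is about {scale:,.0f} half-firehoses")
print(f"Implied price: ${implied_price_per_year:,} per year")
```

The multiplier lands just under 14,000, and the implied price is $5 billion per year.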
All of this analysis neglects to address the rise of network-connected devices, the Internet of Things. Earlier this year, for the first time, more connected devices than new human subscribers came online with AT&T and Verizon. That trend will skew the numbers in favor of data sources outside of social network data even more.
Of course there’s no one single vendor who owns all that data (Who owns it at all, in fact? Do mobile phone users have some ownership over it?) but there are a number of companies taking different approaches to monetizing the parts of that data they do have access to. (Including, incidentally, ReadWriteWeb sponsor Alcatel-Lucent.)
Jonas said on stage today that he has talked to one company that sees 85% of those 600 billion data points flow through their hands, though he declined to name the company.
Streams of data available to serve as a foundation for innovative development may be a very significant part of the future – but the biggest sources of that data might not be what we expect.