We believe, based on our research, that Facebook plans to announce the availability of a firehose of user data at its F8 developer conference in April. Such an offering could be similar to the firehose that Twitter has shared with large partners and select small developers building the famous Twitter ecosystem of third-party applications around the web. A Facebook representative did not offer a denial, saying only that the company would not comment on speculation.
The huge social network was once private by default. In December it made controversial changes that pushed hundreds of millions of users toward publishing their information in public, and it now appears poised to complete the about-face at its F8 developer conference by offering up public user data in a huge river that outside parties can consume, analyze and build on top of.
"Nobody thinks about how much valuable information they're generating just by friending people and fanning pages. It's like we're constantly voting in a hundred different ways every day. And I'm a starry-eyed believer that we'll be able to change the world for the better using that neglected information. It's like an x-ray for the whole country - we can see all sorts of hidden details of who we're friends with, where we live, what we like." - Pete Warden, The Man Who Looked Into Facebook's Soul
It's not clear exactly what would be included in this firehose; it could be a stream of low-value Fan Page promotional content, for example. The most likely content to be included, though, is user activity data published under public privacy settings. There's far, far more of that today than there was just a few months ago.
If you've participated in a supermarket loyalty program, you're familiar with the concept of opting in to sharing data about your activities with outside parties in exchange for benefits. In that common practice, though, consumers gain shopping discounts but get nothing from the analysis of the data they emit.
In the case of the Twitter Firehose, the much sought-after full feed of public user data from across the site, users gain access to all kinds of interesting applications and insights based on analysis of their use of Twitter.
A Facebook firehose would be much bigger. We're hearing that there will be no launch partners in the announcement, but the imagination runs wild thinking about all the mashup possibilities. We learned last week that user location data is coming to Facebook at F8. Now picture all this rich data roaring like a river into the data-digesting machines of a wide range of developers all over the world.
A firehose of public Facebook user activity data could function like a living, breathing global census. Cross-reference that data with any other data set and we may find an ocean of insights into the human condition, around the world, for slices of people, second by second or over time.
This is something we've been calling on Facebook to do for some time. I've sat with founder Mark Zuckerberg and discussed the importance and potential of releasing aggregate user data at length.
Is the inclusion of public activity in a firehose that is programmatically available to outside developers a case of broadcast that violates user control and thus privacy?
I don't think it's clear either way. In a discussion about aggregate Twitter data analysis late last year, a representative of the Electronic Frontier Foundation told me that Twitter users had no reasonable expectation that their data wouldn't be redistributed and analyzed in bulk because Twitter was a public forum.
Facebook used to be different. It was private by default; our actions were shared only with the friends and family we gave permission to see our status messages and photos.
Then in December the company made a dramatic shift, prompting users to re-evaluate their privacy settings and making "share with everyone all over the internet" the new default for most options. Mark Zuckerberg said Facebook was only changing to reflect the way the world was changing, but we argued that was a disingenuous rationalization of Facebook's culture-changing actions, driven in part by its own profit motive. We also argued that by pushing users toward being more public, the company was reducing user control over data and spreading distrust about making data available online at all. That put at risk the idea of sharing your data in a way that could be analyzed.
Is there a reasonable expectation that online social networking activity set to "public" will not be redistributed in bulk to outside parties? How can a company like Facebook respect user privacy as much as possible while still achieving the incredible things that can be achieved by making aggregate user data available for analysis?
Let's begin to discuss it.
See also: The personal blog of Cameron Marlow, Facebook's in-house sociologist and big data guy.
Related analysis: Twitter 2.0: API Rate Change Could Lead to a World of New Apps & Features
Chewing on the Issues: Twitter Data Dump: InfoChimps Puts 1B Connections Up for Sale