Facebook has cut a deal with political website Politico that gives the independent site machine access to Facebook users’ messages, both public and private, whenever a Republican Presidential candidate is mentioned by name. The data is being collected and analyzed for sentiment by Facebook’s data team, then delivered to Politico to serve as the basis of data-driven political analysis and journalism.
The move is being widely condemned in the press as a violation of privacy, but if Facebook did this right, it could be a huge win for everyone. Facebook could be the biggest, most dynamic census of human opinion and interaction in history. Unfortunately, the failure to talk prominently about privacy protections, the failure to make this opt-in (or even opt-out!) and the inclusion of private messages all put at risk any remaining shreds of trust in Facebook that could have served as the foundation of a new era of social self-awareness.
We, ok I, have long argued here at ReadWriteWeb that aggregate analysis of Facebook data is an idea with world-changing potential. The historical analogy I think of is real estate redlining. Back in the middle of the last century, when US Census data and housing mortgage loan data were both made available for computer analysis and cross-referencing for the first time, early data scientists were able to prove a pattern of racial discrimination by banks against people of color who wanted to buy houses in certain neighborhoods. The data illuminated the problem and made it undeniable, thus leading to legislation to prohibit such discrimination.
I believe that there are probably patterns of interaction and communication of comparable historic importance that could be illuminated by effective analysis of Facebook user data. Good news and bad news could no doubt be found there, if critical-thinking eyes could take a look.
“Assuming you had permission, you could use a semantic tool to investigate what issues the users are discussing, what weight those issues have in relation to everything else they are saying and get some insights into the relationships between those issues,” writes systemic innovation researcher Haydn Shaughnessy in a comment on Forbes privacy writer Kashmir Hill’s coverage of the Politico deal. “As far as I can see people use sentiment analysis because it is low overhead; the quickest, cheapest way to reflect something of the viewpoints, however fallible the technique. Properly mined though you could really understand what those demographics care about.”
Several years ago I had the privilege to sit with Mark Zuckerberg and make this argument to him, but it doesn’t feel like the company has seized the world-changing opportunity in front of it.
Facebook does regularly analyze its own data, of course. And sometimes it publishes what it finds. For example, two years ago the company cross-referenced the body of its users’ names with US Census data that tied last names to ethnicity. Facebook’s conclusion was that the site used to be disproportionately made up of White people, but now it’s as ethnically diverse as the rest of America. Good news!
But why do we only hear the good news? That millions of people are talking about Republican Presidential candidates might be considered bad news, but the new deal remains a very limited instance of Facebook treating its user data like the platform that it could be.
It could be just a sign of what’s to come, though. “This is especially interesting in terms of the business relationships–who’s allowed to analyze Facebook data across all users?” asks Nathan Gilliatt, principal at research firm Social Target and co-founder of AnalyticsCamp. “To my knowledge, they haven’t let other companies analyze user data beyond publicly shared stuff and what people can access with their own accounts’ authorization. This says to me that Facebook understands the value of that data. It will be interesting to see what else they do with it.”
I’ve been told that Facebook used to let tech giant HP informally hack at its data years ago, back when the site was small and the world’s tech privacy lawyers were as yet unaroused. That kind of arrangement would have been unheard of for the past several years, though. Two years ago, social graph hacker Pete Warden pulled down Facebook data from hundreds of millions of users and analyzed it for interesting connections, planning to release it to the academic research community. Facebook’s response was assertive and came from the legal department. Warden decided not to give the data to researchers after all. (Disclosure: I am writing this post from Warden’s couch.)
“Like a lot of Facebook’s studies, this collaboration with Politico is fascinating research, it’s just a real shame they can’t make the data publicly available, largely due to privacy concerns,” bemoans Warden. “Without reproducibility, it loses a lot of its scientific impact. With a traditional opinion poll, anyone with enough money can call up a similar number of people and test a survey’s conclusions. That’s not the case with Facebook data.”
“Everyone is going ‘gaga’ over the potential for Facebook,” says Kaliya Hamlin, Executive Director of a trade and advocacy group called the Personal Data Ecosystem Consortium.
“The potential exists only because they have this massive lead (monopoly) so it seems like they should be the ones to do this.
“Yes we should be doing deeper sentiment analysis of people’s real opinions. But in a way that they are choosing to participate – so that the entities that aggregate such information are trusted and accountable.
“If I had my own personal data store/service and I chose to share say my music listening habits with a ratings service like Nielsen – voluntarily join a panel. I have full trust and confidence that they are not going to turn on me and do something else with my data – it will just go in a pool.
“Next thing you know Facebook is going to be selling to the candidate the ability to access people who make positive or negative comments in private messages. Where does it end? How are they accountable and how do we have choice?”
Not everyone is as concerned about this from a privacy perspective. “There are many things in the online world that give me willies for Fourth-Amendment-like reasons,” says Curt Monash of data analyst firm Monash Research. “This isn’t one of them, because the data collectors and users aren’t proposing to even come close to singling out individual people for surveillance.”
Monash’s primary concern is the quality of the data. “There’s a limit as to how useful this can be,” he says. “Online polls and similar popularity contests are rife with what amounts to ballot box stuffing. This will be just another example. It is regrettable that you can now stuff an online ballot box by spamming your friends in private conversation.”
It doesn’t just have to be about messages, though. Social connections, Likes and more all offer a lot of potential for analysis, if it’s done appropriately.
“We need trust and accountability frameworks that work for people to allow analysis AND not allow creepiness,” says Hamlin.
Two years ago, social news site Reddit began giving its users an option to “donate your data to science” by opting in to have activity data made available for download. Massive programming question-and-answer site Stack Overflow has long made periodic dumps of its users’ data available for analysis. “You never know what’s going to come out of it,” Stack Overflow co-founder Joel Spolsky says about analysis of aggregate user data.
The unknown potential is indicative not just of how valuable Facebook data is, but potentially of the relationship between data and knowledge generally in the emerging data-rich world.
That’s the thesis of author David Weinberger’s new book, Too Big to Know. “It’s not simply that there are too many brickfacts [datapoints] and not enough edifice-theories,” he writes. “Rather, the creation of data galaxies has led us to science that sometimes is too rich and complex for reduction into theories. As science has gotten too big to know, we’ve adopted different ideas about what it means to know at all.”
The world’s largest social network, rich with far more signal than any of us could wrap our heads around, could help illuminate emergent qualities of the human experience that are only visible on the network level.
Please don’t mess up our chance to learn those things, Mr. Zuckerberg.