Facebook's data team proved once again today that when you analyze a large set of anonymous user data from the world's biggest social network, you can learn some very interesting things about the state of humanity.
In a blog post titled "What's on your mind?", the company disclosed the results of its text analysis of 1 million anonymized messages. Among the findings: young people swear more than older people, and older people talk about other people more than about themselves. Popular people are more likely to talk about other people, TV and movies, to swear, and to use religious words. Less popular people are more likely to talk about work, sleeping, eating and thinking. These are but a few of the many observations made by the in-house data team. The biggest question about the data remains unanswered, though: what could a world of independent researchers discover in it?
Above: Facebook found that the words at the top of the left chart appeared more often in profiles of older people, and the words at the top of the right chart appeared more often in profiles of more popular people. The company's blog post contains five more graphs showing other word correlations.
I have long hoped that Facebook would make bulk, anonymized data available to independent researchers, and I've argued for the importance of this opportunity all the way up to Mark Zuckerberg himself.
My favorite example of why data like this matters comes from history. When U.S. census data and bank home loan data were first made available for computer analysis and cross-referencing, independent researchers uncovered a pattern of discrimination against African American families seeking to buy homes in large sections of major U.S. cities. The practice was called real estate redlining, and it was exposed thanks to aggregate data analysis. I believe that social injustices of comparable significance, as well as opportunities for significant economic development, could be discovered in the patterns hidden across millions of Facebook status updates, friend connections, Likes and more.
Oliver Chiang, at Forbes, agreed with my argument in an article this month: "But really, what Facebook should do... is open up its data for research. Because they don't, we get highly sanitized findings (like these top trends, or the finding that being active on Facebook leads to increased happiness), and even, reportedly, a black market for Facebook data. The company collects the thoughts, images and content of more than half a billion users - that data could be used for good."
Slate.com's Michael Agger wrote last month in an article discussing the opportunities latent in Facebook's data, "It would be helpful for transportation planners to know the places where people complain the most about traffic. Educators could see the data and sentiment analysis around how a community feels about its local schools."
Bernardo Huberman, a social technology researcher at HP Labs who was able to gain access to bulk Facebook data years ago, before the site was as large, controversial and armed with lawyers as it is today, is both understanding and hopeful.
"This data is amazingly important from a commercial point of view," Huberman told me in a telephone interview last week.
"But [Zuckerberg], he's not a researcher, he's just a businessman. I have a feeling that Twitter's situation is roughly the same; all this research stuff and so on is gravy. [In recent years] I've had very little traction in terms of getting access to their data. They are busy with other things, with keeping their business viable.
"They have a different view of it. Perhaps in a few years, Zuckerberg will relax and say 'I want to be the kind of public figure that wants to release data'... but right now I don't think that will motivate these people."
I hope he's wrong. I hope that every time the Facebook Data Team runs another batch of analysis on anonymized, bulk Facebook data and gives us a chance to look into our own souls, the untapped potential in that data will be taken all the more seriously. That potential will never be realized if analysis is limited to the eyes, minds, interests, skills and perspectives of the company's own researchers.