Facebook used to be dominated by white and Asian users, but tonight the company announced the results of a demographic study concluding that the percentages of black and Hispanic users on the site are now approaching their shares of the general U.S. population. Hear that? Facebook scientists have looked at the data and everything is OK now.

For months, we’ve been calling on Facebook to open up user data in an appropriate way for the public at large to study.

It’s an invaluable bird’s-eye view of the interactions between 350 million people around the world. There are probably a lot of social patterns that could be discovered in that data – some not pretty at all. For now, though, Facebook has analyzed the data in-house and given itself a cheery report card. More analysis appears to be forthcoming, so we’ll see what we’re told about what really happens on Facebook – but that data ought to be made available for outside analysis.

In this case the data wasn’t anonymized; it was analyzed by two in-house staff members and two grad students from Cornell and Princeton. The group compared users’ last names on Facebook to U.S. Census data about the percentage of people with those last names who reported specific racial backgrounds.
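To make the surname approach concrete, here is a minimal sketch of how such an estimate could be computed. It is not the study's actual code: the function name `estimate_composition`, the table `SURNAME_SHARES` and every number in it are hypothetical, invented purely for illustration and not taken from Census data. The idea is that each user contributes the racial breakdown associated with their last name, and those breakdowns are averaged over all users whose names appear in the table.

```python
from collections import Counter

# Hypothetical surname table. The shares below are invented for
# illustration only; they are NOT real Census figures.
SURNAME_SHARES = {
    "smith": {"white": 0.7, "black": 0.2, "hispanic": 0.05, "asian": 0.05},
    "garcia": {"white": 0.1, "black": 0.0, "hispanic": 0.9, "asian": 0.0},
    "nguyen": {"white": 0.0, "black": 0.0, "hispanic": 0.0, "asian": 1.0},
}


def estimate_composition(surnames, table):
    """Average the per-surname shares over all surnames found in the table."""
    totals = Counter()
    matched = 0
    for name in surnames:
        shares = table.get(name.strip().lower())
        if shares is None:
            # Surnames missing from the table are skipped, not guessed at.
            continue
        matched += 1
        for group, share in shares.items():
            totals[group] += share
    if matched == 0:
        return {}
    return {group: total / matched for group, total in totals.items()}


# A made-up user list; "Unknownname" is deliberately absent from the table.
users = ["Smith", "Garcia", "Nguyen", "Smith", "Unknownname"]
estimate = estimate_composition(users, SURNAME_SHARES)
for group, share in sorted(estimate.items()):
    print(f"{group}: {share:.1%}")
```

Note that a method like this only ever yields an aggregate estimate. It says nothing reliable about any individual user, and its accuracy depends on how well the surname table reflects the people actually on the site.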


Once a larger number of Facebook users have public profiles, which is probably happening very rapidly thanks to the radical new privacy settings the company began recommending to users last week, analyzing things like names, friend lists and associations won’t constitute a violation of user privacy anymore.

That might not sound like something many users are comfortable with, but one way or another there is a lot of potential for social good (not just advertising) made possible by aggregate user data. Perhaps coincidentally, or perhaps not, the new privacy regime will remove the primary objections to bulk analysis of user data. Presumably something will need to be done to make the data available in bulk and in an appropriate format for outside analysis, though.

The example we’ve offered most commonly in calling for this data to be released is the history of what’s called real estate redlining. In the 1960s, when both U.S. Census information and real estate mortgage loan information were made available for bulk analysis, it was proven that banks around the U.S. were discriminating against home loan applicants in traditionally African American neighborhoods.

That was a big deal and we suspect that there are patterns of comparable importance, both positive and negative, hiding in Facebook’s huge store of data.

For contrast and illustration, consider the conclusions drawn by popular dating site OK Cupid in an analysis of dating inquiry response rates among its users of different races. In heterosexual pairs, male inquirers on OK Cupid were far more likely to get a response when they were white. Black, Hispanic and Asian men saw terrible response rates from women on the site. White men were least likely to respond to inquiries from black women and they were by far the most likely to say that they preferred to date people of their same race. White men and white women both stood out for how likely they were to say they preferred to date people of their own race.

Take that, people who commented on the Facebook study tonight saying that people don’t see race anymore! It certainly appears that we do.

It will be interesting to see whether Facebook is willing to publish data that casts a less positive light on its own user base. Outside parties would most likely be more apt to expose data like that.

The world could use some more self-awareness, Facebook, but it’s important that such self-awareness not be hand-delivered by scientists on your own staff, with your financial interests as their bottom line.