Last month, Condé Nast social news site Reddit asked users if they would donate their data for research purposes. This week the site made available a data dump from more than 40,000 people who opted in to sharing what they do on the site. It’s a remarkable move that every social network could learn from.
Reddit’s goal for this data is to see it used to create a recommendation engine – in particular a system that would highlight some of the niche communities on Reddit that are a great place to find good topical content, but that too few people on the site have discovered. Now that the data is out in the wild, however, any number of analyses can be performed on it – and no one knows what kinds of observations about the relationship between people, web content, voting and news will be discovered. One little account preference opens up a world of opportunities: “allow my data to be used for research purposes.”
So far the number of users who have opted in to donating their data remains relatively small (the site saw 400 million pageviews in July, for example), but it’s already enough to prove valuable.
“It’s great to have these kinds of data dumps available for research,” says Joel Spolsky, co-founder of the popular StackOverflow network, which makes its user data available in a bulk dump every month, under a Creative Commons license. “We’ve had several academics analyzing our data dump and learning interesting, measurable, scientifically relevant things about online communities. You never know what’s going to come out of it.”
Data-savvy developers are sure to be interested in this kind of resource. “That looks awesome,” Tim Hastings, of a service that analyzes Twitter tag data, told us about the Reddit data dump. “I especially like the goal of recomputing every two hours. Big data sets like this are great fun. You start out not knowing what you want to know, but you know there must be some wisdom buried deep.”
Chris Dixon, co-founder of recommendation service Hunch, said the Reddit data and recommendation effort are a “great project.” “I think I’ll have our devs hack something together using the Hunch API,” he said. “We have a blog recommender widget [of our own] coming out soon.” Dixon’s company is one of the most prominent startups aiming to build a “taste graph” and Hunch already offers recommendations that impress many people, on a wide variety of topics.
Real-world recommendations, profile analysis for increased self-awareness and scientific insights into the nature of online life: those are the kinds of things people are building now with publicly available social network user data.
Nowhere in the world is there more opportunity to develop such insights based on user data than on Facebook. Facebook used to hand over data dumps of its users’ activities to big companies doing research without communicating that to the users. Now a much larger company, Facebook is maddeningly unwilling to offer bulk data export for research and analysis, perhaps in part because people are so upset whenever the release of data is perceived as a violation of online privacy.
That’s an easy problem to solve, though, if Reddit is any indication. Just ask users if they want to check one box: “allow my data to be used for research purposes.”
Please, Facebook?