My name is Pete Warden, and a few months ago I created a visualization based on crawling 210 million public Facebook profiles that raised a lot of questions about how openly available that information should be.
While I’ve seen a lot of discussion of the impact on users, I’ve seen little on why Facebook and other companies care so passionately about that data. If we want to understand what’s likely to happen to our information in the future, it’s important to understand why it’s such a crucial foundation for everything Facebook does and what threats it faces.
Facebook is able to offer a fantastic experience because it knows who your friends are. No other site has that knowledge, so it’s an incredible competitive advantage for the service – and, despite all the privacy worries, it makes Facebook tough to quit, because nowhere else can offer you those channels to reach your friends.
The real danger for Facebook is that others will get access to a broad and comprehensive social network and be able to offer the same rich social experience to new users. I see two ways this can happen:
- Someone copies the information from Facebook in bulk
- A similar network is created from an independent source using implicit data
Copying
There are two ways someone could copy user information from Facebook: crawling public profiles or using the API. I noticed while I was doing my crawling that each public profile page would only show eight friends at a time, but it was a different set on every visit, which meant that with enough visits to a user’s page I could gather a complete list of their connections. Facebook has fixed this hole so it only shows the same eight people now, but that’s still enough to build a partial but usable social network.
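To get a feel for why a rotating set of eight friends leaks the whole list, here is a small simulation. It is only a sketch under stated assumptions: the friend list is made up, each page view is modeled as a uniformly random sample of eight, and the function name `visits_to_collect_all` is hypothetical. It says nothing about how Facebook actually chose which friends to display.

```python
import random


def visits_to_collect_all(friends, sample_size=8, rng=None):
    """Count simulated page views until every friend has been seen once.

    Assumption: each view shows a uniformly random subset of the list.
    """
    rng = rng or random.Random()
    seen = set()
    visits = 0
    while len(seen) < len(friends):
        # One simulated page view: a fresh random subset of the friend list.
        shown = rng.sample(friends, min(sample_size, len(friends)))
        seen.update(shown)
        visits += 1
    return visits


if __name__ == "__main__":
    friends = [f"friend_{i}" for i in range(130)]  # hypothetical friend list
    rng = random.Random(42)
    trials = [visits_to_collect_all(friends, rng=rng) for _ in range(200)]
    print("average visits needed:", sum(trials) / len(trials))
```

Under this model the number of visits grows only modestly with the size of the friend list, which is why showing the same eight people on every visit closes most of the gap.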
Using the API is a lot trickier because they have technical controls to throttle heavy users who are downloading data too fast, but if you have a wide userbase on Facebook, like Zynga, then you could easily download information on tens of millions of users a week without noticeably increasing your API usage.
Facebook’s primary defense against both of these approaches is legal. They recently introduced a whitelisted robots.txt, so now legitimate Web crawlers have to agree to the same kind of terms of service that restrict what third parties can do with the data they gather through the API.
The weakness of this approach is that it relies on the honor system, since it’s extremely hard to track the flow and usage of data once it’s in another company’s hands. It’s possible to use fake accounts (Mountweazels) to act as markers to prove data came from Facebook originally, but all you need to do is cross-correlate the Facebook data with other public sources like phone books, electoral data or Twitter to weed out non-existent people.
As long as the data copiers take those sorts of basic precautions it’s essentially untraceable – and Facebook will have a hard time proving they were the source in court. It’s also very hard to spot that a company is using data sourced from Facebook internally, unless they publicly announce it.
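The cross-correlation idea mentioned above amounts to a membership check against independent sources. The toy sketch below shows only the logic: every name and data set is invented for illustration, the function `drop_unverified` is hypothetical, and real records would need fuzzy matching on names and locations rather than exact string equality.

```python
def drop_unverified(profiles, independent_sources, min_matches=1):
    """Keep only names found in at least `min_matches` independent sources."""
    kept = []
    for name in profiles:
        # Count how many of the outside sources also contain this name.
        matches = sum(1 for source in independent_sources if name in source)
        if matches >= min_matches:
            kept.append(name)
    return kept


if __name__ == "__main__":
    # All data below is invented for illustration.
    profiles = ["Ann Lee", "Bob Ray", "Lillian Mountweazel"]
    phone_book = {"Ann Lee", "Bob Ray"}
    other_site = {"Ann Lee"}
    print(drop_unverified(profiles, [phone_book, other_site]))
```

A marker account that exists only on one service has no footprint anywhere else, so it drops out as soon as a second source is required.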
An Independent Network
Creating an independent source of social network data is a tough nut to crack. Facebook’s big advantage is that it’s a massive hassle for users to manually re-enter their social networks into yet another service, so the path of least resistance for website owners is to integrate with their existing repository. However, there are alternative sources of information about our social networks that can be accessed without partnering with Facebook or requiring laborious user input: your email inbox and cellphone history.
I saw Buzz primarily as a bid by Google to stealthily build their own social network by leveraging those patterns of who you email most. It doesn’t seem to be making great progress, but the same idea is useful for any startup that needs to build a picture of its users’ social networks.
You can now use OAuth to connect to Google and Yahoo inboxes without requiring a password, analyze the email headers to spot a user’s frequent recipients, and then use that information to help you offer a better service, for example by pre-populating your invite suggestions with a user’s inner circle rather than using the entire contents of their address book.
Using phone calls and SMS patterns to understand social networks is a lot harder for third parties, but you can bet that both the big telecom companies and the mobile software providers like Apple and Google are trying to figure out how to compete with Facebook using that data.
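As a rough illustration of the header analysis step, the sketch below counts the To and Cc addresses across a user’s sent messages and returns the most frequent ones. It assumes the messages have already been fetched with the user’s permission and are available as raw RFC 822 strings; the OAuth connection and mailbox access are not shown. The function name `frequent_recipients` and the sample addresses are made up.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses


def frequent_recipients(raw_messages, top_n=5):
    """Return the addresses a user writes to most often, most frequent first."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        # Collect every To and Cc header; a message may have several of each.
        fields = msg.get_all("To", []) + msg.get_all("Cc", [])
        for _name, address in getaddresses(fields):
            if address:
                counts[address.lower()] += 1
    return [address for address, _count in counts.most_common(top_n)]


if __name__ == "__main__":
    # Invented sample of sent mail, for illustration only.
    sample = [
        "From: me@example.com\nTo: alice@example.com, bob@example.com\n\nhi",
        "From: me@example.com\nTo: Alice <alice@example.com>\nCc: carol@example.com\n\nhi",
        "From: me@example.com\nTo: alice@example.com\n\nhi",
    ]
    print(frequent_recipients(sample, top_n=3))
```

Raw frequency is the simplest possible signal. A real service would probably also weight recency and whether the other person ever replies, so that mailing lists and one-off blasts don’t crowd out the inner circle.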
Facebook has an overwhelming advantage thanks to the network effects of having social information on so many users, but the very attractiveness of their position has to be focusing their competitors’ minds on how to replicate that strength.
Right now Facebook looks invincible, but it all rests on the unique social network they’ve gathered – they could be the next MySpace or Friendster if their rivals figure out an alternative.