“My background is in Artificial Intelligence and my last business was building predictive data. Most of our customers were oil companies, and you can hold that against me if you like. But my pitch back then was ‘just give me enough data, I’ll figure out something.’ And often enough I did figure out something.”
That’s how Houston-based 80Legs CEO Shion Deysarkar describes his background. Tonight his Web-crawling-as-a-service company will put up for sale tens of millions of data points extracted from public social networks and other websites. He says it’s only a matter of time until everyone’s doing it and he wants to be one of the good guys. “You can figure something out from just about anything,” he says. That’s the kind of geek Shion Deysarkar is.
For prices starting at $350 per month, 80Legs customers can now purchase 10 to 20 million user profiles per month from LinkedIn, MySpace and some other social networks. Facebook and Twitter are not included, but a variety of other data sets, from sources such as retail websites, are available as well.
I’ve bet Deysarkar a beer that LinkedIn isn’t going to put up with this, but he says 80Legs has been crawling the site extensively for quite a while and that LinkedIn would have stopped it by now if it wanted to. We’ll see.
80Legs launched at DEMO last fall and has been on our radar since last spring. Its core product is crawling the Web for a small fee – to index whatever its customers want. As Sarah Perez wrote in September:
“What 80Legs does is no easy feat. It provides its users a service which offers up 50,000 computers which can crawl up to 2 billion web pages per day. Yes, it’s like having your own little search engine that you can rent for a small fee. How small? 80Legs is about 50% less expensive than any other competitive service out there.”
Tonight it’s putting some pre-configured crawls up for sale, in hopes of reaching a new market of people for whom the core service is too complicated.
Either way, Shion Deysarkar may be a man from the future. We’re closely watching the slow opening of aggregate social network user data for bulk analysis and innovation. It’s a hotly contested area. Here’s what Deysarkar thinks about four of the biggest questions in this area today.
On The Slap-Down of Nice Facebook Data Harvesters
Academic and innovation-minded researchers are harvesting large quantities of public Facebook user profile data, only to be threatened by Facebook’s legal department. Pete Warden is the best known example, and one that Deysarkar called “a shame.”
“The people using that data are not doing anything that’s shady or wrong. They are trying to make new value on top of that data. In ways that Facebook or whoever is not doing. Facebook is in the business of bringing people to their site, they aren’t leveraging that data for other things, and there [are] many things they’ll never use data for. No harm is being done to Facebook. What would help them would be to become a data standard. As long as people are adding value then it’s good.”
On Users Approving of Data Aggregation
Say “aggregate user data analysis” and most people freak out – presuming it’s a screaming privacy violation. Might that ever change? Deysarkar thinks so, perhaps too optimistically.
“Going forward, the end user will hopefully understand that people are creating services that will benefit them. If I take a couple of actions and I see it benefits me that’s hopeful. The challenge is that people have to understand that it came from aggregation. The more people that are making a case and building things around it, the better.
“If you look at social networking, quite often connections are made in unintuitive ways. Obviously market researchers can take advantage of that, but it can also help people connect with [people] that we couldn’t otherwise.
“At the end of the day, it’s going to happen. Sites are going to fight it, but that data is going to become available. Wherever there is value to be had, people are going to go for that value.”
One of our arguments has been that Facebook and other networks should open up access to their public user data for aggregate analysis, because the bad guys who want to do bad things with it are already doing so through the black market. Meanwhile, positive uses of data analysis are prohibited. Deysarkar confirms again that the black market is real.
“Companies should want to work with us because we’re above board. The black market definitely exists. We have heard about it from some of our potential customers, who have asked about things we wouldn’t do. They just say, ‘we can get it through other ways.’ Things like wanting a crawler to log-in and get private data. It’s too bad that exists.”
On the Still-Infant Market for Good User Data
80Legs is cool. It’s a crawler-as-a-service. Pete Warden, one of our Big Data favorites, uses and endorses it. But it’s also a little complicated, especially because it’s like selling potential. It sells data that you then have to derive value from; it doesn’t deliver value directly in ways people are familiar with. The Economist’s Special Report on Big Data last month argued that data was a key new form of economic input, on par with land, labor and capital. Deysarkar says he agrees with that, “it is definitely a unit of value,” but also admits that too few people get it yet.
“We do have customers who are using 80legs the way we intended, we have a decent set of customers. But we know that there is a whole other set of customers who are intimidated because it is a bit technical now. These pre-configured crawls we’re now selling still fit into the big picture, but the whole data market is not well defined. There isn’t a rich enough ecosystem of companies using the data, that’s the market we’d like to serve, but it’s still being formed right now.”
What do you think? Is 80Legs just a little ahead of its time? A lot? Totally crazy and wrong? We would love for you to share your thoughts on these matters in comments below.