In some ways, the term "big data" belies the challenges that startups face in tackling the subject. That adjective "big" tends to get a lot of the attention, often at the expense of the noun "data." In other words, we spend a lot of time talking about issues of the quantity of data and less time addressing issues of quality.
Some of those issues were addressed today at Web 2.0 Expo when Factual CEO and founder Gil Elbaz gave a talk on the challenges of big data. The subtitle of his talk is key here: "Getting Some." It isn't simply a matter of storing data, but rather how companies, particularly startups, can access data.
Elbaz identified several major hurdles that companies face around data:
- Rights and Ownership
- Economics and Business Models
- Integration and Aggregation
These are challenges for any company, arguably, but for startups, they can be particularly daunting. Elbaz gave the example of building a company around book data. There are a number of places where data about books can be found - Google Books, Amazon, LibraryThing, for example. But despite the amount of data about books - authors, descriptions, cover art, reviews and the like - and despite a lot of these data sources having APIs, it's not easy for a startup to access, utilize, or monetize that data. And "starting from scratch" to build out a new database would take a lot of resources, something a startup isn't likely to have.
Elbaz argues that it's important to "grease the wheels" of data, something he sees as part of the mission of his startup Factual, an open data source for location data.
This open data model, he argues, will move the web towards "information singularity," as will other efforts like data marketplaces, data search engines, semantic web mark-up, and better standards.
Elbaz contends that ownership and control of data will eventually be viewed as a cost and that companies will move towards common schemas and towards sharing foundational data. A new data economy may emerge, he says - an iTunes for data - with novel access methods so that startups can more easily build value-added services on top of big data, rather than having to worry about gathering and storing the data themselves.