Yahoo! Pipes and The Web As Database

Written by Alex Iskold and edited by Richard MacManus. In this post Alex tests out
and explores the emergent world of Yahoo! Pipes. He sees some interesting parallels with
Relational Databases in the ’90s, concluding that with pipes, the Web essentially becomes
a giant database that can be queried and remixed in any number of ways.

One of the central concepts in Complex Systems is Emergence. It is this automagical
process through which elements of a
system give rise to a higher order system. Emergence is how physics becomes chemistry and
chemistry becomes biology. It is how web 1.0 evolved into web 2.0, and how that, in turn,
will become the next web.

While the exact mechanics of emergence are complicated and far from being completely
understood, scientists know that a new system emerges as a combination of its elements
and their interactions. In other words, complex systems are really networks – where
elements interact with each other and give rise to a new system.

Perhaps today we are witnessing one of the most vivid examples of emergence – the
remixing of the world wide web. The parts of the new web have crystallized – blogs,
photos, video, audio, maps, RSS, social network profiles and even plain old HTML pages
have formed an impressive network that can now be mined and remixed. Mashups are really
nothing new; the web has been a programmable oyster for at least a few years now.

What is new, though, is the recent systematic thinking about the web as a database. A
few companies, including Dapper, have been working on the problem. But with the recent
launch of Yahoo! Pipes, we are beginning to see the real power of remixing.

Ye Olde Relational Databases

The Web is just a vast database of information. Every day, we interact with it without
thinking about that too much. We simply take our best query tool, usually called Google,
and fire away. Yet decades before the web made its way into our lives, a different kind
of database revolutionized our lives. The Relational Database qualifies
as one of our best computer science inventions. Lesser known to the non-techie crowd, it
nowadays quietly stores terabytes of information behind most familiar ecommerce and
corporate sites.

Microsoft Access Circa 1999

But Relational Databases are remarkably simple. They are collections of tables
(structured data) that can be joined (mixed) together via keys to produce a new set of
results. For example, the table of sales can be joined with the table of employees to
produce a report of who sold what. By combining the tables in various ways, programmers
are able to bring seemingly hidden information into the spotlight (think emergence). For
example, by combining the sales information with employee records and their geographical
locations, one can determine the best sales people in each country.
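
To make the join concrete, here is a minimal sketch using Python's built-in sqlite3
module. The table names, columns and rows are invented for illustration only; they are
an assumption, not data from any real system. It joins sales to employees via a key
and then picks the top seller in each country.

```python
import sqlite3

# Hypothetical schema and rows, made up purely for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE sales (employee_id INTEGER REFERENCES employees(id), amount REAL);
    INSERT INTO employees VALUES (1, 'Ann', 'US'), (2, 'Bob', 'US'), (3, 'Chloe', 'FR');
    INSERT INTO sales VALUES (1, 100), (1, 250), (2, 300), (3, 80), (3, 40);
""")

# Join the two tables on the employee key and total the sales per employee.
totals = conn.execute("""
    SELECT e.country, e.name, SUM(s.amount) AS total
    FROM sales AS s
    JOIN employees AS e ON e.id = s.employee_id
    GROUP BY e.id, e.name, e.country
    ORDER BY e.country, total DESC
""").fetchall()

best_by_country = {}
for country, name, total in totals:
    # Rows are sorted by total (descending) within each country,
    # so the first row seen for a country is its top seller.
    best_by_country.setdefault(country, (name, total))

print(best_by_country)
```

Neither table says who the best salesperson in a country is; that answer only shows up
once the two are combined.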

Another thing that Relational Databases are famous for is visual query and UI tools.
Because databases are so simple, and the data is well structured, people have created GUI
builders like Visual Basic or PowerBuilder to automate the UI for fetching and exploring
the data. We got so good at mapping databases to the UI that it has been quite a boring
thing to do since about 1997.

Well, now Yahoo! is making this whole business cool again, by changing the rules of
the game – the Web is now the new database.

Yahoo! Pipes – Applying Old Wisdom to the Web

Yahoo! Pipes Circa 2007

Yahoo! Pipes is a remarkable offering that was announced last week. It is the first GUI
builder for the biggest database in the world, the Web itself. When compared to Visual
Basic and PowerBuilder, Yahoo! Pipes comes out as more inventive and no less rigorous
than its predecessors. It empowers developers to remix the building blocks of the web in
a whole new way. And it does it with remarkable simplicity.

In Yahoo! Pipes, what used to be a table in the relational database is now a web
page, an RSS feed, etc. The current list of sources includes: Yahoo! Search, Yahoo!
Local, Fetch (RSS feeds), Google Base and Flickr. Each source can be searched or queried
using either pre-defined or user-defined parameters. For example, there can be a search
of all French restaurants in Chicago via Yahoo! Local. The data sources and the searches
can be mixed together (think emergence), using a rich set of operators. Among them is
the iterator (which lets the user loop through the results), a counter and many other
functions that facilitate cleaning, manipulating and recombining the information.

By bringing together many sources and operators, the user can build sophisticated
queries that fetch interesting, non-obvious information from the web. For example,
one can build a pipe that extracts the listings of all French restaurants in Chicago,
along with their Flickr photos. Since the underlying data is virtually limitless and the
set of operators is quite powerful, the number of interesting possible pipes is vast. And
for this reason, unlike its predecessor the Relational Database, Yahoo! Pipes will never
get boring.
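
The restaurant example can be sketched in plain Python to show the shape of such a
pipe. This is not the Yahoo! Pipes API, which is a visual editor; the functions
local_search, photo_search and for_each, as well as the canned listings and photo
names, are hypothetical stand-ins for a local-search source, a photo source and the
loop operator.

```python
from typing import Callable, Iterable, Iterator


def local_search(cuisine: str, city: str) -> Iterator[dict]:
    """Hypothetical stand-in for a local-search source; returns canned data."""
    listings = [
        {"name": "Chez Exemple", "city": "Chicago", "cuisine": "French"},
        {"name": "Bistro Sample", "city": "Chicago", "cuisine": "French"},
        {"name": "Taco Placeholder", "city": "Chicago", "cuisine": "Mexican"},
    ]
    return (
        r for r in listings
        if r["city"] == city and r["cuisine"].lower() == cuisine.lower()
    )


def photo_search(tag: str) -> list:
    """Hypothetical stand-in for a photo source, keyed by an exact tag."""
    photos = {"Chez Exemple": ["photo-1.jpg", "photo-2.jpg"]}
    return photos.get(tag, [])


def for_each(items: Iterable[dict], step: Callable[[dict], dict]) -> Iterator[dict]:
    """Loop operator: apply one step to every item flowing through the pipe."""
    for item in items:
        yield step(item)


def attach_photos(item: dict) -> dict:
    """Return a copy of the listing with matching photos added."""
    return {**item, "photos": photo_search(item["name"])}


def run_pipe() -> list:
    # Source -> loop operator -> output, wired together like boxes in the editor.
    restaurants = local_search("french", "Chicago")
    return list(for_each(restaurants, attach_photos))


for row in run_pipe():
    print(row["name"], row["photos"])
```

Each function plays the role of one box, and the calls in run_pipe play the role of
the wires between them. Swapping a source or adding an operator means changing one
step, not the whole query.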

Evolving Yahoo! Pipes

Yahoo! Pipes are cool, but they have room to evolve. The biggest issue is that, unlike
in Relational Databases, the data is neither structured nor clean. For example, how can
we ensure that Flickr pictures of restaurants in Chicago will be the right ones? We
really cannot. The same problem will exist in all pipes, simply because the underlying
data online is not as precise and polished as data usually is in a Relational Database.
What are the consequences of this? Well, users currently forgive some imprecision in tags
on Flickr and del.icio.us, yet they expect near perfect answers from Google. So having
precise instruments to clean the data in the pipes would go a long way.
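
As a rough idea of what such a cleaning instrument might look like, here is a small
sketch that normalizes free-form tags and keeps only photos tagged with both the
restaurant and the city. The photo records and the matching rule are assumptions made
up for this example, and a real cleaning step would likely need to be far more
forgiving.

```python
import re


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def is_relevant(photo: dict, restaurant: str, city: str) -> bool:
    """Keep a photo only if its tags name both the restaurant and the city."""
    tags = {normalize(t) for t in photo["tags"]}
    return normalize(restaurant) in tags and normalize(city) in tags


# Hypothetical photo records with messy, user-supplied tags.
photos = [
    {"url": "photo-1.jpg", "tags": ["Chez Exemple", "Chicago", "dinner"]},
    {"url": "photo-2.jpg", "tags": ["chez exemple!", "Paris"]},
    {"url": "photo-3.jpg", "tags": ["CHICAGO", "skyline"]},
]

kept = [p["url"] for p in photos if is_relevant(p, "Chez Exemple", "Chicago")]
print(kept)
```

Even this strict rule only narrows the results. It cannot prove that a photo really
shows the restaurant, which is exactly the imprecision described above.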

Another, very different, axis for the evolution of the pipes is to make them usable by
a less technical crowd. As it stands right now, like Relational Databases, the pipes
require a techie brain to be used efficiently. Yet, it seems like there is a possibility,
particularly from the user interface and operator simplification point of view, to make
this tool usable by moms and pops. But even if not, judging again from the Relational
Database, getting wide adoption in the technical community would be just fine.

Conclusion

So what is the catch – why did Yahoo! do it? The answer is the same old: search and
ads. The majority of the current data sources are from Yahoo!, which means Yahoo!
will get the ad revenue when the pipes are run. So empowering thousands of enthusiastic
techies to remix the web using Yahoo!’s data is a great idea.

Will this work? Will developers start using pipes? At the time of this writing there
are over 5,000 pipes, which is an impressive number given that the application is not
even a week old. But we should check in a month or so to see how things unfold. Certainly
the key to its success will be polishing the UI and adding new operators and data
sources. Since Yahoo! is known for its good design and focus on the user experience, it
is likely that we will see the pipes improving in that regard over time.

Please give the pipes a try if you have not done so yet, and let us know what you
think is going to happen to them over time.