The Web is huge. And growing. Faster every day. It's almost like an ocean where there's no evaporation (the data on the Web stays there virtually forever), and yet it's always raining. The rain is the new content that's added to the ocean.
Every tweet is a drop, every blog post is a drop, every check-in is a drop that falls into the ocean. This ocean is almost constantly under a tropical storm in some places, like Twitter or Facebook.
Guest author Julien Genestoux is the founder and CEO of Superfeedr, a company dedicated to making RSS and Atom feeds realtime. It has implemented PubSubHubbub from day one and now hosts several hubs, including those for ReadWriteWeb, Tumblr, Posterous and Gawker. Follow Julien on Twitter.
When you're a search engine, you obviously have an exhaustiveness requirement. You can't really skip indexing the Indian Ocean. Google sends its bo(a)ts all over the ocean, wherever it's raining, to update its index. However, the ocean is growing so fast that staying exhaustive will only get harder and harder.
Unfortunately, not only is the ocean growing, it's also raining more, which means that if a bo(a)t is away from a zone for too long, the zone will have changed tremendously by the time it comes back. That's what happens when you see results in a search engine that are one or two years old, or even older. They're not wrong, just often out of date, and yet they rank well.
It's a real technical problem for search engines to know where to send their bo(a)ts, and when! And when Google says they're going to feed their search index with PubSubHubbub data, that's what they're trying to do: let publishers tell them where it has just rained, and save a little bit on the bo(a)ts.
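To make the push idea concrete, here is a minimal sketch in Python of the subscriber's side of PubSubHubbub: ask a hub to send updates for a feed, then confirm the subscription by echoing the hub's challenge. The `hub.*` parameter names come from the PubSubHubbub spec (the 0.3 draft); the URLs and function names are made up for illustration, and this is not how Google or Superfeedr actually implement it.

```python
from urllib.parse import urlencode, parse_qs


def build_subscription_request(topic_url, callback_url):
    """Form-encoded body a subscriber would POST to the hub.

    Function name and URLs are hypothetical; only the hub.* keys
    are taken from the PubSubHubbub spec.
    """
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,        # the feed we want updates for
        "hub.callback": callback_url,  # where the hub should push them
        "hub.verify": "sync",          # required by the 0.3 draft, dropped later
    })


def answer_verification(query_string, expected_topic):
    """Answer the hub's verification GET on the callback URL.

    Returns a (status, body) pair: echo hub.challenge for a topic we
    asked for, refuse anything else.
    """
    params = parse_qs(query_string)
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if topic != expected_topic or not challenge:
        return 404, ""
    return 200, challenge
```

Once the subscription is confirmed, the hub POSTs new entries to the callback whenever the feed changes, so no bo(a)t has to sail back just to check whether it rained.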
I strongly disagree with John Battelle when he says this is not a huge deal. My take is that he sees this only as a great technical and infrastructure opportunity for Google, not so much as an immediate benefit for the end user. I disagree, and so do you. You disagreed when you typed "earthquake" into Twitter Search, or even "hudson crash", or "Michael Jackson". At that point, you knew that Google wasn't able to provide you with the information you were looking for, and this is a massive loss for Google.
Google will have a hard time getting this mindshare back. The first thing it needs to do is to actually have results that date from the very minute people are searching for these things.
You may argue that if you search 10 times a day on Google, you go maybe once a week to Twitter Search. I'm the same, no worries. Yet, I know that Twitter is much better than Google at contextualization. When I do a search on Google, I expect to find the absolute truth. If I look for "earthquake", I'm looking for facts about earthquakes: pictures or maybe historical data. If I look for "earthquake" on Twitter, I'm looking for context; I want what is being said about earthquakes now (and here!).
As a matter of fact, Google has always had a lot of trouble with context because they know so little about the people who search there (or maybe they know a lot, but don't want to scare us). Adding PubSubHubbub is a way for them to take the "time dimension" back. They may never have the conversations that Twitter has, but they will have a much bigger ocean of data than Twitter's sea of tweets.
Photo by Pam Roth.