Does today’s news that the major search engines will integrate Twitter posts into their search results herald a mass extinction or a mass acquisition?
Based on tonight’s conversations with key players in the space, the day’s events and announcements could spell either or both. Every real-time search engine we spoke with intends to weather the storm with its current strategy, and each of those strategies centers on delivering an excellent UX through excellent product development. All of them also see the day’s events as validation of years of concentrated effort. But who will prevail, and who will profit?
We spoke tonight with Tobias Peggs of OneRiot, Gerry Campbell of Collecta, and Bill York of Wowd. We’ve had in-depth conversations with each of these real-time search engines in the past, and we’re indebted to them for their insight.
Gut Reactions
Universally, these startups said that hearing today’s Google/Microsoft/Twitter news was a welcome validation of their years of perseverance in real-time search.
“It’s super exciting,” said Peggs. “There’s been one way to search the web for 10 years, and we’re looking at a total revolution in the way that people find information. It’s a huge change in the industry. To see that feeling validated is awesome.”
According to York, “I don’t think we could ask for anything better than an endorsement from the major players. This is nothing but good for us. Back when I started, the marketplace was not very receptive to a new strategy.” York went on to say that he supported mainstream exploration of the real-time space, with the telltale caveat, “even if it means licensing someone else’s information and community.”
Campbell said Collecta had already built this development into its corporate strategy. “This is something we heard rumors on and had anticipated. It was fully expected. Having been involved with one of the giants [AltaVista] at one time, it’s quite obvious. It’s something we’ve anticipated and part of how we structured our company.”
Thoughts on Product Development
The real-time search startups have taken different approaches to the monumental task of indexing the real-time web. Collecta uses XMPP, the protocol that powers many IM clients, to push streams of information. OneRiot has a fascinating algorithm that indexes tweet content, the links in tweets, and the content of the linked-to pages to serve relevant results. And Wowd has developed a SETI@Home-like distributed computing model to harness and parse the dataset created by users of real-time technologies.
Each company is proud of its hard-won advances, and each speculated on how Google and Microsoft will handle the data.
Campbell told us, “I can’t say what Google will bring to real-time search. But it makes sense that any dataset will be part of their approach. This is the largest corpus of real-time data that has not been accessible. As a search practitioner, I think they’re going to keep on with their ranking approach.”
York added that nothing unforeseen has yet been announced. “The Twitter thing, that’s the kind of thing people have been expecting.”
But he also talked about the challenges of parsing real-time user-generated content. “I think the data stream is broader and shorter. There’s more and more real time, and you need different architecture to keep up with it. It’s important to have real filtering applied to a noisy, low-value data stream. We believe people are the key to finding the good stuff.”
“Knowing what goes into the product is quite eye-opening,” said Peggs. “There’s a tremendous lot of work to do once you’ve got tweets containing links, to process that information in real time and index the content on the page and render results based on content rather than just tweets. It’s relatively easy for someone to spam Twitter with irrelevant links; but you’ve got to follow the links and index the pages and search against the content of the pages, not just the 140-character tweets. You also have to link to results based on relevancy, not just based on retweets.”
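To make Peggs’s point concrete, here is a minimal sketch of the idea he describes: pull the links out of tweets, index the text of the linked pages, and rank matches by how well the page content fits the query rather than by retweet count. This is not OneRiot’s algorithm. The `PAGES` dictionary stands in for actually fetching web pages, the example URLs are hypothetical, and plain term counting is an assumed, deliberately simple relevance score.

```python
import re
from collections import Counter

# Hypothetical stand-in for fetching linked pages; a real crawler would make HTTP requests.
PAGES = {
    "http://example.com/a": "Google and Bing add real-time Twitter results to search",
    "http://example.com/b": "Cheap sunglasses, click here",
}

URL_RE = re.compile(r"https?://\S+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def index_tweets(tweets, fetch):
    """Follow each link in each tweet and index the linked page's terms.

    Retweet counts are tallied per URL but deliberately not used for ranking.
    """
    index = {}
    for tweet in tweets:
        for url in URL_RE.findall(tweet["text"]):
            page = fetch(url)
            if page is None:  # skip links we could not fetch
                continue
            entry = index.setdefault(url, {"terms": Counter(tokenize(page)), "retweets": 0})
            entry["retweets"] += tweet["retweets"]
    return index


def search(index, query):
    """Rank URLs by how often the query terms appear in the linked page content."""
    terms = tokenize(query)
    scored = []
    for url, entry in index.items():
        score = sum(entry["terms"][t] for t in terms)
        if score > 0:
            scored.append((score, url))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]
```

With a heavily retweeted spam tweet whose text mentions the query but whose linked page does not, only the genuinely relevant page is returned:

```python
tweets = [
    {"text": "Big news http://example.com/a", "retweets": 3},
    {"text": "real-time search!!! http://example.com/b", "retweets": 500},
]
idx = index_tweets(tweets, PAGES.get)
print(search(idx, "real-time search"))  # ['http://example.com/a']
```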
Follow the Money
The opportunities for monetizing a new and powerful stream of Google- and Bing-driven traffic are both exciting and confounding for these startups, some of which have yet to put their own revenue plans into play.
As York noted, Google’s and Microsoft’s entry into real-time search marks a shift in how the market views these startups and their technology: from a geek’s plaything to a new way to direct user attention and serve highly relevant advertising.
“Google is in the enviable position of having a high profit margin in the search business itself,” he said. “It does fit their strategy to have as many eyeballs as possible, to get more people doing more stuff. We’re interested more in matching personal interest profiles.”
Peggs also noted, “OneRiot has an API that allows anyone to incorporate our results. We also have a real-time ad model.”
Collecta has also rolled out two APIs, one for general search results from the real-time web and one for XMPP-powered streaming data. Campbell hinted that the company’s monetization plans are innovative, but his team has not yet released specifics.
“Having been involved in this growth of paid search several times over,” said Campbell, “the creation of new technology creates new business opportunity. The monetization of search was a redefinition of online business models: You can advertise to users without being slimy. There’s now an opportunity to make users even happier without distracting them from the page.”
Strategy: Beyond “Get Acquired Or Die”
The startups in the real-time search space also universally expressed a commitment to their current business strategies. Some seemed to have clearer exit goals than others, but all believe that their distinct focus on technology and product will allow them to survive the intrusion of Microsoft and Google into their arena.
Campbell, like many of his peers at other startups, noted that Twitter is only a small segment of the content sources available for real-time web information. He also said, “Engines that are based solely on Twitter are probably more dead-on in terms of competition [with Google]. The less-funded companies are in a position where they have to do something more clever and unique.”
Collecta, he said, is still figuring out its role in the story. “We are a push search engine,” he told us. “That is increasingly our defining characteristic. The perception of speed is critical, but it’s not our most unique characteristic. Because we’re based on XMPP, the chat protocol, we’re pushing results as soon as possible.”
Said Peggs, “Our strategy doesn’t change. We’re focused on producing the most relevant web results based on not just Twitter, but also Digg and other services – a much wider pulse of the real-time web on the back end. And we continue to distribute those through our API.”
What’s to Come for Real-Time Search Startups
Every single startup we spoke to tonight expressed some trepidation about things to come.
“What happens to the bubble of startups in this space?” asked Campbell. “I hope they’ve had the foresight to see this through.”
“It doesn’t really change our strategy,” said a confident Peggs. “Two years ago, when you explained how this would change search, they looked at you like you were crazy.”
York’s assessment of Wowd’s place as an open-sourced approach to a problem now being tackled by major corporations was also optimistic. “When you’re a startup company competing with established players, there are always reasons to be cautious. We believe the approach we’re taking is a great way to go. It’s different, even than what you’ve heard today. We think this approach isn’t a gimmick; it is a fundamentally different approach.”
The bottom line, as in every vertical, is that once the major leagues take an interest, some startups will sink and some will swim. Some will be acquired, and some will fail. A few may survive long enough to pose a legitimate challenge to the dominant players, but that outcome is less likely.
Let us know your prognoses in the comments, and stay tuned for developing coverage of this space and these startups from ReadWriteWeb.