Back in October of 2009, both Google and Microsoft announced partnerships with Twitter to begin indexing the Twitter firehose, but what about all the tweets from before then? Twitter still has them in boxes in the attic, but isn't offering them up to search at this point. Thankfully, a smaller independent search engine, Topsy, has been indexing (most) tweets since May of 2008, and now you can dive much deeper into Twitter's history than Google, Bing, or even Twitter will allow.
By choosing to focus its search efforts on "real-time" results from the last 4-7 days, Twitter has effectively placed the burden of operating a public archive of tweets on the shoulders of other search engines, at least for now. Given the trouble the service has maintaining uptime, its apparent reluctance to unlock its archives should come as no surprise.
Topsy, on the other hand, is unearthing Twitter's elusive history with its latest upgrade, which allows users to search tweets at least as far back as May 2008. By comparison, Google's Twitter archives only go back to early February 2010. Advanced search tools and the extra 21 months of coverage are a huge advantage for Topsy, but there are still two years of earlier tweets locked up on Twitter's servers.
News of Topsy's enhanced Twitter search functionality comes from Danny Sullivan at Search Engine Land, who recently took the service for a test drive. His results were so-so when he tried to use Topsy to locate Ashton Kutcher's first tweet, and my own brief experience with it has been similarly mediocre.
Looking Back, Something's Missing
In November of 2008, Twitter exploded with information surrounding the Mumbai terrorist attacks. Using Topsy's advanced search, I searched for "Mumbai" within a 5-day range surrounding the attacks, but it returned only about 28 results. By comparison, http://search.twitter.com/ turns up roughly that many tweets about Mumbai from just the last 20 minutes.
A search even further back in Topsy's Twitter history, to the summer of 2008 during the California wildfires, turns up a good number of tweets, but only those with links. The company says that results from a time-specific search (especially those at the earliest dates of its archive, it seems) are just "highlights" of the tweets from that period.
Topsy's real strength as a search engine is its ability to rank results by relevance, so its Twitter efforts are focused on that goal as well. Providing a full reverse-chronological record was not its primary aim, but the company says it is working on improving this side of search for a future release.
It appears that Topsy's Twitter archive has some improving to do before it can really be put to use for searching our 140-character past. It's a bit buggy right now, but a few feature tweaks and upgrades to the archive's data could make Topsy a valuable research asset in the near future. That is, until Twitter rolls out its own solution, potentially dashing Topsy's hopes and dreams (or perhaps acquiring Topsy outright).