As part of its recent UI redesign, Twitter has also made some significant changes to its backend, and today Michael Busch updated the Twitter Engineering Blog with some details about how Twitter has revised search.
Initially, Twitter’s real-time search engine was based on technology from Summize, a company Twitter acquired in 2008. Since then, Twitter has seen phenomenal growth: over 1,000 Tweets per second and 12,000 queries per second, which works out to more than 1 billion queries per day. The Twitter engineering team has been looking for alternatives because “scaling the old MySQL-based system had become increasingly challenging.”
So Twitter has moved to a new search architecture built on Lucene, the open source search library.
Despite Lucene’s strengths, it does have shortcomings in terms of real-time search. And so Twitter has rewritten parts of its architecture, while still supporting Lucene’s APIs. These changes include:
- significantly improved garbage collection performance
- lock-free data structures and algorithms
- posting lists that are traversable in reverse order
- efficient early query termination
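To make the last two items more concrete, here is a minimal Python sketch of how a reverse-traversable posting list and early query termination can work together in a real-time index. It is not Twitter's or Lucene's code: the `TinyRealtimeIndex` class and its `add` and `search` methods are hypothetical names invented for illustration, and the sketch is single-threaded, so it says nothing about the lock-free or garbage collection work.

```python
from collections import defaultdict
from itertools import islice


class TinyRealtimeIndex:
    """Toy in-memory inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> list of doc IDs in arrival order (oldest first), append-only
        self.postings = defaultdict(list)
        self.docs = []

    def add(self, text):
        """Index one document and return its ID (IDs grow with arrival time)."""
        doc_id = len(self.docs)
        self.docs.append(text)
        for term in set(text.lower().split()):
            self.postings[term].append(doc_id)
        return doc_id

    def search(self, term, limit):
        """Return up to `limit` matching doc IDs, newest first."""
        # Walk the posting list in reverse so the newest documents come first.
        newest_first = reversed(self.postings.get(term.lower(), []))
        # islice stops pulling from the iterator after `limit` hits, so older
        # postings are never visited: a simple form of early termination.
        return list(islice(newest_first, limit))


index = TinyRealtimeIndex()
for tweet in ["lucene is fast", "search at scale", "lucene search", "real-time lucene"]:
    index.add(tweet)

print(index.search("lucene", 2))  # prints [3, 2]
```

Because new documents are only ever appended, the newest results sit at the end of each posting list. Reading from the end means a query that wants the two most recent matches touches two postings, however long the list has grown.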
This new search architecture is faster and more scalable, and uses only about 5% of the available backend resources, moving towards the engineering team’s goal of building search “to support at least an order of magnitude more load.”
For more information on how Twitter handles other big data challenges, check out the slides from Twitter engineer Kevin Weil’s talk at Web 2.0 last month: