Twitter announced they have "launched a new backend for search on twitter.com." In short, they moved from the original Summize technology they bought years ago to an infrastructure and system that is completely new and homegrown.
Tedster at WebmasterWorld pulls out the key differences:
- Twitter's real-time search engine was, until very recently, based on the technology that Summize originally developed.
- [Now we have] a new, modern search architecture based on a highly efficient inverted index instead of a relational database.
- With over 1,000 TPS (Tweets/sec) and 12,000 QPS (queries/sec) = over 1 billion queries per day (!) we already put a very high load on our machines.
- We estimate that we're only using about 5% of the available backend resources, which means we have a lot of headroom. Our new indexer could also index roughly 50 times more Tweets per second than we currently get!
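To make the "inverted index instead of a relational database" point concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Twitter's or Lucene's implementation; the `InvertedIndex` class, its method names, and the whitespace tokenizer are all assumptions made for the example.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy inverted index: maps each term to the tweet IDs that contain it."""

    def __init__(self):
        # term -> list of tweet IDs in insertion order (a "posting list")
        self.postings = defaultdict(list)

    def add(self, tweet_id, text):
        # Hypothetical tokenizer: lowercase and split on whitespace.
        for term in set(text.lower().split()):
            self.postings[term].append(tweet_id)

    def search(self, query):
        """Return IDs of tweets containing every query term (AND semantics)."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self.postings.get(terms[0], []))
        for term in terms[1:]:
            result &= set(self.postings.get(term, []))
        return sorted(result)


index = InvertedIndex()
index.add(1, "Twitter launches new search backend")
index.add(2, "Lucene is a search library")
index.add(3, "new Lucene based search at Twitter")
print(index.search("search twitter"))  # [1, 3]
```

A lookup touches only the posting lists for the query terms rather than scanning every stored tweet, which is the general reason this structure suits full-text search.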
Regarding the 1 billion queries per day, they are not human searches. I strongly recommend you read Danny's piece on that.
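For reference, the arithmetic behind the quoted figures can be checked in a few lines of Python. The variable names are made up for this sketch, and the inputs are the rounded numbers from Twitter's post ("over" 1,000 TPS and 12,000 QPS, "roughly" 50 times), so the results are rough lower bounds rather than exact measurements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

queries_per_second = 12_000  # figure quoted by Twitter
tweets_per_second = 1_000    # figure quoted by Twitter
indexer_headroom = 50        # "roughly 50 times more Tweets per second"

queries_per_day = queries_per_second * SECONDS_PER_DAY
indexer_capacity = tweets_per_second * indexer_headroom

print(f"{queries_per_day:,} queries/day")            # 1,036,800,000 queries/day
print(f"{indexer_capacity:,} tweets/sec capacity")   # 50,000 tweets/sec capacity
```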
Twitter said they chose Lucene, a search engine library written in Java, as a starting point, but not without modifications. The changes Twitter made include significantly improved garbage collection performance, lock-free data structures and algorithms, posting lists that are traversable in reverse order, and efficient early query termination.
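As a rough illustration of why reverse-traversable posting lists and early termination go together in real-time search, consider the Python sketch below. It is not Twitter's modified Lucene code; the `newest_matches` function and the sample tweet IDs are hypothetical, and the sketch assumes a posting list is appended to in chronological order.

```python
def newest_matches(posting_list, limit):
    """Return up to `limit` tweet IDs, newest first.

    `posting_list` is assumed to be in append (chronological) order,
    so walking it backwards visits the newest tweets first.
    """
    results = []
    for tweet_id in reversed(posting_list):
        if len(results) >= limit:
            break  # early termination: the older part of the list is never read
        results.append(tweet_id)
    return results


postings = [101, 105, 230, 231, 400, 512]  # hypothetical tweet IDs, oldest first
print(newest_matches(postings, 3))  # [512, 400, 231]
```

Because the newest entries sit at the end of the list, the query can stop as soon as it has enough results, no matter how long the list has grown.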
Forum discussion at WebmasterWorld.