In addition to the language indexing diversity, Gary Illyes from Google said in the Search Off the Record podcast that Google uses different indexing tiers. He said the search company "might use different kinds of storages to build the index." Some of the index goes on cheaper storage and some goes on more expensive storage so it can be accessed and served faster.
If a document needs to be served often, Google might keep it on a faster, more expensive type of storage rather than a slower, cheaper one. This is to balance cost and efficiency.
This part starts at about the 7:03 mark of the podcast.
Gary used the way people build computers as an analogy for why Google uses different types of storage for its indexing tiers. Gary said:
If you think about it, when you are building your computer, for example, if you are an idiot like me and builds their own computer, then you will think a lot about the storage mechanisms that you put in your computer. First, you are going to have RAM, for example, R-A-M, random access memory, which is the most expensive kind of storage that you could possibly put in your computer. While maybe the L1 caches or L2 caches are more expensive, but you are not putting those in your computer. Those are integrated. But the first one that you can put in your computer, that's RAM. That's the most expensive kind of storage. They come in small capacities. And then after that, you have to choose between a hard drive, like a magnetic hard drive, or a solid state drive. The solid state drive is more expensive, but it's way faster. I don't remember the exact number, but it's orders of magnitude faster than a hard drive.
And that's because, for example, you don't have seek time on solid state drives. You can just go to a specific section right away at the speed of light quite literally and start reading from that section. While with a magnetic drive, like a hard drive, you actually have to move the arms of the hard drive to a specific section, to a specific disk, and start reading from the section where you believe that the data is.
He then explained that, based on "how many times we think that the document might be served, we might store the documents in our index in these different kinds of storage mechanisms." This is how Google defines its indexing tiers, he said: "And that's what practically defines the index tiers that we have." "So for example, for documents that we know that might be surfaced every second, for example, they will end up on something super fast. And the super fast would be the RAM. Like part of our serving index is on RAM," Gary added.
He went on a bit more: "Then we'll have another tier, for example, for solid state drives because they are fast and not as expensive as RAM. But still not-- the bulk of the index wouldn't be on that. The bulk of the index would be on something that's cheap, accessible, easily replaceable, and doesn't break the bank."
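To make the idea concrete, here is a minimal sketch of tiering by expected serving frequency, as Gary describes it. This is not Google's implementation. The tier names, the `pick_tier` function, and the thresholds are all hypothetical, since Google has not published how it draws these lines; the only detail taken from the podcast is that documents surfaced about every second would land on RAM, SSDs sit in the middle, and the bulk goes on something cheap.

```python
# Hypothetical tiers, ordered from fastest/most expensive to slowest/cheapest.
# Each entry is (tier name, minimum expected serves per day to qualify).
# The thresholds are invented for illustration only.
TIERS = [
    ("ram", 86_400),  # about once per second or more (86,400 seconds in a day)
    ("ssd", 100),     # served fairly often, but not constantly
    ("hdd", 0),       # everything else: the cheap bulk storage
]


def pick_tier(expected_serves_per_day: float) -> str:
    """Return the first (fastest) tier whose threshold the document meets."""
    for name, min_serves in TIERS:
        if expected_serves_per_day >= min_serves:
            return name
    # Fallback for nonsensical (negative) estimates: use the cheapest tier.
    return TIERS[-1][0]
```

Calling `pick_tier(100_000)` in this sketch returns `"ram"`, while `pick_tier(3)` returns `"hdd"`. The point is only that the decision depends on how often a document is expected to be served, with cost pushing everything else toward the cheapest storage.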
It makes sense that Google would take this approach to storing information in its search index.
Now, you will ask, how does one optimize to be on the most expensive indexing tier? :)
Forum discussion at Twitter.