Gary Illyes from Google said on the Search Off the Record podcast that the search company works to make sure it indexes content in a diverse set of languages. Gary said that Google is "paying quite a bit of attention into ensuring that every language has the same potential indexing-wise."
The topic was brought up by Cherry Prommawin, who works in search quality analysis and webmaster relationships at Google in the Asia Pacific region. Cherry said, "one specific problem that we do run into is that we don't have enough content in certain Southeast Asian languages, and in certain verticals."
Gary Illyes took it from there after John Mueller asked a related question. Gary spent some time talking about "index selection." He said, "we don't have a bias towards any one language. In fact, we are paying quite a bit of attention into ensuring that every language has the same potential indexing-wise. So basically, regardless if it's a tiny language like Hungarian, for example, or a big language like Chinese or English or Arabic, each has or will have enough resources to end up in our index. And when I say language, I do actually mean the set of docs that are written in that particular language."
He said this applies even to languages that are "not really meant to be written," such as languages that a community may want to preserve. For those languages, he said, it is "very hard to understand indexing-wise, and probably for us ranking those languages, those more special languages is also very hard." He added, "we do try to index them, and store them in case someone figures out how to search for those languages."
This conversation started at about the 3:14 mark and went on for about three minutes.
Forum discussion at Twitter.