The other day, I covered how Google added a line to its documentation stating that Googlebot crawls only the first 15MB of content in an HTML file or supported text-based file and stops after that. I was a bit shocked to see a large number of SEOs begin to panic.
For some reason, SEOs felt that 15MB of raw HTML per page is not enough. But 15MB is a massive amount of HTML for a single URL. The limit does not include videos, images, and other resources; it covers only the HTML source code. Again, it is a huge limit, and none of this is new: it was simply added to the documentation, but it has been in place at Google for a long time.
So Google's Gary Illyes did his thing to clarify and published a nicely titled post on the Google blog, Googlebot and the 15 MB thing. In short, Gary explains, "You, dear reader, are unlikely to be the owner of one, since the median size of a HTML file is about 500 times smaller: 30 kilobytes (kB). However, if you are the owner of an HTML page that's over 15 MB, perhaps you could at least move some inline scripts and CSS dust to external files, pretty please." He digs in more for those who are concerned, so go read it.
Then John Mueller of Google posted his own version as a Twitter thread:
This is not a new thing, it's just newly written down. If you haven't seen issues from this so far, you'll continue not to see them. While I trust that you can make HTML files that are larger, it's a *lot of work* and almost nobody does that.
— 🐝 johnmu.csv (personal) weighs more than 15MB 🐝 (@JohnMu) June 28, 2022
The Adventures of Sherlock Holmes by Arthur Conan Doyle, Frankenstein; Or, The Modern Prometheus by Mary Wollstonecraft Shelley, Moby Dick; Or, The Whale by Herman Melville, Dracula by Bram Stoker, Ulysses by James Joyce, also of course The Picture of Dorian Gray by Oscar Wilde,
— 🐝 johnmu.csv (personal) weighs more than 15MB 🐝 (@JohnMu) June 28, 2022
The Strange Case of Dr. Jekyll and Mr. Hyde by Robert Louis Stevenson, and on top (or bottom?) of all that:
— 🐝 johnmu.csv (personal) weighs more than 15MB 🐝 (@JohnMu) June 28, 2022
War and Peace by graf Leo Tolstoy.
Now, add the content you want to rank for. pic.twitter.com/2dP6otIV9I
I don't know about you, but to me, that's a lot of HTML. I could never get past the first few chapters of Pride & Prejudice anyway, and you want me to read all of this before getting to the actually important part? I admire Googlebot's patience.
— 🐝 johnmu.csv (personal) weighs more than 15MB 🐝 (@JohnMu) June 28, 2022
You can check the size of any page on the internet by going there, and looking at the developer tools in your browser. Or you can use a cool tool like https://t.co/CLRJkz732J which gives you the full size in a nice UI.
— 🐝 johnmu.csv (personal) weighs more than 15MB 🐝 (@JohnMu) June 28, 2022
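If you would rather script that check than open developer tools, here is a minimal Python sketch of the same idea. It is only an illustration under a few assumptions: the names `size_report`, `fetch_html` and `LIMIT_BYTES` are made up for this example, "15 MB" is read as 15,000,000 bytes (the source does not say whether the figure is decimal or binary), and the size is measured on the bytes the server returns for the HTML document alone. This is not how Googlebot itself measures pages, just a rough way to see how far a page is from the limit.

```python
from urllib.request import Request, urlopen

# Assumption: "15 MB" is taken as 15,000,000 bytes. The source does not say
# whether Google means decimal or binary megabytes.
LIMIT_BYTES = 15 * 1000 * 1000


def size_report(html: bytes, limit: int = LIMIT_BYTES) -> dict:
    """Summarise how a raw HTML payload compares with the limit."""
    size = len(html)
    return {
        "bytes": size,
        "percent_of_limit": round(100 * size / limit, 2),
        "over_limit": size > limit,
    }


def fetch_html(url: str, timeout: float = 10.0) -> bytes:
    """Download only the HTML document itself, not the images, scripts
    or stylesheets it references."""
    # Hypothetical User-Agent string, used only to identify this script.
    request = Request(url, headers={"User-Agent": "page-size-check/0.1"})
    with urlopen(request, timeout=timeout) as response:
        return response.read()
```

Calling `size_report(fetch_html("https://example.com/"))` with a URL of your choice returns the byte count, the share of the limit used, and whether the page exceeds it. A page at the 30 kB median Gary mentions would use about 0.2% of the limit under these assumptions.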
If you're a prolific writer, my recommendation would be to split books into chapters, and publish them individually. Please don't publish 16 books on a single HTML page and expect people to find your best prose on the bottom. Thank you.
— 🐝 johnmu.csv (personal) weighs more than 15MB 🐝 (@JohnMu) June 28, 2022
Are you still concerned about this 15MB limit?
Forum discussion at Twitter.