Google's John Mueller said it is fine to publish some content in both PDF and HTML formats. He said both "can be shown independently in the search results." If Google sees them as duplicates, in some cases, Google may just show the HTML version in its search results.
He posted this Q&A in an AskGooglebot video.
The question was from Corina Burri, "Can my site publish content in both HTML and PDF?"
13 years ago, John Mueller from Google said Google can handle content that exists in both PDF and HTML versions. In 2016, Gary Illyes said you don't need to worry about it, and John said it's no problem.
Here is the transcript:
Today's question comes from Corina, who asks if it's okay to publish content twice: once in HTML and once as a downloadable PDF file.
It's absolutely fine to do this.
In general, Google's systems can find both kinds of pages and index them separately, even if the words in them are technically duplicates. They can be shown independently in the search results. You have controls to manage this if you want to.
For example, you could use a noindex HTTP header or robots meta tag to block indexing of one, or use the rel="canonical" link element to tell us about your preference.
In practice, content is often available in just one format or the other, simply because that's what your audience wants. If you have a restaurant menu, folks will want to look at it on their phones, so a normal HTML page is best. On the other hand, if you have a specific form to fill out as a hard copy, then using a PDF file can make sense. And some kinds of content might work well in both formats, such as a guidebook or case study available to review in paper form.
If our systems see these as duplicates, they'll usually defer to the HTML page version.
It's worth mentioning that, when it comes to PDFs, it's a good practice to include a link to your website in the PDF so that folks can find their way back.
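A PDF has no HTML head, so the controls mentioned in the transcript would have to be sent as HTTP response headers when they apply to the PDF copy. Here is a minimal Python sketch of what those headers could look like. The function name `pdf_response_headers`, its parameters, and the example.com URL are hypothetical and only for illustration; how you attach headers depends on your web server or framework.

```python
def pdf_response_headers(html_url=None, block_indexing=False):
    """Build HTTP response headers for serving the PDF copy of a page.

    html_url: URL of the HTML version to prefer (hypothetical parameter).
    block_indexing: if True, ask search engines not to index the PDF.
    """
    headers = {"Content-Type": "application/pdf"}
    if block_indexing:
        # The noindex HTTP header: keeps the PDF out of the index.
        headers["X-Robots-Tag"] = "noindex"
    elif html_url:
        # rel="canonical" sent as an HTTP Link header, since a PDF
        # cannot carry a <link> element.
        headers["Link"] = f'<{html_url}>; rel="canonical"'
    return headers


# Example: state a preference for the HTML version.
print(pdf_response_headers(html_url="https://example.com/guide"))
```

The sketch treats the two options as alternatives: a noindex header removes the PDF from search, while a canonical header only signals which version you prefer.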
Forum discussion at X.