Disallowing Disallow: How to properly hide pages from Google bots
My earliest blog posts date back to 2004. There’s very little in them which is relevant today, and they’re full of dead links, which, I’m told, is bad for my site’s SEO. So recently, I’ve done two things to hide these posts from search engines.
First, I used a robots meta tag to request that search engine robots don’t index or follow the archived pages:
<meta name="robots" content="noindex, nofollow">
<meta name="googlebot" content="noindex, nofollow">
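As an aside, the same directives can be delivered as an HTTP response header instead, which is handy for files that can't carry a meta tag, such as PDFs. A minimal sketch for Apache (assuming mod_headers is enabled; the .pdf pattern is only an example):

```apache
# Send the same noindex/nofollow directives as a response header
# for files that can't contain a robots meta tag.
<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex, nofollow"
</FilesMatch>
```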
Then, I added a line to my robots.txt file to stop search engines from crawling the folder containing those archives:
User-agent: *
Disallow: /archives/
Sitemap: https://stuffandnonsense.co.uk/sitemap.xml
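In isolation, that rule does exactly what it says. A quick sketch with Python's standard urllib.robotparser (the archive URL here is made up) shows it blocking well-behaved crawlers from the folder:

```python
# Feed the robots.txt rules straight into Python's stdlib parser,
# so no network access is needed.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /archives/",
])

# A compliant crawler is blocked from anything under /archives/ ...
print(rp.can_fetch("Googlebot", "https://stuffandnonsense.co.uk/archives/2004/post.html"))  # False
# ... but the rest of the site is still crawlable.
print(rp.can_fetch("Googlebot", "https://stuffandnonsense.co.uk/about.html"))  # True
```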
Imagine my confusion when a Google search turned up one of my archived pages with a message that reads, “No information is available for this page. Learn why.”
Google Search Console says that “If a Google search result says that no information is available for a page it means that the website prevented Google from creating a page description, but didn’t actually hide the page from Google.”
What? But I’d done both of those things. noindex in the robots meta tag should prevent Google from creating a page description and stop it from indexing the page. And isn’t the robots.txt file supposed to hide the page from Google? Yet the pages do appear on Google without a page description.
Turns out, the robots.txt file was preventing search engines from crawling the archives folder, which meant that Google couldn’t read the robots meta tags and so never saw the noindex on those pages. Damn.
Google Search Console actually does explain this:
You can prevent the page from appearing entirely in Google Search results by following these steps:
Use noindex on your page. If using noindex, you must also remove the robots.txt rule that blocks the page from search engines. Sounds strange, but Google needs to be able to read the page in order to see your noindex instruction.
So, I’ve now removed the Disallow: /archives/ line from my robots.txt file and I’ll see what difference that makes.
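For reference, this is roughly what the file looks like now. An empty Disallow is the conventional way of saying that nothing is blocked, and the sitemap line stays; the noindex meta tags on the archived pages do the actual hiding:

```
User-agent: *
Disallow:

Sitemap: https://stuffandnonsense.co.uk/sitemap.xml
```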
I’ve also made a new index page for these archived posts as there’s actually some decent and funny stuff buried deep down in there.