
Checkbot ignores robots.txt here because Google can still index pages even when they're blocked by a robots.txt file.

If you don't want pages to be indexed, you should use a "noindex" tag instead of robots.txt. It's a common misconception that robots.txt stops indexing; it only stops crawling, and a blocked URL can still be indexed. See:

https://support.google.com/webmasters/answer/93710?hl=en

> Important! For the noindex directive to be effective, the page must not be blocked by a robots.txt file. If the page is blocked by a robots.txt file, the crawler will never see the noindex directive, and the page can still appear in search results, for example if other pages link to it.
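
To make that concrete, here's a minimal sketch of the two mechanisms (the /private/ path is just a placeholder):

  # robots.txt: blocks crawling, but the URL can
  # still be indexed if other pages link to it
  User-agent: *
  Disallow: /private/

  <!-- noindex (in the page's <head>): blocks indexing,
       but only if crawlers are allowed to fetch the page -->
  <meta name="robots" content="noindex">

  # the same directive as an HTTP response header,
  # useful for non-HTML resources like PDFs
  X-Robots-Tag: noindex

So the two options conflict: if robots.txt blocks the page, Google never fetches it and never sees the noindex.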


Thanks, that makes perfect sense. I'm definitely guilty of that misconception, though this would be last on a long list of things to clean up for my site.