One of our customers was paying a third party to hit our website with garbage traffic a couple of times a week to make sure we were rejecting malformed requests. I was forever tripping over those requests in Splunk while looking for legitimate problems.
We also had a period where we generated bad URLs for a week or two, and the worst part is that I think they were on links marked nofollow. Three years later there was a bot still trying to load those pages.
And if you 429 Google’s bots, they will reduce your PageRank. That’s straight-up extortion from a company that also sells cloud services.
I don’t agree with you about Google being well behaved. They were following nofollow links, and they’re also terrible if you’re serving content on vanity URLs: any throttling they do on one domain name just hits two more.
I guess my position is that it was comparatively well behaved? There were bots that would blitz the website at full speed, for absolutely no reason. You just scraped this page 27 seconds ago; do you really need to check it for an update again? Also, it hasn't had a new post in the past 3 years; is it really going to start being lively again?
If I'm understanding you correctly, you had an indexable page that contained links with the nofollow attribute on the <a> tags.
It's possible some other mechanism got those URLs into the crawler, like a person visiting them. Nofollow on the link won't prevent the URL from being crawled or indexed. If you're returning a 404 for them, you ought to be able to use Webmaster Tools, or whatever it's called now (Search Console, I believe), to request removal.
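If the goal is to make the crawler give up on those URLs for good, here's a minimal sketch of one approach, assuming a Flask app and a made-up /search/ path standing in for the bad links: return 410 Gone instead of 404, and set an explicit X-Robots-Tag so the URL gets dropped from the index too.

    from flask import Flask, Response

    app = Flask(__name__)

    # Hypothetical pattern for the dead URLs; adjust to the real ones.
    @app.route("/search/<path:junk>")
    def gone(junk):
        # 410 signals the page is permanently gone, which crawlers
        # generally treat as a stronger removal hint than a plain 404.
        resp = Response("Gone", status=410)
        # Belt and suspenders: explicitly ask for de-indexing as well.
        resp.headers["X-Robots-Tag"] = "noindex"
        return resp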
The dumbest part is that we’d known about this for a long time, and one day someone discovered we’d implemented a feature toggle to remove those URLs and then never turned it on, despite an announcement that we had.
They were meant to be interactive URLs on search pages. Someone implemented them, I think, trying to make accessibility (a11y) work, but the bots were slamming us. We also weren’t doing canonical URLs right on the destination pages, so they got crawled again every scan cycle. So at least three dumb things were going on, but the sorts of mistakes that normal people could make.
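The canonical fix is roughly to collapse every variant of a destination page to one normalized URL before rendering; here's a minimal sketch, stdlib only, with hypothetical normalization rules (drop the query string, fold the www. host variant):

    from urllib.parse import urlsplit, urlunsplit

    def canonical_url(url: str) -> str:
        # Collapse host and query-string permutations to one form so a
        # crawl cycle doesn't see every variant as a brand-new page.
        parts = urlsplit(url)
        host = parts.netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        path = parts.path.rstrip("/") or "/"
        # Assumes query and fragment don't change the content.
        return urlunsplit((parts.scheme, host, path, "", ""))

The value this returns is what would go in the destination page's <link rel="canonical"> tag, so repeat crawls converge on one URL.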
I thought the argument was that if you run on GCP you can masquerade as Googlebot and not get a 429, which is obviously false. Instead it looks like the argument is more of the tinfoil-hat variety.
BTW, you don't get dropped for issuing temporary 429s, only when it's consistent and/or the site is broken; that is well documented. And wtf else are they supposed to do if you won't let them crawl it and it goes stale?
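"Temporary" here basically means the 429 carries a backoff hint and eventually stops; a minimal sketch, assuming a Flask app and a naive per-IP counter standing in for a real rate limiter:

    import time
    from flask import Flask, Response, request

    app = Flask(__name__)
    hits = {}  # naive per-IP counters; a real limiter would use a sliding window

    @app.before_request
    def throttle():
        now = time.time()
        start, count = hits.get(request.remote_addr, (now, 0))
        if now - start > 60:
            start, count = now, 0
        hits[request.remote_addr] = (start, count + 1)
        if count + 1 > 100:  # arbitrary 100 requests/minute cap
            resp = Response("Slow down", status=429)
            # Retry-After marks this as temporary backoff, not a hard block.
            resp.headers["Retry-After"] = "60"
            return resp

    @app.route("/")
    def index():
        return "ok"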