I think a better solution would be to block all crawler traffic by default, but include a comment in robots.txt explaining how to get added to a whitelist to scrape the contents of the resource. This puts the burden of requesting access on the bot's owner, and if they really want that access, they can reach out and we can work it out.
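Something like this minimal robots.txt could express it; the contact address is just a placeholder.

```
# Crawling is disallowed by default.
# To request whitelisting, email crawler-access@example.com
# with your bot's user agent and intended crawl rate.
User-agent: *
Disallow: /
```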
It's a nice option to have and maybe good in some cases. It reminds me of the courtesy some journalists extend when they ask whether they can use a video someone uploaded to social media for their show or piece. I do like the approach and the shifting of the first-contact burden, as well as the general philosophical principle that blocking ought to be reversible and temporary rather than permanent (though I also like the idea of exponential timeouts that can become effectively permanent). Still, I don't see myself ever doing anything like that. I'd still prefer to not know about the bot at all, and if I did decide to perma-block one, then unless the first contact came with sufficient dollar signs attached I'd likely ignore it entirely. I'm not usually in the mood to start random negotiations with anybody.
I also tend to see the web from the "open web" dream perspective. By default, no traffic is blocked. The burden of requesting is already inherently on the client: they request a route, and I serve it or not. For something like my blog I don't tend to care who is requesting a particular route; even admin pages can be requested, they just don't get anything without being logged in. If someone is being "cute" by requesting non-existent WordPress pages or what have you in search of vulnerabilities, has an annoying/ugly user agent string, or is just pounding me for no real reason, then I do start to care. (The "pounding" aspect is a bit trickier; I look at steady state. Another comment mentioned cutting their db server's CPU load in half by dropping unlikely-to-be-real-users from two countries. For me, if that is merely a steady-state reduction from something like 10% of a machine to 5%, I don't really care; I start caring when it would get in the way of real growth without having to use more resources.)
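If I were to automate that kind of selectivity, it might look roughly like the hypothetical WSGI middleware below; the blocked user agents and probe paths are made up for illustration, not a real list I use.

```python
# Hypothetical WSGI middleware: reject requests that probe common
# vulnerability paths or carry a blocked user agent string.
BLOCKED_AGENTS = ("BadBot", "AnnoyingScraper")                 # illustrative only
PROBE_PATHS = ("/wp-login.php", "/wp-admin", "/xmlrpc.php")    # illustrative only

def filter_middleware(app):
    def wrapper(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "")
        path = environ.get("PATH_INFO", "")
        if any(a in agent for a in BLOCKED_AGENTS) or path.startswith(PROBE_PATHS):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden"]
        # Everything else passes through untouched.
        return app(environ, start_response)
    return wrapper
```

Everything not on those lists is served as usual, which matches the "serve it or not" default above.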
When I was hosting on EC2, I used to have very mild anxiety that I'd piss someone off and they'd try to "harm" me by launching a botnet of requests at large media files to rack up bandwidth costs. (I believe people who say this has happened more organically with normal bots in the age of LLMs, but my concern was more targeted botnets/DDoS.) There are a few ways to mitigate that anxiety:

1) Set up monitoring, alerts, and triggers, either directly in code running on the instance itself or via overseeing AWS tools. I did the latter, which is less reliable, but still: there was a threshold that shut down the whole instance, capping the possible damage at something like under a couple hundred bucks. (I forget the details of calculating how much traffic could theoretically be served before the monitoring side noticed.) A rough sketch of this is below.

2) Hide behind Cloudflare and their unlimited bandwidth, since my content was mostly static. (I didn't do that.)

3) Move/rearchitect to a free host like GitHub Pages and give up hosting my own comments. (Again, didn't do.)

4) Move to OVH, which has unlimited bandwidth. (Did this when Amazon wanted to start charging an absurd amount for just a single IPv4 address.)
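For option 1, the overseeing-AWS-tools route could look roughly like this: a CloudWatch alarm on outbound traffic with a built-in stop action. This is only a sketch via boto3; the region, instance ID, and threshold are placeholders, not my actual setup.

```python
import boto3

# Sketch: stop the instance if outbound traffic spikes past a threshold.
# Region, instance ID, and threshold below are illustrative placeholders.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="bandwidth-kill-switch",
    Namespace="AWS/EC2",
    MetricName="NetworkOut",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Sum",
    Period=300,                      # 5-minute buckets
    EvaluationPeriods=1,
    Threshold=50 * 1024**3,          # ~50 GiB out per period, illustrative
    ComparisonOperator="GreaterThanThreshold",
    # Built-in CloudWatch alarm action that stops the EC2 instance.
    AlarmActions=["arn:aws:automate:us-east-1:ec2:stop"],
)
```

The detection lag (the alarm period plus metric delay) bounds how much traffic can slip through before the stop fires, which is the part I never calculated carefully.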
I can see how it could lead to more overhead from communicating with the requesters. That could add up if many of them want to crawl your resource.
I can see the argument that if I want to hide something, I should put it behind a layer of authentication. robots.txt is not a substitute for proper access control mechanisms. It is more of an "if they do honor this document, it will reduce unnecessary traffic to my site" notion.
I appreciate you highlighting your personal experience in dealing with bots! I like the ideas of monitoring and sitting behind something like Cloudflare, which can protect against a major influx of traffic. I think this is especially important for smaller sites that use low or free tiers of cloud services.