I didn’t mention IP at all in my post. You can globally rate limit anonymous requests for everything (except maybe your login endpoint), if that’s the thing that makes sense for you.
The nice thing about the proof-of-work approach is that it can be backgrounded for users with normal browsers, just like link prefetching.
> Obviously you can do this, but that will start blocking everyone. How is that a solution?
Two corrections: It will start rate-limiting (not blocking) anonymous users (not everyone). It's a mitigation. If specific logged-in users are causing problems, you can address them directly (rate limit them, ban them, charge them money, etc). If nonspecific anonymous users are causing problems, you can limit them as a group, and provide an escape hatch (logging in). If your goal is to "free access to everyone except people I don't like but I can't ask people if they are people who I don't like", well, I suppose it isn't a good mitigation for you, sorry.
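Here is a minimal sketch of that policy. It assumes an in-process token bucket: one bucket shared by all anonymous traffic, and one bucket per logged-in user. It also assumes a hypothetical `/login` path that is always let through as the escape hatch. The rates, class names and paths are illustrative only; they are not features of any particular server or library.

```python
import time
from typing import Callable, Dict, Optional


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add tokens for the time elapsed since the last check, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Limiter:
    """Hypothetical policy: one shared bucket for all anonymous requests,
    a separate bucket per logged-in user."""

    LOGIN_PATH = "/login"  # hypothetical; the escape hatch is never rate limited

    def __init__(self, anon_rate: float, anon_burst: float,
                 user_rate: float, user_burst: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.anon = TokenBucket(anon_rate, anon_burst, clock)
        self.user_rate = user_rate
        self.user_burst = user_burst
        self.users: Dict[str, TokenBucket] = {}

    def allow(self, user_id: Optional[str], path: str) -> bool:
        if user_id is None:
            if path == self.LOGIN_PATH:
                return True
            # All anonymous traffic draws from the same pool, whatever its source IP.
            return self.anon.allow()
        # Logged-in users get their own bucket, so they can be limited or banned individually.
        bucket = self.users.get(user_id)
        if bucket is None:
            bucket = TokenBucket(self.user_rate, self.user_burst, self.clock)
            self.users[user_id] = bucket
        return bucket.allow()
```

A flood of anonymous requests drains only the shared bucket. Logged-in users keep their own allowance, which is what makes logging in an escape hatch.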
> Also what prevents attacker-controlled browsers from backgrounding the PoW too?
The cost. A single user can complete the proof of work in a short period of time, costing them a few seconds of CPU time. Scaling this up to an industrial-scale scraping operation means that the "externalized costs" that the OP was talking about are duplicated as internalized costs in the form of useless work that must be done for you to accept their requests.
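The thread doesn't name a specific scheme, so here is a hashcash-style sketch. It assumes SHA-256 and a difficulty expressed as a required number of leading zero bits. It shows the asymmetry the argument relies on: solving takes about 2**difficulty hashes on average, while verifying takes a single hash. The names `issue_challenge`, `solve` and `verify` are hypothetical. A real deployment would also need to store challenges, expire them and reject reuse.

```python
import hashlib
import os


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a hash digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        # bit_length() locates the first 1 bit inside this byte.
        bits += 8 - byte.bit_length()
        break
    return bits


def issue_challenge() -> bytes:
    # Server side: a random, single-use challenge (storage and expiry omitted).
    return os.urandom(16)


def solve(challenge: bytes, difficulty: int) -> int:
    # Client side: brute-force a nonce; expected work is about 2**difficulty hashes.
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce
        nonce += 1


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    # Server side: one hash, so checking stays cheap at any difficulty.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty
```

A scraper making N requests at difficulty d pays roughly N × 2^d hashes in total. That sum is the "internalized cost" above. A casual visitor pays 2^d hashes once per request they actually make.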
> If your goal is to "free access to everyone except people I don't like but I can't ask people if they are people who I don't like", well, I suppose it isn't a good mitigation for you, sorry.
Ah. Yes, the part in quotes here is what I think would count as a solution -- I've been assuming that simply steering anonymous users towards logging in would be the obvious thing to do otherwise, and that doing this is unacceptable for some reason. I was hoping that, despite attackers dispersing themselves across IP addresses, there would either be (a) some signal that nevertheless identifies them with reasonable probability (perhaps Referer headers, or their absence in deeply nested URL requests), or (b) some blanket policy that can be enforced which will hurt everyone a little but hurt attackers more (think chemotherapy).
> It will start rate-limiting (not blocking) anonymous users (not everyone).
If some entities (attackers) are making requests at 1000x the rate that others (legitimate users) are, the effect in practice of rate-limiting will be to block the latter nearly all the time.
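To make the 1000x point concrete, assume the following:

- Legitimate anonymous users send r requests/s in aggregate, and attackers send 1000r.
- Both groups share one anonymous limit of L requests/s, with L < 1001r.
- The limiter admits arrivals without regard to who sent them (an idealization).

```latex
\begin{aligned}
\text{legit share of admitted requests} &= \frac{r}{r + 1000\,r} = \frac{1}{1001} \approx 0.1\% \\
P(\text{a given legit request is admitted}) &= \frac{L}{1001\,r} \\
\text{e.g. } L = r:\quad P &= \frac{1}{1001} \approx 0.1\%
\end{aligned}
```

The only way to let most legitimate requests through is to set the shared limit close to the attackers' volume, and a limit that high no longer limits anything.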
> Scaling this up to an industrial-scale scraping operation
My understanding was that the PoW would be done in-browser, in which case this doesn't hold -- the attackers would simply use the multitudes of residential browsers they already control to do the PoW prior to making the requests, thus perfectly distributing that workload to other people's computers. What kind of PoW cannot be done in this way?
> My understanding was that the PoW would be done in-browser, in which case this doesn't hold -- the attackers would simply use the multitudes of residential browsers they already control to do the PoW prior to making the requests, thus perfectly distributing that workload to other people's computers. What kind of PoW cannot be done in this way?
I could be mistaken, but I don't think these residential VPN services are actual botnets. You can use the connection, but not the browser. In any case, you can scale the work factor as you want, making "unlikely" endpoints harder to access (e.g. git blame for an old commit might demand 100x more proof of work than the main page of a repository). This doesn't make it impossible to scrape your website, it makes it more expensive to do so, which is what the OP was complaining about ("externalizing costs onto me").
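Here is one way to express per-endpoint work factors. It assumes a hashcash-style difficulty counted in leading zero bits, where each extra bit doubles the expected work. The baseline difficulty, route fragments and multipliers are all hypothetical. Because difficulty comes in whole bits, a "100x harder" target rounds up to 7 extra bits, or 128x.

```python
import math

BASE_DIFFICULTY = 16  # hypothetical baseline, in leading zero bits

# Hypothetical cost multipliers keyed by a path fragment; values are illustrative only.
ROUTE_MULTIPLIERS = [
    ("/blame/", 100),
    ("/commit/", 10),
]


def difficulty_for(path: str) -> int:
    """Pick the largest matching multiplier and convert it to extra bits."""
    multiplier = 1
    for fragment, m in ROUTE_MULTIPLIERS:
        if fragment in path:
            multiplier = max(multiplier, m)
    # Each extra bit doubles expected work, so round log2(multiplier) up.
    return BASE_DIFFICULTY + math.ceil(math.log2(multiplier))


def expected_hashes(difficulty: int) -> int:
    """Average number of hashes a client must try at this difficulty."""
    return 2 ** difficulty
```

For example, `difficulty_for("/repo/blame/abc123/README")` gives 23 bits, 128 times the expected work of a plain repository page at 16 bits.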
All in all, it feels like there's something here to leverage proof of work as a way to maintain anonymous access while still limiting your exposure to excessive scrapers. It probably isn't a one-size-fits-all solution, but with some domain-specific knowledge it feels like it could be a useful tool to have in the new internet landscape.
> You can use the connection, but not the browser.
Fair enough, that would likely be the case if they're using "legitimate" residential IP providers, and in that case they would indeed need to pay for the PoW themselves somehow.