
I feel like I recall someone recently built a simple proof of work CAPTCHA for their personal git server. Would something like that help here?

Alternatively, a technique like Privacy Pass might be useful here. Basically, give any user a handful of tokens based on reputation / proof of work, and then make each endpoint require a token to access. This gives you a single endpoint to rate limit, and doesn’t require user accounts (although you could allow known-polite user accounts a higher rate limit on minting tokens).
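To make that concrete, here's a rough sketch in TypeScript of what the token flow could look like. This is not the actual Privacy Pass protocol (the real thing uses blinded signatures so redeemed tokens can't be linked back to the mint); the names and the HMAC scheme are placeholders, just to show how minting becomes the single endpoint you rate limit:

```typescript
// Sketch only: server HMAC-signs random nonces at mint time and checks/burns
// them at redeem time. Real Privacy Pass uses blinded signatures instead.
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

const SECRET = randomBytes(32);      // server-side signing key (assumed)
const spent = new Set<string>();     // redeemed nonces (swap for Redis etc.)

// Mint endpoint: the one place you rate limit, possibly with a higher
// allowance for known-polite accounts.
function mintTokens(count: number): string[] {
  return Array.from({ length: count }, () => {
    const nonce = randomBytes(16).toString("hex");
    const mac = createHmac("sha256", SECRET).update(nonce).digest("hex");
    return `${nonce}.${mac}`;
  });
}

// Every other endpoint just checks and burns one token per request.
function redeemToken(token: string): boolean {
  const [nonce, mac] = token.split(".");
  if (!nonce || !mac || spent.has(nonce)) return false;
  const expected = createHmac("sha256", SECRET).update(nonce).digest("hex");
  if (mac.length !== expected.length) return false;
  if (!timingSafeEqual(Buffer.from(mac), Buffer.from(expected))) return false;
  spent.add(nonce);
  return true;
}
```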



Here we get to the original sin of packet networking.

The ARPANET was never meant to be commercial or private. All the protocols developed for it were meant to be subsidized by universities, the government or the military, with the names, phone numbers, and addresses of anyone sending a packet being public knowledge to anyone else in the network.

This made sense at the time, since the IMPs used to send packets had less computing power than an addressable LED does today.

Today the average 10-year-old router has more computing power than was available in the whole world in 1970, yet we've made no push to move to protocols that incorporate price as a fundamental part of their design.

Worse, I don't see any way we can implement this now. So we're left with screeds by people who want information to be free but get upset when they find out that someone has to pay for it.


You're likely thinking of Anubis from Techaro: https://github.com/TecharoHQ/anubis


Fresh versions of Firefox ESR, Brave, and Opera get through for the moment, but you have to find a way to allow them cookies while the check persistently re-runs and overrides our settings. I find it absolutely unacceptable to lock people out of their data - their property - without giving a comprehensible reason such as 'your browser is too old; only versions from xxx onward are supported'. And if that isn't shown automatically, it should at least be communicated promptly in support requests. The way it is done now, despite all the good intentions, amounts to running malicious experiments on users.


Anyone else unjustifiably blocked by TecharoHQ/anubis? I get a silly 'Oh noes!' page and no help in the bug tracker ('we are new and experimental, please contact your sysadmin'). Affected site: gitlab.gnome.org/GNOME.


Yes, thank you.


> This gives you a single endpoint to rate limit

Would you be rate-limiting by IP? Because the attacker is using (nearly) unique IPs for each request, so I don't see how that would help.

> someone recently built a simple proof of work CAPTCHA for their personal git server

As much as everyone hates CAPTCHAs nowadays, I think this could be helpful if it was IP-oblivious, random and the frequency was very low. E.g., once per 100 requests would be enough to hurt abusers making 10000 requests, but would cause minimal inconvenience to regular users.
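As a sketch of the frequency idea (the rate and the cookie check are placeholders, not any particular implementation): a regular user making ~100 requests solves roughly one challenge, while a scraper making 10,000 eats about a hundred of them.

```typescript
// Roughly 1-in-100 anonymous requests gets a proof-of-work interstitial,
// independent of the client's IP. hasValidPowCookie stands in for
// "already solved one recently", however you choose to track that.
const CHALLENGE_RATE = 1 / 100;

function gate(hasValidPowCookie: boolean): "pass" | "challenge" {
  if (hasValidPowCookie) return "pass";                    // already paid the toll
  return Math.random() < CHALLENGE_RATE ? "challenge" : "pass";
}
```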


I didn’t mention IP at all in my post. You can globally rate limit anonymous requests for everything (except maybe your login endpoint), if that’s the thing that makes sense for you.

The nice thing about the proof-of-work approach is that it can be backgrounded for users with normal browsers, just like link prefetching.
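Something along these lines in the page, as a sketch (the worker file, message shape, and header name are all made up for illustration, not how any particular tool wires it up):

```typescript
// Start the PoW solver in a Web Worker on page load, like link prefetching,
// so the proof is ready before the user clicks anything.
// "pow-worker.js", the meta tag, and "X-Pow-Nonce" are assumptions.
const worker = new Worker("pow-worker.js");

const solvedNonce = new Promise<number>((resolve) => {
  worker.onmessage = (e: MessageEvent<{ nonce: number }>) => resolve(e.data.nonce);
});

const challenge =
  document.querySelector('meta[name="pow-challenge"]')?.getAttribute("content") ?? "";
worker.postMessage({ challenge, bits: 12 });

// Later requests attach the finished proof without the user ever waiting on it.
async function fetchWithProof(url: string): Promise<Response> {
  return fetch(url, { headers: { "X-Pow-Nonce": String(await solvedNonce) } });
}
```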


> You can globally rate limit anonymous requests for everything

Obviously you can do this, but that will start blocking everyone. How is that a solution?

Also what prevents attacker-controlled browsers from backgrounding the PoW too?

I took it as read that for something to qualify as a solution to this problem, it needs to affect regular users less badly than attackers.


> Obviously you can do this, but that will start blocking everyone. How is that a solution?

Two corrections: It will start rate-limiting (not blocking) anonymous users (not everyone). It's a mitigation. If specific logged-in users are causing problems, you can address them directly (rate limit them, ban them, charge them money, etc.). If nonspecific anonymous users are causing problems, you can limit them as a group, and provide an escape hatch (logging in). If your goal is "free access to everyone except people I don't like but I can't ask people if they are people who I don't like", well, I suppose it isn't a good mitigation for you, sorry.
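Concretely, something like this (the capacities are invented): one shared token bucket for all anonymous traffic, and separate, more generous buckets for logged-in users you can address individually.

```typescript
// Sketch of group-level limiting for anonymous traffic vs. per-user limiting
// for accounts. Numbers are illustrative only.
type Bucket = { tokens: number; capacity: number; refillPerSec: number; last: number };

const newBucket = (capacity: number, refillPerSec: number): Bucket =>
  ({ tokens: capacity, capacity, refillPerSec, last: Date.now() });

const anonymousBucket = newBucket(50, 10);      // one shared pool for all anonymous traffic
const userBuckets = new Map<string, Bucket>();  // one pool per known account

function take(b: Bucket): boolean {
  const now = Date.now();
  b.tokens = Math.min(b.capacity, b.tokens + ((now - b.last) / 1000) * b.refillPerSec);
  b.last = now;
  if (b.tokens < 1) return false;
  b.tokens -= 1;
  return true;
}

function allowRequest(userId: string | null): boolean {
  if (userId === null) return take(anonymousBucket);          // limited as a group
  let b = userBuckets.get(userId);
  if (!b) userBuckets.set(userId, (b = newBucket(200, 50)));  // higher individual limit
  return take(b);
}
```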

> Also what prevents attacker-controlled browsers from backgrounding the PoW too?

The cost. A single user can complete the proof of work in a short period of time, costing them a few seconds of CPU cycles. Scaling this up to an industrial-scale scraping operation means that the "externalized costs" that the OP was talking about are duplicated as internalized costs in the form of useless work that must be done for you to accept their requests.
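For a feel of that asymmetry, here's a hashcash-style sketch (not any specific tool's scheme): solving costs on the order of 2^bits hashes, verifying costs one. A person pays it once in a while; a scraper pays it for every page, millions of times over.

```typescript
// Find a nonce such that sha256(challenge + ":" + nonce) starts with `bits`
// zero bits. Each extra bit roughly doubles the solver's expected work,
// while the server's verification stays a single hash.
import { createHash } from "node:crypto";

function leadingZeroBits(buf: Buffer): number {
  let bits = 0;
  for (const byte of buf) {
    if (byte === 0) { bits += 8; continue; }
    return bits + Math.clz32(byte) - 24;  // clz32 counts from bit 31 of a 32-bit int
  }
  return bits;
}

function solve(challenge: string, bits: number): number {
  for (let nonce = 0; ; nonce++) {
    const h = createHash("sha256").update(`${challenge}:${nonce}`).digest();
    if (leadingZeroBits(h) >= bits) return nonce;   // expected ~2^bits attempts
  }
}

function verify(challenge: string, nonce: number, bits: number): boolean {
  const h = createHash("sha256").update(`${challenge}:${nonce}`).digest();
  return leadingZeroBits(h) >= bits;                // one hash, cheap for the server
}
```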


> If your goal is "free access to everyone except people I don't like but I can't ask people if they are people who I don't like", well, I suppose it isn't a good mitigation for you, sorry.

Ah. Yes, the part in quotes here is what I think would count as a solution -- I've been assuming that simply steering anonymous users towards logging in would be the obvious thing to do otherwise, and that doing this is unacceptable for some reason. I was hoping that, despite attackers dispersing themselves across IP addresses, there would either be (a) some signal that nevertheless identifies them with reasonable probability (perhaps Referer headers, or their absence in deeply nested URL requests), or (b) some blanket policy that can be enforced which will hurt everyone a little but hurt attackers more (think chemotherapy).

> It will start rate-limiting (not blocking) anonymous users (not everyone).

If some entities (attackers) are making requests at 1000x the rate that others (legitimate users) are, the effect in practice of rate-limiting will be to block the latter nearly all the time.

> Scaling this up to an industrial-scale scraping operation

My understanding was that the PoW would be done in-browser, in which case this doesn't hold -- the attackers would simply use the multitudes of residential browsers they already control to do the PoW prior to making the requests, thus perfectly distributing that workload to other people's computers. What kind of PoW cannot be done in this way?


> My understanding was that the PoW would be done in-browser, in which case this doesn't hold -- the attackers would simply use the multitudes of residential browsers they already control to do the PoW prior to making the requests, thus perfectly distributing that workload to other people's computers. What kind of PoW cannot be done in this way?

I could be mistaken, but I don't think these residential VPN services are actual botnets. You can use the connection, but not the browser. In any case, you can scale the work factor as you want, making "unlikely" endpoints harder to access (e.g. git blame for an old commit might be 100x harder to prove than the main page of a repository). This doesn't make it impossible to scrape your website, it makes it more expensive to do so, which is what the OP was complaining about ("externalizing costs onto me").
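For example, something like this on the server side (the route patterns and bit counts are invented, not Anubis configuration): the default is cheap, and the long-tail pages a crawler hammers cost two orders of magnitude more hashes.

```typescript
// Per-endpoint work factor: ~2^bits hashes to earn access to a path.
// Going from 10 to 17 bits is a ~128x increase in expected solver work.
const difficultyByRoute: Array<[RegExp, number]> = [
  [/^\/[^/]+\/[^/]+\/?$/, 10],   // repo front page: cheap
  [/^\/.+\/commits\//, 14],      // commit listings: pricier
  [/^\/.+\/blame\//, 17],        // blame on old commits: ~128x the front page
];

function powBitsFor(path: string): number {
  for (const [pattern, bits] of difficultyByRoute) {
    if (pattern.test(path)) return bits;
  }
  return 12;                     // default work factor for everything else
}
```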

All in all, it feels like there's something here: leveraging proof of work as a way to maintain anonymous access while still limiting your exposure to excessive scrapers. It probably isn't a one-size-fits-all solution, but with some domain-specific knowledge it feels like it could be a useful tool to have in the new internet landscape.


> You can use the connection, but not the browser.

Fair enough, that would likely be the case if they're using "legitimate" residential IP providers, and in that case they would indeed need to pay for the PoW themselves somehow.



