
> Solving the challenge–which is valid for one week once passed–

One thing I've noticed recently, with the Arch Wiki adding Anubis, is that this one-week validity period doesn't magically fix user annoyances with Anubis. I use Temporary Containers for every tab, which means I constantly get Anubis regenerating tokens, since the cookie gets deleted as soon as the tab is closed.

Perhaps this is my own problem, but given the state of tracking on the internet, I do not feel it is an extremely out-of-the-ordinary circumstance to avoid saving cookies.




I think it's absolutely your problem. You're ignoring all the cache lifetimes on assets.


OK, so what? Keeping persistent state on your machine shouldn't be mandatory for a comfortable everyday internet browsing experience.


What then do you suggest as a good middle ground between website publishers and website enjoyers? Doing a one-time challenge and storing the result seems like a really good compromise between all parties. But that's not good enough! So what is?


"In a fantasy land that doesn't exist, or maybe last existed decades ago, this wouldn't be needed." OK, that's nice. What does that have to do with reality as it stands today, though?


It's not a problem. You have configured your system to show up as a new visitor every time you visit a website, and you are getting the expected behaviour.


It could be worse: the main alternative is something like Cloudflare's death-by-a-thousand-CAPTCHAs when your browser settings or IP address put you on the wrong side of their bot-detection heuristics. Anubis at least doesn't require any interaction to pass.

Unfortunately nobody has a good answer for how to deal with abusive users without catching well behaved but deliberately anonymous users in the crossfire, so it's just about finding the least bad solution for them.


I hated everyone who enabled the Cloudflare validation thing on their website, because I was blocked for months (I got stuck on a captcha that kept rejecting my Firefox). Eventually they fixed it, but it was really annoying.


The CF verification page still appears far too often in some geographic regions. It's such an irritant that I just close the tab and leave when I see it. It's so bad that seeing the Anubis page instead is actually a big relief! I consider the CF verification and its enablers a shameless attack on the open web - a solution nearly as bad as the problem it tries to solve.


Forget esoteric areas, I'm an average American guy who gets them running from a residential IP or cell IP. It even happens semi-frequently on my iPhone which is insane. I guess I must have "bot-like" behavior in my browsing, even from a cell.


I noticed that Google happily puts you on its shitlist as soon as you use any advanced parameters on your searches, such as “filetype:” or “inurl:” or “site:”.


This probably has something to do with it. I probably tend to move faster than average and am "bot-like" in that I sort of "scrape": search for something and quickly open all relevant tabs to review, page through them, search again. If while I'm going through I have something else I'd like to find, I'll fire up yet another tab and pop open all relevant tabs from that. Etc.


I am still unable to pass CF validation on my desktop (sent to infinite captcha loop hell). Nowadays I just don't bother with any website that uses it.


Too many sites that used to be good installed that shit. And the weird part is that on desktop only Chromium fails to pass the captcha; there are no issues on Firefox. But Chromium is my main browser, and sometimes I'm too lazy/uncomfortable opening a second browser for those sites.


I'd even argue that Anubis is universally superior in this domain.

A sufficiently advanced web scraper can build a statistical model of fingerprint payloads that CF categorizes as legit, and rotate its proxy on demand.

The only person who will end up blocked is the regular user.

There is also a huge market of proprietary anti-bot solvers, not to mention services that charge you per captcha solution. Usually it's just someone who managed to crack the captcha and is generating the solutions automatically, since the response time is typically a few hundred milliseconds.

This is a problem with every commercial Anti-bot/captcha solution and not just CF, but also AWS WAF, Akamai, etc.


The pro gamer move is to use risk calculation as a means of determining when to throw a challenge, not when to deny access :)
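
Something like the following shape, as a sketch in Go (the thresholds and the notion of a precomputed risk score are made up for illustration): only the extreme tail gets a hard denial, everything in the middle gets a challenge instead.

    package main

    import "fmt"

    type action int

    const (
        allow action = iota
        challenge
        block
    )

    // decide maps a risk score in [0,1] (derived from IP reputation,
    // request rate, header oddities, etc.) to an action. Only the
    // extreme tail gets a hard denial; the middle gets a challenge.
    func decide(risk float64) action {
        switch {
        case risk < 0.3:
            return allow
        case risk < 0.9:
            return challenge // throw a proof-of-work or captcha page
        default:
            return block
        }
    }

    func main() {
        for _, r := range []float64{0.1, 0.5, 0.95} {
            fmt.Printf("risk %.2f -> action %d\n", r, decide(r))
        }
    }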


> Unfortunately nobody has a good answer for how to deal with abusive users without catching well behaved but deliberately anonymous users in the crossfire...

Uhh, that's not right. There is a good answer, but no turnkey solution yet.

The answer is to make each request cost the person a certain amount of something, with increased load from that person coming with increased cost to that person.


Note that this is actually one of the things Anubis does. That's what the proof-of-work system is for; it just operates across the full load rather than being targeted at a specific user's load. But, to the GP's point, that's the best option while still allowing anonymous users.

All the best,

-HG


I know that you mean a system that transfers money, but you are also describing Anubis, because the point of PoW is literally to make accessing the site cost more and to scale that cost in proportion to the load.


> I know that you mean a system that transfers money ....

No, cost is used in the fullest abstract meaning of the word here.

Time cost, effort cost, monetary cost, work cost - so long as there is a functional limitation that prevents resource exhaustion, that is the point.


If cost can be anything, does Anubis implement such a system then, by using proof-of-work as the cost function?


Sort of. Anubis is frontloading the cost all at once and then amortizing it over a large number of subsequent requests. That detail is what's causing the issue when browsing with additional privacy measures.
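
To make the cost function concrete, here's a rough sketch in Go of the kind of hash-based proof of work involved (not Anubis's actual code - its challenge runs in the browser and the details differ - just the general shape): the client burns CPU searching for a nonce, the server verifies with a single hash, and each extra bit of difficulty roughly doubles the client's expected work.

    package main

    import (
        "crypto/sha256"
        "encoding/binary"
        "fmt"
        "math/bits"
    )

    // leadingZeroBits counts the leading zero bits of a SHA-256 digest.
    func leadingZeroBits(sum [32]byte) int {
        n := 0
        for _, b := range sum {
            if b == 0 {
                n += 8
                continue
            }
            n += bits.LeadingZeros8(b)
            break
        }
        return n
    }

    // solve is the client's side: expected work is about 2^difficulty hashes.
    func solve(challenge string, difficulty int) uint64 {
        buf := make([]byte, 8)
        for nonce := uint64(0); ; nonce++ {
            binary.BigEndian.PutUint64(buf, nonce)
            sum := sha256.Sum256(append([]byte(challenge), buf...))
            if leadingZeroBits(sum) >= difficulty {
                return nonce
            }
        }
    }

    // verify is the server's side: one hash, no matter the difficulty.
    func verify(challenge string, difficulty int, nonce uint64) bool {
        buf := make([]byte, 8)
        binary.BigEndian.PutUint64(buf, nonce)
        sum := sha256.Sum256(append([]byte(challenge), buf...))
        return leadingZeroBits(sum) >= difficulty
    }

    func main() {
        const difficulty = 16 // ~65k hashes on average; cheap once, costly at scale
        nonce := solve("random-per-session-challenge", difficulty)
        fmt.Println("nonce:", nonce, "valid:", verify("random-per-session-challenge", difficulty, nonce))
    }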



Can't see it, that page is protected by Anubis.


This makes discussions such as this have a negative ROI for an average commenter. Spamming scam and grift links still has a positive ROI, albeit a slightly smaller one.

I use a certain online forum which sometimes makes users wait 60 or 900 seconds before they can post. It has prevented me from making contributions multiple times.


I'm using one with a 5-posts-per-14400-seconds timer right now. Ditto.


>It could be worse: the main alternative is something like Cloudflare's death-by-a-thousand-CAPTCHAs when your browser settings or IP address put you on the wrong side of their bot-detection heuristics.

Cloudflare's checkbox challenge is probably one of the better challenge systems. Other security systems are far worse, requiring either something to be solved or a more annoying action (e.g. holding a button for 5 seconds).


Checking a box is fine when it lets you through.

The problem is when cloudflare doesn't let you through.


Same problem with Google's captchas: solving them doesn't always mean you will be let in. That's outrageous, like isn't that the whole point?


No, the whole point is you are helping machine learning training. Doing work for free.


It really isn't. If they were purely focused on getting training data, they would give more captchas to everyone, not just to users with no google cookies, users connecting from a VPN, and users with weird browser configurations. The fact of the matter is that all those attributes are more "suspicious" than average, and therefore they want to up the cost of getting past the captcha.


>The problem is when cloudflare doesn't let you through.

Don't use an unusual browser configuration then, like spoofing user-agents or whatever? If you're doing it for "privacy" reasons, it's likely counterproductive. The fact that Cloudflare can detect it means that the spoofing isn't doing a very good job, and therefore you're making yourself more fingerprintable.


There's a whole lot of things that can count as "unusual" without being spoofing, and telling people not to be something as vague as "unusual" is a terrible solution.


>There's a whole lot of things that can count as "unusual" that aren't spoofing

Examples?


Ad blocking, other content blocking, third-party cookie restrictions, all the stuff Firefox changes when you toggle resistFingerprinting. And from your other comment: "users with no google cookies" and "connecting from VPN".

Punishing people for not having Google cookies is probably the most obnoxious one.


Yeah. A “drag this puzzle piece” captcha style is also relatively easy, but things like reCaptcha or hCaptcha are just infuriating.

For pure PoW (no fingerprinting), mCaptcha is a nice drop-in replacement you can self-host: https://mcaptcha.org/


Looks like mCaptcha is a login captcha, while Cloudflare and Anubis intercept any access, including DDoS traffic.


It's even worse if you block cookies outright. Every time I hit a new Anubis site I scream in my head because it just spins endlessly and stupidly until you enable cookies, without even a warning. Absolutely terrible user experience; I wouldn't put any version of this in front of a corporate / professional site.


Blocking cookies completely is just asking for a worse method of tracking sessions. It's fine for a site to be aware of visits. As someone who argues that sites should work without javascript, blocking all cookies strikes me as doing things wrong.


A huge proportion of sites (a) use cookies, (b) don't need cookies. You can easily use extensions to enable cookies for the sites that need them, while leaving others disabled. Obviously some sites are going to do shitty things to track you, but they'd probably be doing that anyway.

The issue I'm talking about is specifically how frustrating it is to hit yet another site that has switched to Anubis recently and having to enable cookies for it.


Hi. Developer of Anubis here. How am I meant to store state in the client without cookies if JavaScript is also disabled? Genuinely curious.


The issue isn't primarily about JS being disabled, because at least if you have it disabled, it's (a) obvious you're hitting an Anubis page, and (b) you don't endlessly refresh the page over and over every 0.25 seconds until you fix it.

What you ought to do is warn the user. It's easy enough to detect server-side if cookies are disabled, because if you set one it ought to be sent on any subsequent requests. If requests after the initial site hit don't have the cookie, it clearly failed to set and/or send, so instead of refreshing the page over and over you should display an error.

This isn't a problem exclusive to Anubis; some other sites will also refresh endlessly if you don't have cookies enabled. But it's really poor practice not to handle error conditions in your application.
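
A rough sketch of the probe idea in Go (not your actual code, and the cookie/query-parameter names are just placeholders): set a probe cookie, redirect once, and if the cookie never comes back, show an error page instead of looping.

    package main

    import "net/http"

    // challengeGate wraps the challenge flow with a cookie probe so that
    // cookie-less clients get an error page instead of an endless refresh.
    func challengeGate(next http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            if _, err := r.Cookie("probe"); err == nil {
                // The probe cookie came back: cookies work, so continue
                // with the normal challenge flow.
                next.ServeHTTP(w, r)
                return
            }
            if r.URL.Query().Get("probed") == "1" {
                // We set the cookie on the previous response and it was
                // not sent back: cookies are blocked, so say so.
                http.Error(w, "This site needs cookies enabled to pass the anti-bot check.", http.StatusForbidden)
                return
            }
            // First visit: set the probe cookie and redirect exactly once.
            // (Simplified: a real version would preserve the original query string.)
            http.SetCookie(w, &http.Cookie{Name: "probe", Value: "1", Path: "/"})
            http.Redirect(w, r, r.URL.Path+"?probed=1", http.StatusSeeOther)
        })
    }

    func main() {
        challengePage := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("challenge page goes here\n"))
        })
        http.ListenAndServe(":8080", challengeGate(challengePage))
    }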


The next best alternative to a basic session cookie isn't doing shitty things; it's either using your IP and praying that doesn't break, or putting the session token into each link.

There's no real way to hide that you're visiting the site and clicking multiple pages during that visit, so I don't see what's so bad about accepting a first party cookie for an hour.


You would prefer the cookie embedded in the URL?


I would prefer web developers not track me at all without a good reason and consent. Yes, I also block JS on a per-site basis, use an ad / tracking blocker, and block all third party cookies entirely.

I'm not naive - I know that it is possible to track me using other server-side tools even with all this effort, but on the other hand I'm easily in the 0.1% most difficult users to track, which means a lot of web devs are going to use the easy approaches that work for 99% of users and leave me alone. That's a worthwhile trade to make, for me.


FWIW, with these systems you can proactively ask for a new challenge as often as you want, or even use many tokens simultaneously.


I will take Anubis any day over its alternative - the cloudflare verification page. I just close the tab as soon as I see it.


Browsers that have cookies and/or JS disabled have been getting broken experiences for well over a decade, it's hard to take this criticism seriously when professional sites are the most likely to break in this situation.


If you want to browse the web without cookies (and without JS, in a usable manner) you may try FixProxy[1]. It has direct support for Anubis in the development version.

[1]: https://www.fixbrowser.org/blog/fixproxy


For me the biggest issue with the Arch Wiki adding Anubis is that it doesn't let me in when I open it on mobile. I am using Cromite: it doesn't support extensions, but has some ABP-style blocking integrated.


I too use Temporary Containers, and my solution is to use a named container and associate that site with the container.


I am low-key shocked that this has become a thing on Arch Wiki, of all places. And that's just to access the main page, not even for any searches. Arch Wiki is the place where you often go when your system is completely broken, sometimes to the extent that some clever proof of work system that relies on JS and whatever will fail. I'm sure they didn't decide this lightly, but come on.


> One thing that I've noticed recently with the Arch Wiki adding Anubis

Is that why it now shows that annoying, slow-to-load prompt before giving me the content I searched for?


Would you like to propose an alternative solution that meets their needs and on their budget?


Anubis has a 'slow' and a 'fast' mode [1], with fast mode selected by default. It used to be so fast that I rarely got time to read anything on the page. I don't know why it's slower now - it could be that they're using the slower algorithm, or the algorithm itself may have become slower. Either way, it shouldn't be too hard to swap in a different algorithm or make the required work a parameter. This of course has the disadvantage of making it easier for the scrapers to get through.

[1] https://anubis.techaro.lol/docs/admin/algorithm-selection


The DIFFICULTY environment variable already allows for configuring how many iterations the program will run (in powers of 10).

The fast/slow selection still applies, but if you turn up the difficulty, even the fast version will take some time.


A static cache for anyone not logged in, and only doing this check when you are authenticated (which is what gives access to editing pages)?

edit: Because HN is throwing "you're posting too fast" errors again:

> That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.

The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week. It's not going to stop them.


> The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week. It's not going to stop them.

The goal of Anubis isn't to stop them from scraping entirely, but rather to slow down aggressive scraping (e.g. sites with lots of pages being scraped every 6 hours[1]) so that the scraping doesn't impact the backend nearly as much.

[1] https://pod.geraspora.de/posts/17342163, which was linked as an example in the original blog post describing the motivation for Anubis[2]

[2]: https://xeiaso.net/blog/2025/anubis/


The point of a static cache is that your backend isn't impacted at all.
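
Roughly this, as a sketch in Go (the "session" cookie name and the in-memory store are made up for illustration; in practice you'd more likely put a caching proxy in front of the wiki): anonymous GETs are answered from the cache, and only misses touch the backend.

    package main

    import (
        "bytes"
        "net/http"
        "sync"
        "time"
    )

    type cachedPage struct {
        body    []byte
        expires time.Time
    }

    // capture buffers the backend's response body so it can be cached.
    // (Simplified: status codes and headers are not cached.)
    type capture struct {
        http.ResponseWriter
        buf bytes.Buffer
    }

    func (c *capture) Write(p []byte) (int, error) {
        c.buf.Write(p)
        return c.ResponseWriter.Write(p)
    }

    type anonCache struct {
        mu    sync.Mutex
        pages map[string]cachedPage
        next  http.Handler
        ttl   time.Duration
    }

    func (a *anonCache) ServeHTTP(w http.ResponseWriter, r *http.Request) {
        // Logged-in users (hypothetical "session" cookie) and non-GET
        // requests always hit the backend directly.
        if _, err := r.Cookie("session"); err == nil || r.Method != http.MethodGet {
            a.next.ServeHTTP(w, r)
            return
        }
        key := r.URL.RequestURI()
        a.mu.Lock()
        page, ok := a.pages[key]
        a.mu.Unlock()
        if ok && time.Now().Before(page.expires) {
            w.Write(page.body) // cache hit: the backend never sees it
            return
        }
        rec := &capture{ResponseWriter: w}
        a.next.ServeHTTP(rec, r)
        a.mu.Lock()
        a.pages[key] = cachedPage{body: rec.buf.Bytes(), expires: time.Now().Add(a.ttl)}
        a.mu.Unlock()
    }

    func main() {
        backend := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("expensively rendered wiki page\n"))
        })
        http.ListenAndServe(":8080", &anonCache{pages: map[string]cachedPage{}, next: backend, ttl: 5 * time.Minute})
    }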


That falls short of the "meets their needs" test. Authenticated users already have a check (i.e., the auth process). Anubis is to stop/limit bots from reading content.


... Are you saying a bot couldn't authenticate?

You'd still need a layer there; it could also have been a manual login used to pull a session token.


> The Arch Wiki is a high-value target for scraping, so they'll just solve the Anubis challenge once a week.

ISTR that Anubis allows the site owner to control the expiry on the check; if you're still getting hit by bots, set the expiry to 5s with a lower "work" effort, so that every request takes (say) 2s and the result only lasts for 5s.

(Still might not help though, because that optimises for bots at the expense of humans - a human will only do maybe one actual request every 30-200 seconds, while a bot could do a lot in 5s.)


Rather than a time to live you probably want a number of requests to live. Decrement a counter associated with the token at every request until it expires.

An obvious followup is to decrement it by a larger amount if requests are made at a higher frequency.
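
As a rough sketch in Go (the numbers and names are all made up): every solved challenge buys a budget of requests, each request spends from it, and rapid-fire requests spend more.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    type budget struct {
        remaining int
        lastSeen  time.Time
    }

    type budgetStore struct {
        mu     sync.Mutex
        tokens map[string]*budget
    }

    // grant is called after a successful challenge solve.
    func (s *budgetStore) grant(token string, requests int) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.tokens[token] = &budget{remaining: requests, lastSeen: time.Now()}
    }

    // spend returns false once the budget is exhausted, forcing the
    // client to solve a fresh proof-of-work challenge.
    func (s *budgetStore) spend(token string) bool {
        s.mu.Lock()
        defer s.mu.Unlock()
        b, ok := s.tokens[token]
        if !ok || b.remaining <= 0 {
            return false
        }
        cost := 1
        if time.Since(b.lastSeen) < 100*time.Millisecond {
            cost = 10 // requests arriving faster than a human clicks cost extra
        }
        b.remaining -= cost
        b.lastSeen = time.Now()
        return b.remaining >= 0
    }

    func main() {
        store := &budgetStore{tokens: map[string]*budget{}}
        store.grant("example-token", 500)
        fmt.Println(store.spend("example-token")) // true until the budget runs out
    }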


Does anyone know if static caches work? No one seems to have replied to that point. It seems like a simple and user-friendly solution.


Caches would only work if the bots were hitting routes that any human had ever hit before.


They'd also work if the same bot, or another bot, has hit that route before. It's a wiki; the amount of content is finite, and each route getting hit once isn't a problem.



