
Did this person just... solve bot abuse? That should be the title of the post. I for sure want to use their solution for my own homeserver.


Related discussion on Anubis: https://news.ycombinator.com/item?id=43427679


No. If that interstitial is working, it's only working due to obscurity, and the moment this system becomes even slightly popular it'll become worthless.

Proof of work is not a viable defense -- it's basically impossible to tune the parameters such that the cost is prohibitive or even meaningful to the scrapers but doesn't become an obstacle to users.
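
For intuition, here's a rough back-of-the-envelope sketch in TypeScript. Every number in it is an assumption made up for illustration, not a measurement:

    // All numbers below are illustrative assumptions, not measurements.
    // Suppose the challenge is tuned to ~2 s of hashing in mobile JavaScript.
    const userSecondsPerChallenge = 2;
    const phoneHashesPerSec = 1_000_000;      // assumed JS SHA-256 rate on a phone
    const hashesPerChallenge = userSecondsPerChallenge * phoneHashesPerSec;

    // A scraper solves the same challenge with native code on rented hardware.
    const serverHashesPerSec = 50_000_000;    // assumed native SHA-256 rate
    const serverSecondsPerChallenge = hashesPerChallenge / serverHashesPerSec;
    const costPerCpuHourUsd = 0.05;           // assumed spot-instance price
    const costPerMillionPagesUsd =
      (serverSecondsPerChallenge / 3600) * costPerCpuHourUsd * 1_000_000;

    console.log(serverSecondsPerChallenge.toFixed(2)); // ~0.04 s per page
    console.log(costPerMillionPagesUsd.toFixed(2));    // ~$0.56 per million pages

Under those (made-up) numbers a human waits two seconds per page while a million scraped pages cost well under a dollar of compute; push the difficulty high enough to actually hurt the scraper and the human's wait stretches into minutes.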

It's pretty much just a check for whether the client can run JavaScript. But that's table stakes for a scraper. Trying to discriminate between a real browser, a real browser running in headless mode, and something merely pretending to be a real browser requires far more invasive probing of browser properties (pretty much indistinguishable from browser fingerprinting), plus obfuscating which properties are being collected and checked.
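
To give a flavour of what "probing browser properties" means in practice, here's a TypeScript sketch of a few of the classic, well-known signals. These are just illustrative; each one is trivially spoofable on its own, and real products collect far more while hiding what they check:

    // Illustrative only: a handful of classic client-side headless signals.
    // Commercial products combine many such probes, obfuscate them, and score
    // the results server-side.
    function naiveHeadlessSignals(): Record<string, boolean> {
      return {
        // WebDriver-controlled browsers are supposed to set this flag.
        webdriver: navigator.webdriver === true,
        // Older headless Chrome advertised itself in the user agent.
        headlessUA: /HeadlessChrome/.test(navigator.userAgent),
        // A normal desktop browser usually exposes some plugins and languages.
        noPlugins: navigator.plugins.length === 0,
        noLanguages: navigator.languages.length === 0,
      };
    }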

That's already what any commercial bot protection product would be doing. Replicating that kind of product as an on-prem open source project would be challenging.

First, this is an adversarial abuse problem. There is actual value in keeping things hidden, which an open source project can't do. Doing bot detection is already hard enough when you can keep your signals and heuristics secret; doing it in the open would be really hard mode. (And no, "security by obscurity is no security at all" doesn't apply here. If you think it does, it just means you haven't actually worked on adversarial engineering problems.)

Second, it's an endless cat and mouse game. There's no product that's done. There's only a product that's good enough right now, but as the attackers adapt it'll very quickly become worthless. For a product like this to be useful it needs constant work. It's one thing to do that work when you're being paid for it, it's totally another for it to be uncompensated open source work. It'd just chew through volunteers like nobody's business.

Third, you'll very quickly find yourself working only in the gray area of bots that are almost but not quite indistinguishable from humans. When working in that gray area, you need access to fresh data about both bot and real user activities, and you need the ability to run and evaluate a lot of experiments. Not a good fit for on-prem open source.


From what I gather the idea for Anubis isn't to _stop_ bots, it's to make them slow down enough to not bring down servers.

Like they said in the presentation, git(lab/tea) instances have an insane number of links on every page, and the AI crawlers just blindly click everything in nanoseconds, causing massive load for servers that normally see maybe a few thousand git pulls/pushes a day and a few hundred people clicking on links at a human pace.

Plus, the bots are made to be cheap, fast, and uncaring. They'll happily re-fetch 10-year-old repositories with zero changes multiple times a week, just to see if they might've changed.

Even if a bad proof of work only forces the bots to slow down their click rate, that's enough. If they somehow manage to bypass it completely, then that's a problem.
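
For reference, the general shape of the scheme is hashcash-style proof of work. A minimal TypeScript sketch (the parameter names and string format here are assumptions for illustration, not Anubis's actual protocol):

    import { createHash } from "node:crypto";

    // Find a nonce such that sha256(challenge + nonce) starts with
    // `difficulty` zero hex digits. The server issues `challenge`, the client
    // burns CPU searching for `nonce`, and the server verifies with one hash.
    function solve(challenge: string, difficulty: number): number {
      const prefix = "0".repeat(difficulty);
      for (let nonce = 0; ; nonce++) {
        const digest = createHash("sha256")
          .update(challenge + nonce)
          .digest("hex");
        if (digest.startsWith(prefix)) return nonce;
      }
    }

    function verify(challenge: string, difficulty: number, nonce: number): boolean {
      const digest = createHash("sha256").update(challenge + nonce).digest("hex");
      return digest.startsWith("0".repeat(difficulty));
    }

Each extra hex digit of difficulty multiplies the expected work by 16, which is why the knob is coarse: per-request cost jumps from milliseconds straight to seconds, and an attacker with native code and many cores pays far less per solve than a phone running JavaScript.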



