I recently wrote about the limits of these kinds of fingerprinting tests. They tend to focus too heavily on uniqueness without taking stability into account. Moreover, the sample size is often quite small, which tends to make a lot of users look artificially unique.
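To make the sample-size point concrete, here is a toy Python simulation (the fingerprint distribution and numbers are invented; it only illustrates the shape of the effect): the share of users whose fingerprint appears only once shrinks a lot as the sample grows.

    import random
    from collections import Counter

    # Toy model: fingerprints drawn from a heavy-tailed distribution
    # (a few very common configs plus a long tail of rarer ones).
    POPULATION = [f"fp_{i}" for i in range(5000)]
    WEIGHTS = [1 / (i + 1) for i in range(5000)]   # Zipf-like tail

    def share_unique(sample_size):
        """Fraction of sampled users whose fingerprint appears only once."""
        sample = random.choices(POPULATION, weights=WEIGHTS, k=sample_size)
        counts = Counter(sample)
        return sum(1 for fp in sample if counts[fp] == 1) / sample_size

    for n in (500, 5_000, 50_000, 500_000):
        print(f"n={n:>7}: {share_unique(n):.1%} of users look unique")

With a small sample, most of the long tail shows up only once, so far more users look "unique" than would against the real population a site actually sees.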
This is great, and exactly the kind of nuance I almost never see when this topic comes up. Thanks for posting this. Far too often, the pro-privacy crowd is much more _upset_ than they are precise, and, to the point of your article, they end up spending extra effort without really accomplishing much.
Interesting article. I’ve been curious for a while about how residential proxy IPs are collected too. Many come from shady browser extensions or mobile apps, especially free VPNs (wink wink Hola VPN). People often don’t realize they are turning their device into an exit node.
Some time ago I started to track this as a side project (I work in bot detection and was always surprised by how many residential proxies show up in attacks). It started just out of curiosity. Now I collect proxy IPs, which provider they belong to, and how often they are seen. I also publish stats here:
https://deviceandbrowserinfo.com/proxy-api/stats/proxy-db-30...
For example, in the last 30 days I saw more than 120K IPs from Comcast and nearly 100K from AT&T.
I also maintain an open blocklist of IPs and IP ranges, mostly effective against data center and ISP proxies. Residential IPs are harder, since they are often shared with legit users:
https://github.com/antoinevastel/avastel-bot-ips-lists
Even if you can’t block all of them, tracking volume and reuse gives useful signal.
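If you want to consume a list like this programmatically, here is a minimal sketch. It assumes a plain-text file with one IP or CIDR range per line and "#" comments; check the repo for the actual file names and layout.

    import ipaddress

    def load_blocklist(path):
        """Load one IP or CIDR range per line; "#" lines are comments."""
        networks = []
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#"):
                    networks.append(ipaddress.ip_network(line, strict=False))
        return networks

    def is_listed(ip, networks):
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in networks)

    # Example (the file name is hypothetical):
    # nets = load_blocklist("avastel-blocklist.txt")
    # is_listed("203.0.113.7", nets)

As noted above, treat a hit as one signal among others rather than an instant block.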
Hola/Luminati rebranded as “Bright Data” and now pays mobile developers to embed their proxy SDK into mobile apps. Apple and Google should put a stop to this practice.
Hola VPN is such an interesting case of a money printer: host a simple VPN and present it as free, give the users datacenter IPs that are easy to detect. Meanwhile you get their precious residential IPs and print millions a month.
Thanks for the great read; there is so much to unpack from that article. The click fraud stuff is to be expected, and keeping track of everything that goes through their proxy is also expected, but copying files is crazy and could turn into a class action.
That said, if you are doing something shady or grey-area to get ahead, you had best give everyone a cut of the pie, especially your blood brother.
I would add that your chances of hosting a proxy node increase by 1% with each free app you install these days. We catch them easily at visitorquery.com, but the residential proxy business is rampant, and probably half of the nodes are infected devices: Android TVs, routers and, of course, mobile apps.
Author here: I work in bot detection, and wrote this post to explain why privacy-conscious users (VPNs, Brave, LibreWolf, etc.) often get flagged or blocked by anti-bot systems.
I’ve seen a lot of frustration in threads here, so I wanted to offer a technical perspective on why these false positives happen, and how detection systems interpret signals from non-mainstream setups.
Author here: There’ve been a lot of HN threads lately about scraping, especially in the context of AI, and with them, a fair amount of confusion about what actually works to stop bots on high-profile websites.
This post uses TikTok’s obfuscated JavaScript VM (recently discussed on HN) as a case study to walk through what modern bot defenses look like in practice. It’s not spyware; it’s an anti-bot measure designed to make life harder for HTTP clients and non-browser automation.
Key points:
- HTTP-based bots skip JS, so TikTok hides detection logic inside a JavaScript VM interpreter
- The VM computes signals like webdriver checks and canvas-based fingerprints
- Obfuscating this logic in a custom VM makes it significantly harder to reimplement outside the browser (and so to scale an attack)
The goal isn’t to stop all bots; it’s to push attackers into full browser environments, where detection is more feasible.
The post covers why simple solutions like "just require JS" don’t hold up, and why defenders use techniques like VM-based obfuscation to increase attacker cost and reduce replayability.
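To give a rough idea of the shape of the technique, here is a toy sketch, in Python for brevity rather than the JS the real interpreter ships as, and with invented opcodes and signals, nothing TikTok-specific: the detection checks only exist as opaque bytecode that a tiny interpreter executes.

    # Toy stack VM: the detection logic lives in opaque bytecode, not in
    # readable source. Opcodes, bytecode and signals are all invented for
    # illustration; the real thing runs as obfuscated JS in the browser.
    PUSH_SIGNAL, PUSH_CONST, XOR, EMIT = 0, 1, 2, 3

    def run(bytecode, signals, out):
        stack, pc = [], 0
        while pc < len(bytecode):
            op, arg = bytecode[pc], bytecode[pc + 1]
            pc += 2
            if op == PUSH_SIGNAL:        # read an environment signal by index
                stack.append(signals[arg])
            elif op == PUSH_CONST:       # push an obfuscation constant
                stack.append(arg)
            elif op == XOR:              # mix values so raw signals never
                b, a = stack.pop(), stack.pop()  # appear in the payload
                stack.append(a ^ b)
            elif op == EMIT:             # append to the telemetry payload
                out.append(stack.pop())

    # signals[0] could encode a webdriver-style flag, signals[1] a bucketed
    # canvas hash, etc.
    signals = [1, 0x5A]
    payload = []
    run([PUSH_SIGNAL, 0, PUSH_CONST, 0x2F, XOR, EMIT,
         PUSH_SIGNAL, 1, PUSH_CONST, 0x11, XOR, EMIT], signals, payload)
    print(payload)   # the opaque values the server decodes back into signals

To forge the payload without a browser, an attacker has to reimplement the interpreter and keep up with every bytecode change, which is where the extra cost comes from.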
The attacker had fully reverse engineered the signal collection and solved-state flow, including obfuscated parts. They could forge all the expected telemetry.
This kind of setup is pretty standard in bot-heavy environments like ticketing or sneaker drops. Scrapers often do the same to cut costs. CAPTCHA and PoW mostly become signal collection protocols; if those signals aren’t tightly coupled to the actual runtime, they get spoofed.
Yeah, not (too) surprising after a few years in the anti-bot industry. Last week I looked into a Binance CAPTCHA solver that didn’t use a browser at all, just a basic HTTP client. The attacker had reverse engineered the entire signal collection and response flow, including how the CAPTCHA was marked as solved. They were able to forge the expected telemetry despite some obfuscation.
https://blog.castle.io/what-a-binance-captcha-solver-tells-u...
This is pretty standard now in bot-heavy spaces like ticketing or sneaker drops. CAPTCHA often just ends up being a protocol to collect signals, and if those aren’t tightly bound to the browser/runtime, they get spoofed.
Also not surprised PoW isn’t holding up. Someone reverse engineered the PerimeterX PoW and converted it to CUDA to accelerate solving:
https://github.com/re-jevi/PerimiterXCudaSolver/blob/main/po...
At some point, it’s hard to make PoW slow enough for bots without also killing UX for humans on low-end devices.
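For context on that last point, this is roughly what a hash-based PoW costs to solve (a generic hashcash-style sketch, not PerimeterX's actual scheme; the difficulty values are arbitrary). Every extra bit doubles the expected work, and a CUDA solver does that work orders of magnitude faster than a phone's JS engine, so the knob that hurts bots also hurts low-end devices.

    import hashlib
    import itertools
    import time

    def solve(challenge: bytes, difficulty_bits: int) -> int:
        """Find a nonce whose SHA-256 has `difficulty_bits` leading zero bits.
        Generic hashcash-style PoW, not any vendor's actual scheme."""
        target = 1 << (256 - difficulty_bits)
        for nonce in itertools.count():
            digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce

    for bits in (14, 17, 20):
        start = time.perf_counter()
        solve(b"example-challenge", bits)
        print(f"{bits} bits: {time.perf_counter() - start:.2f}s in pure Python")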
Author here. A few weeks ago, someone posted a link on Reddit to an open-source CAPTCHA solver made for Binance’s slider challenge. It’s written in Python and works without using a browser. Just a custom HTTP client, some image matching, and basic reverse engineering.
I was curious and decided to dig into it. I wrote a long breakdown of how it works, how it solves the challenge, and what this says about how bots are built today. Many bots use headless browsers, but this one doesn’t, and it still gets through.
One of the main takeaways is how effective this kind of non-browser approach can be when CAPTCHA is deployed in isolation, without other layers like continuous behavioral checks.
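The post goes through the real implementation; as a generic illustration of the image-matching step (OpenCV template matching, not the solver's exact code), finding the slider offset mostly comes down to locating the puzzle piece in the background image:

    import cv2

    def find_gap_offset(background_path: str, piece_path: str) -> int:
        """Locate the puzzle piece in the background image and return its
        x offset. Generic template-matching sketch, not the solver's code."""
        background = cv2.imread(background_path, cv2.IMREAD_GRAYSCALE)
        piece = cv2.imread(piece_path, cv2.IMREAD_GRAYSCALE)
        # Edges tend to be more robust than raw pixels for these puzzles.
        bg_edges = cv2.Canny(background, 100, 200)
        piece_edges = cv2.Canny(piece, 100, 200)
        result = cv2.matchTemplate(bg_edges, piece_edges, cv2.TM_CCOEFF_NORMED)
        _, _, _, max_loc = cv2.minMaxLoc(result)
        return max_loc[0]  # x coordinate of the best match

The computed offset is then replayed over plain HTTP along with the forged telemetry, no browser involved.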
Hi, author here.
I wrote a blog post where I analyze Hidemium, a popular anti-detect browser. I break down the techniques it uses to spoof fingerprints and show how JavaScript feature inconsistencies can reveal its presence.
Of course, JS feature detection isn’t a silver bullet; attackers can adapt. I also discuss the limitations of this approach and what it takes to build more reliable, environment-aware detection systems that work even against unfamiliar tools.
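As a deliberately simplified example of the inconsistency idea (the attribute names are standard navigator properties, but this particular rule set is invented for illustration; real checks are far more numerous and subtle), the server-side logic boils down to comparing what the User-Agent claims with what the collected JS signals actually report:

    def fingerprint_inconsistencies(fp: dict) -> list:
        """Flag contradictions between what the UA claims and what the
        collected JS signals report. Illustrative rules only."""
        issues = []
        ua = fp.get("userAgent", "")
        if "Chrome" in ua and not fp.get("hasWindowChrome"):
            issues.append("UA claims Chrome but window.chrome is missing")
        if fp.get("webdriver"):
            issues.append("navigator.webdriver is true")
        if "Windows" in ua and not fp.get("platform", "").startswith("Win"):
            issues.append("UA OS does not match navigator.platform")
        if "Chrome" in ua and fp.get("pluginsLength") == 0:
            issues.append("Chrome UA with an empty plugins list")
        return issues

    # A spoofed profile that forgot to patch everything consistently:
    print(fingerprint_inconsistencies({
        "userAgent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/124.0",
        "hasWindowChrome": False,
        "webdriver": False,
        "platform": "Linux x86_64",
        "pluginsLength": 0,
    }))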
So far I see ~3M distinct IP addresses per 30 days, 1.7M of which are fresh proxy IPs. The DB contains only verified IP addresses through which I've been able to route traffic; it DOESN'T rely on third-party/open-source data sources.
The DB contains different types of proxies:
- Residential
- ISP
- Data center
I don't include mobile proxies since they're heavily shared, so knowing that an IP address was used as a proxy at some point is basically useless.
Regarding your remark: indeed, many residential IPs are shared, including IPs of legitimate users who may have a shady app routing traffic through their device. That's why I don't recommend blocking on IP addresses as-is; it's meant more as a datapoint/signal to enrich your anti-fraud/anti-bot system.
For the blocklist, however, I analyze the IPs over bigger time frames, look at the percentage of IPs in each range that were used as proxies, and generate a confidence score to indicate whether or not the range is safe to block.
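The exact scoring isn't something I've published, but the shape is roughly this (a sketch: the /24 aggregation, the 16-IP floor and the coverage formula are simplifications, not the production logic):

    import ipaddress
    from collections import defaultdict

    def range_block_scores(proxy_ips, prefix=24):
        """Aggregate verified proxy IPs per prefix and derive a confidence
        score per range. `proxy_ips` is an iterable of IP strings seen as
        proxies over the analysis window; thresholds are illustrative."""
        per_range = defaultdict(set)
        for ip in proxy_ips:
            net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
            per_range[net].add(ip)
        scores = {}
        for net, ips in per_range.items():
            coverage = len(ips) / net.num_addresses  # share of the range seen as proxies
            # Require enough distinct sightings before trusting the coverage
            # number for a block decision.
            scores[net] = coverage if len(ips) >= 16 else 0.0
        return scores

A range where most addresses were verified as proxies over the window (typical for data center and ISP ranges) scores close to 1.0; a range with a handful of sightings stays a weak signal only.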
I’m working on a scraping project at the moment, so I’m looking at this too, but from the other end. Super low volume though, so pretty tame: the emphasis is on success rate more than throughput.
I bought a 4G dongle to use as a last resort if nothing else gets through, and I’m also investigating IPv6.
Using a 4G dongle makes it easier to hide in the crowd indeed. Since your traffic will go through heavily shared mobile IPs, probably with thousands of users behind them, anti-bot vendors won't/shouldn't block per IP, but per fingerprint/session cookie instead.
https://blog.castle.io/what-browser-fingerprinting-tests-lik...
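Concretely, that means keying whatever counters or blocks you maintain on the fingerprint/session rather than on the IP; a minimal sketch (the fingerprint hash and the 100-requests-per-minute threshold are placeholders):

    import hashlib
    import time
    from collections import defaultdict

    WINDOW_SECONDS = 60
    MAX_REQUESTS = 100          # placeholder threshold
    _buckets = defaultdict(list)

    def should_block(fingerprint: str, session_id: str) -> bool:
        """Rate-limit on fingerprint + session, not on the (shared) IP."""
        key = hashlib.sha256(f"{fingerprint}:{session_id}".encode()).hexdigest()
        now = time.time()
        bucket = _buckets[key]
        bucket[:] = [t for t in bucket if now - t < WINDOW_SECONDS]  # drop old hits
        bucket.append(now)
        return len(bucket) > MAX_REQUESTS

A CGNAT or mobile IP with thousands of legit users behind it never enters the decision; only the individual client's fingerprint and session do.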