How hard would it be to create an alternative using GPT-2 or the like?
Train a dozen models on different subjects: street signs, cats, houses, cars, etc. Then show the user a random selection of images generated by the different models and say "select all the cats". They get it right if they choose exactly the images that came from the cat model.
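A rough sketch of the challenge logic, in case it helps make the idea concrete. Everything here is hypothetical: `CATEGORIES`, `generate_image`, `make_challenge` and `check_response` are made-up names, and `generate_image` is only a stand-in for sampling from a trained per-category model (it returns an opaque id, not a real image).

```python
import random

# Hypothetical category list; each entry stands in for one trained generative model.
CATEGORIES = ["street sign", "cat", "house", "car"]


def generate_image(category, rng):
    """Placeholder for sampling one image from the model trained on `category`.

    Returns an opaque image id so the category is not leaked to the client.
    """
    return f"img-{rng.getrandbits(32):08x}"


def make_challenge(rng, grid_size=9):
    """Build one challenge: a target category, a grid of images, and the answer set."""
    target = rng.choice(CATEGORIES)
    # Force at least one image from the target model so the challenge is solvable.
    labels = [target] + [rng.choice(CATEGORIES) for _ in range(grid_size - 1)]
    rng.shuffle(labels)
    images = [generate_image(label, rng) for label in labels]
    # The answer (grid positions of the target images) stays on the server.
    answer = {i for i, label in enumerate(labels) if label == target}
    return target, images, answer


def check_response(selected, answer):
    """The user passes only if they picked exactly the target model's images."""
    return set(selected) == answer
```

The server would send only `target` and `images` to the user and keep `answer` to check the response. The correct answer is known by construction, because the server knows which model produced each image.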
So the short version is that they try to fingerprint the user and then distinguish fingerprints that seem like humans from fingerprints that don't.
The interesting question then becomes how this is going to interact with future browser anti-fingerprinting measures whose purpose is to prevent just that.