AI Model Fundamentally Cracks Captchas, Scientists Say (npr.org)
86 points by gnicholas on Oct 27, 2017 | 85 comments



Seems like natural language processing would be an interesting direction for captchas.

- A man is running. A dog is behind him barking and growling. What does the man think might happen?

- A man goes up the stairs to the roof. He walks to the very edge of the building. He takes one more step. What is the man trying to do?

The correct answer should be pretty easy to parse out. And I'd expect a better success rate for humans than with some of today's captchas, which increasingly look more like Magic Eye puzzles than character recognition. But of course the big question is generation. Can this sort of implication-based story be generated in a way such that the final text cannot trivially be reverse-engineered into the answer (without even considering the 'meaning' of the question)? And for that matter, can these even realistically be generated en masse?
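A minimal sketch of what mass generation could look like, purely illustrative (the templates, slot fillers, and expected answer below are made up, and grading free-form responses is left aside):

    import random

    # Fill simple story templates with interchangeable actors and threats so the
    # surface text varies while the implied answer stays the same.
    TEMPLATES = [
        ("A {person} is running. A {animal} is behind {pronoun} barking and growling. "
         "What does the {person} think might happen?",
         "getting bitten"),
    ]
    PEOPLE = [("man", "him"), ("woman", "her")]
    ANIMALS = ["dog", "wolf"]

    def generate_captcha():
        template, answer = random.choice(TEMPLATES)
        person, pronoun = random.choice(PEOPLE)
        question = template.format(person=person, pronoun=pronoun,
                                   animal=random.choice(ANIMALS))
        return question, answer

Even so, an attacker who collects enough generated questions could learn the template-to-answer mapping, which is exactly the reversibility worry raised above.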


You're in a desert walking along the sand when all of a sudden you look down and see a tortise. You reach down and flip the tortise on it's back. The tortise lays on it's back, it's belly baking in the hot sun but you're not helping. Why is that leon?


* Tortoise

* its back

* its belly

* Leon

(in case the mistakes were not for comedic effect)


Found the simulant.


;)

Or maybe later Android versions know they should turn the Tortoise back on its feet.


* Thank you

* Mr.

* Professor

(in case you want some appreciation for your nitpicking)


Correcting people's mistakes is not nitpicking; it's how people learn.


Correction. It can be.

Nitpicking: looking for small or unimportant errors or faults, especially in order to criticize unnecessarily.

See what I did there? We can keep going, but it adds no value. It's tortises all the way down.


Indeed, and neither you nor the GP were criticising.


fushta!


Sylvania.


Oh, really? So how is one supposed to learn from just getting a list of corrections with no explanation whatsoever?

If you want someone to learn, help them. If you're not willing to do that, leave it be. If you can't leave it be, you probably only want to nitpick.


What's it like to hold the hand of someone you love?


People always come at this from an angle of "what can I do that computers can't?". You need to take into account the incredible diversity of people who use the internet, and what they can and can't do. There's already a viral article written by an old lady who can't pass the current captchas. Add to this people who don't speak English, or don't speak it well; people who struggle to read and comprehend text in any language; people who struggle with logical reasoning; etc., etc. The lowest common denominator for a task that can easily be solved by any human is pretty low.


Just issue the Turing test to every visitor.


It's a good idea, but you would need versions in different languages.


When writing such captcha questions for a forum, I generally use Google as validation, checking that Google can't answer the question in the top listed links. This allows me to easily adjust questions to the point where natural language processing should not be able to answer them but a human would.
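A minimal sketch of that validation loop (the top_search_snippets helper is hypothetical and stands in for whatever search API or scraper is actually used):

    # A question is only usable if the intended answer does not show up in the
    # snippets of the top search results for the question text.
    def question_is_usable(question, answer, top_search_snippets):
        snippets = top_search_snippets(question)  # e.g. snippets of the top 10 hits
        return not any(answer.lower() in s.lower() for s in snippets)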


Yep, Question-Answer Driven Semantic Role Labeling (QA-SRL) is an interesting research project around crowdsourcing NLP datasets. https://dada.cs.washington.edu/qasrl/


> A man is running. A dog is behind him barking and growling. What does the man think might happen?

So what's the correct answer here?

* That's a mean dog

* I hope it's on a leash

* I hope it's not going to start running after me

* I hope I don't get bitten

* Where can I hide?

* OMG, I will get rabies!


This was my first thought: there are many different things a human could think of, and we will probably cycle through them all in a few ms. This would have to be multiple choice, and then what? The "AI" would have a baseline 25% chance of getting it correct (assuming 4 options).


Multiple choice doesn't work well for captchas. You can just guess and get it correct 25% of the time, which is fine for spammers.


You couldn't autogenerate them easily, and if you don't have unique captchas, people could store the answers.


This [1] is the article they're citing. Note that a cursory search turns up similar claims from back in 2013; it might be worth waiting for someone with more experience and less bias to express their opinions before dumping your captcha-related stocks.

[1]: http://science.sciencemag.org/content/early/2017/10/26/scien...


> captcha-related stocks.

Are there companies relying only on selling captchas for revenue?


There are captcha farms in Asia where hundreds of people sit in a building and solve captchas non-stop.


Maybe captchas could be implemented as opinion polls re: Tiananmen Square or something.


shrug I was just cracking wise, but maybe? I've had the stock market on my mind a lot recently, hence the bad joke.


Sure there are, I was an early investor in udbbGxls.


You must be rtEepkQq now!


Is this the same old news from Vicarious? They announced this four years ago and have raised about $100M since then...

http://www.slate.com/blogs/future_tense/2013/10/28/captcha_c...

I thought the world moved on.


Since when was captcha not broken? Sites like http://www.deathbycaptcha.com/user/order have been around for ages. Yes, a mere $6.95 gets you 5000 captchas solved by OCR and humans in an avg of 6 seconds. Imagine that job.

Sure, AI can break captchas, but it can be done at scale for far less than AI research and a GPU rig cost.

Google's approach to bot recognition incidentally trains their own bots, so even an adversarial network attempting to bypass it would give it a ton of training data along the way to breaking in.


>Imagine that job.

I don't believe it's a job. Isn't this the thing where captchas on target sites are simply mirrored on other sites like sketchy filehosts? Real human users are solving captchas to access some content hosted by this service, and the solution they enter is passed through to the target site.


I believe it is a job. There is simply too much volume to be satisfied by the inconstant traffic on filehosts. Also, they have overloads and slower response times on public holidays in India.

If they are using filehosts, how would they verify the captcha is correct? They could double-check, but that would lower their capacity and slow their solve times.


Actually, you can make money by solving captchas. https://2captcha.com/make-money-online


Yes, exactly this is the case.


Could you give me some pointers/hints (or even specific examples) where I'd find this in use?

I ask because, IIRC, I've never seen reCAPTCHA used on file hosting sites. (I don't think so, at least. There might be one...)


Okay, __wow__. I'm impressed. I thought they were being farmed out to people in 3rd world countries to solve.


As scummy as it is, you have to admire the ingenuity. That is a damn clever solution.


That's also how Google's self-driving cars work. When they don't understand what they are seeing, they just show pictures to some random person in a Google captcha who solves it quickly.


#1897


https://xkcd.com/1897/

So people can actually understand the reference.


The Google Cloud Vision API will do this for ~$3, but you will need to automate it yourself, which you might need to do anyway with other services.


Vicarious demoed cracking captchas at least 3 years ago.

Dileep George, cofounder of Vicarious, is the former Numenta CTO, and claimed to use probabilistic graphical models as a basis for their tech.

https://www.youtube.com/watch?v=-H185jPf-7o


I don't see how captchas are "fundamentally cracked" if they only claim a success rate of, at best, around two-thirds. Nor do they explain what they mean by "fundamentally cracked".


Before you can say that a 66% success rate isn't good enough, you need to compare it to the human success rate. I barely get 2/3 myself.


This is my experience as well; it's very frustrating being locked out of your account when you need to take care of business. Never mind cracked, they are fundamentally broken if a human can't get a nearly perfect success rate.


A captcha is cracked if it becomes economical to try to pass it over and over again. If you have a script that succeeds in spamming a forum 2/3 of the times it tries, you've got a successful spamming system.

What they mean by fundamentally cracked is that this method seems to be more robust against minor variations of spacing, font, etc. than CNN-based models.
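To make the economics concrete, a quick back-of-the-envelope calculation (the per-attempt cost here is purely illustrative):

    # At a 2/3 per-attempt success rate, a spammer needs only 1.5 attempts per
    # successful post on average, so the solver is nowhere near needing to be perfect.
    success_rate = 2 / 3
    attempts_per_success = 1 / success_rate          # = 1.5
    cost_per_attempt = 0.001                         # illustrative cost, in dollars
    print(attempts_per_success * cost_per_attempt)   # expected cost per successful spam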


Captchas are useless for their intended purpose if a bot can get them right more often than every other try.


10 points to the first person to hack up a CAPTCHA using Winograd schemas.
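For the curious, a minimal sketch of what such a schema-backed challenge could look like, using the classic trophy/suitcase schema (everything around it is illustrative):

    # A Winograd schema asks which noun an ambiguous pronoun refers to;
    # answering it is meant to require commonsense reasoning, not pattern matching.
    SCHEMA = {
        "sentence": "The trophy doesn't fit in the brown suitcase because it's too big.",
        "question": "What is too big?",
        "options": ["the trophy", "the suitcase"],
        "answer": "the trophy",
    }

    def check(response):
        return response.strip().lower() == SCHEMA["answer"]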


Why not cut out the middleman? Make the captcha consist of controlling a paperclip manufacturing device.


This comment is referring to this deceptively simple game: http://www.decisionproblem.com/paperclips/index2.html

Trust me, this is anything but simple and will take you on a wild ride.


As good as that one is, and I do rather like the themes, it loses heavily in depth and complexity to Kittens Game. Universal Paperclips will take under a day to beat; Kittens Game will not fall so easily.

https://bloodrizer.ru/games/kittens/


Oh, I thought it was in reference to the Paperclip Maximizer referenced every now and then on lesswrong.com :S


(It is, and so is that game.)


Oh, of course. Duh. :/


One of the fundamental problems with captchas is that writing a bot that defeats captchas is a very interesting exercise for teaching AI.


I always understood this to be a feature. Captchas seem to be used as public signposts for tasks that AIs are not yet good enough at.

So, given the financial incentives on both sides, I'd like to believe that continually creating and overcoming these tasks is a possible route to AGI.


Is there more to this than "text captchas can be broken by deep learning"?


What's the human pass rate for captchas? I bet I've personally failed at least 20% of the captchas I've solved in my lifetime.


At a certain point it will be impossible to create a working captcha. Are we basically engineering a Turing Test?


CAPTCHA is literally meant to be an automated Turing test.

It's right there in the name: "Completely Automated Public Turing test to tell Computers and Humans Apart"


CAPTCHA is the Turing test with role reversal.

Computers try to figure out who is human and who is not. In the Turing test, humans try to figure out who is human and who is not.


Some of the captchas I get lately have honestly made me think I'm probably not a human, as far too often I can't make heads or tails of the letters.

At a certain point I just give up and refuse to use the worst sites that use this junk.


I just assumed the test was that if you can decode that mess, clearly you're a bot.


Many types of CAPTCHA systems can be defeated with machine learning models and OCR. Google provides its own, the Google Cloud Vision API. Here is a brief example of how this is done in practice: https://blog.websecurify.com/2017/10/cracking-captchas.html
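As a rough sketch of the OCR route (assuming a recent google-cloud-vision Python client and valid credentials; the linked post covers the full workflow):

    # Send a captcha image to Cloud Vision's text detection and read back
    # whatever text it recognises.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    with open("captcha.png", "rb") as f:
        image = vision.Image(content=f.read())

    response = client.text_detection(image=image)
    if response.text_annotations:
        print(response.text_annotations[0].description)  # best-guess transcription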

Perhaps this is old news, as this technique has been out for a while, but I find it is still relevant in many of the cases I have encountered.

Furthermore, in my experience, Google's failure to improve the visual appeal of reCAPTCHA's "I am not a robot" widget is one of the key factors in why many organisations are simply not using it.


I think rather than being broken, captcha models are just going to be made more complex. Maybe they'll start asking you to write a poem or play a mini problem solving game.


I'd expect adversarial images to take off in the captcha space. Don't try to avoid the models; exploit them.


Adversarial images need to be made with knowledge of the model.


Actually, no [1]. From the abstract:

We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

[1] https://arxiv.org/abs/1312.6199

EDIT: Also interesting: "Universal Adversarial Perturbations" https://arxiv.org/abs/1610.08401
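As a rough illustration of the transfer attack this describes, here is a minimal FGSM-style sketch (assuming PyTorch and a locally trained surrogate model): the perturbation is crafted against the surrogate and, per the cited paper, will often also fool other models trained on similar data.

    import torch.nn.functional as F

    def fgsm_perturb(surrogate, image, label, eps=0.03):
        """Return an adversarially perturbed copy of `image` ([1, C, H, W])."""
        image = image.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(surrogate(image), label)  # maximise prediction error
        loss.backward()
        perturbed = image + eps * image.grad.sign()      # one signed-gradient step
        return perturbed.clamp(0, 1).detach()            # keep pixels in valid range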


A lot of services use Facebook to verify that someone is a human. There should be a service that exists only to manage people's identities online. Sign up, provide some ID, an address, and the last four digits of your social. Later, maybe a letter is sent to the address and returned with a verification code. Then every other service on the internet could use that service to prevent bots, spam, and other things.


I can foresee absolutely no potential problems with this plan...


Then what do you propose?


Centralized identification services? Thinking of some names... how about Experian or Equifax?


Those companies gathered tons of information without even asking people first. And why would this service require the sensitive information to be stored at all, let alone in plain text?


Ask Equifax; they “didn’t need” to store as much as they did, let alone in plain text. Perhaps it’s useful for, oh, making sure two people aren’t signing up with the same information.

Other problems include:

* Social security numbers are explicitly not ID numbers

* Not all Americans have social security numbers

* Even if they did, Americans are about 5% of world population

* Not everyone on the internet gets post (say hi to Nairobi, where addresses of relatively rich locals may be “$Name, third on the left behind the petrol station on the highway”, while poorer people have homes that don’t officially exist on roads that don’t officially exist. Some of these people rely on mobile phones for payments, as banks don’t care.)

* Not everyone has an ID certificate or card or passport or driving license to provide, and if they did, why not just skip the middleman and ask for that directly like Airbnb (and every international airline I’ve used) does? Or your bank account, like PayPal does — after all, a bank will certify your ID*

(* they probably don’t all, and even if they did, not all people have bank accounts)

Basically, ID is not a perfectly certifiable thing, so you need to design systems to accommodate failure — even if we actually solved it in theory, all it takes is one criminal finding one implementation flaw and a system that assumed perfection would allow grand exploitation.


I'm talking about storing information. To be sure that someone is real, collect tons of info, then get rid of most of it and tag their account with a validation stamp. Store just enough info so that two people having identical sets of that info is almost impossible. And I'm not saying SSN or any other piece of info is the best solution -- obviously it would be anything and everything tailored to whatever systems the person's country of residence currently maintains. As for exploitation, I think it would be simple to make exploitation rare as long as good ID systems are implemented by the government or other entities. It would certainly be better than having an internet where it's impossible to verify whether someone is a human or a robot...


“Almost impossible” isn’t good enough. “Almost impossible” is why two women in Florida have the same SSN: born the same day, same state, almost the same name (Joanna and Joannie Rivera), and why they were not even the example I had in mind when I googled for a similar story I’d heard a decade earlier.

And if your goal is just “is this a human or a machine?”, well, let me introduce you to the idea of identity theft, and why people stole all that data from Equifax.

The only way to tell humans and bots apart is some form of automated Turing test, hence Completely Automated Public Turing test to tell Computers and Humans Apart.


In fact, Turing tests will be useless at some point, which is the whole basis of my speculation about alternatives to them, as I originally stated. Don't you agree that captchas will become obsolete at some point? What then?


If Turing tests become obsolete, we will have human level AI and the bots will be granted personhood. That’s the point of the test.


Captchas are Turing tests. Their purpose is not to grant personhood to robots. Captchas will be totally broken well before machines become sentient.


Turing tests are, by definition, the kind of test which, if a computer passes all of them, makes it a person.


Here in Norway we use something like that for access to banks, taxes, pensions, social services, etc. All of these services allow you to log in with what is called BankID. You apply for BankID and supply an ID like your passport, and then all the other banks and institutions accept that. It uses a two-factor scheme with SMS, code cards, apps in SIM cards, etc.

But of course this isn't available to be used by some random kitten video trading site.

Also why would I want to give up my real identity to a lot of the sites that use a captcha?


I had a chat with my American co-worker about why they don't use something more modern, and he seriously tried to explain to me that requiring ID would be racist.

Now, I knew the theory already, since I spend a lot of time online, but I asked him to explain it to other co-workers, most of them very liberal as they say in the U.S. They all thought he was joking.

Americans are funny sometimes :)


Are you aware of the book The Circle?


Dude, what does that even have to do with this? Identity will need to be taken care of in the absence of captchas, or else we will have an internet that cannot differentiate between a bot and a real person. Think of all the websites and all the broad categories of websites that would be broken by that. Is this not a problem in need of a solution?



