Seems like natural language processing would be an interesting direction for captchas.
- A man is running. A dog is behind him barking and growling. What does the man think might happen?
- A man goes up the stairs to the roof. He walks to the very edge of the building. He takes one more step. What is the man trying to do?
The correct answer should be pretty easy to parse out. And I'd expect a better success rate for humans than with some of the captchas today, which increasingly look more like magic eye puzzles than character recognition. But of course the big question is generation. Can these sorts of implication-based stories be generated in a way such that the final text cannot be trivially reversed to the answer (without even considering the 'meaning' of the question)? And for that matter, can they even realistically be generated en masse?
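To make the generation worry concrete, here is a naive sketch (Python, with made-up example templates) of the obvious way to generate such questions. It also illustrates the reversal problem above: with a fixed template set, the story-to-answer mapping is just a lookup table and requires no 'understanding' at all.

    import random

    # Naive template-based generator: each template pairs a short story with
    # the intended answer and a few distractors. Purely illustrative.
    TEMPLATES = [
        {
            "story": "A man is running. A dog is behind him, barking and growling.",
            "question": "What does the man think might happen?",
            "answer": "The dog might bite him.",
            "distractors": ["He will win a race.", "It will start raining.",
                            "The dog wants to sleep."],
        },
        {
            "story": "A man walks to the very edge of a roof and takes one more step.",
            "question": "What is the man trying to do?",
            "answer": "Jump off the building.",
            "distractors": ["Fix the antenna.", "Water the plants.",
                            "Lock the door."],
        },
    ]

    def make_captcha():
        t = random.choice(TEMPLATES)
        options = t["distractors"] + [t["answer"]]
        random.shuffle(options)
        return t["story"] + " " + t["question"], options, options.index(t["answer"])

Real generation would need enough surface variation that an attacker cannot simply match the story text back to a known template.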
You're in a desert walking along the sand when all of a sudden you look down and see a tortoise. You reach down and flip the tortoise onto its back. The tortoise lies on its back, its belly baking in the hot sun, but you're not helping. Why is that, Leon?
People always come at this from an angle of "what can I do that computers can't?". You need to take into account the incredible diversity of people who use the internet, and what they can and can't do. There's already a viral article written by an old lady who can't pass the current captchas. Add to this people who don't speak English, or don't speak it well; people who battle to read and comprehend text in any language; people who battle with logical reasoning; etc, etc, etc. The lowest common denominator for a task that can be easily solved by any human is pretty low.
When writing such captcha questions for a forum, I generally use Google as validation, to check that Google can't answer the question in the top listed links. This allows me to easily adjust questions to the point where natural language processing should not be able to answer the question but a human would.
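That validation loop is easy to script. A minimal sketch, assuming you fetch or paste in the top result snippets yourself (no particular search API is assumed):

    def answer_leaks(snippets, expected_answer):
        # Crude check: does the intended answer appear verbatim in any of
        # the top search-result snippets gathered for the question?
        needle = expected_answer.strip().lower()
        return any(needle in s.lower() for s in snippets)

    # Example with hand-pasted snippets for "What might the dog do?"
    print(answer_leaks(
        ["Dogs bark and growl when threatened and may bite ...",
         "Why do dogs chase runners? ..."],
        "bite"))  # True, so this question is too searchable; rephrase it

A question only goes live once this kind of check (plus a manual skim of the results) comes back negative.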
Yep, Question-Answering Semantic Role labeling is an interesting research project around crowdsourcing NLP datasets. https://dada.cs.washington.edu/qasrl/
This was my first thought: there are many different things a human could think of, and we would probably cycle through them all in a few ms. This would have to be multiple choice, and then what? The "AI" would have a baseline 25% chance of getting it correct (assuming 4 options).
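For what it's worth, chaining rounds only helps so much against pure guessing: with four options a bot passes one round 25% of the time, two in a row about 6%, and three in a row about 1.6% (0.25^k), so you'd need several rounds per attempt before guessing stops being economical.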
This [1] is the article they're citing. Note that a cursory search turns up similar claims from back in 2013; it might be worth waiting for someone with more experience and less bias to express their opinions before dumping your captcha-related stocks.
Since when was captcha not broken? Sites like http://www.deathbycaptcha.com/user/order have been around for ages. Yes, a mere $6.95 gets you 5000 captchas solved by OCR and humans in an avg of 6 seconds. Imagine that job.
Sure, AI can break captchas, but humans can do it at scale for far less than an AI research effort and a GPU rig cost.
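Back of the envelope: at $6.95 per 5000 solves, that works out to roughly $0.0014 per captcha, a price that's hard to beat once you factor in model development time and GPU hardware.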
Google's approach to bot recognition incidentally trains their own bots, so even an adversarial network attempting to bypass it would give it a ton of training along the way to breaking in.
I don't believe it's a job. Isn't this the thing where captchas on target sites are simply mirrored on other sites like sketchy filehosts? Real human users are solving captchas to access some content hosted by this service, and the solution they enter is passed through to the target site.
I believe it is a job. There is simply too much volume to be satisfied by the inconsistent traffic on filehosts. Also, they get overloaded and response times go up on public holidays in India.
If they are using filehosts, how would they verify the captcha is correct? They could double-check, but that would lower their capacity and worsen their solve times.
That's also how Google self-driving cars work. When they don't understand what they are seeing, they just show the pictures to some random person via a Google captcha, who solves it quickly.
I don't see how captchas are "fundamentally cracked" if they only claim a success rate at best around 2/3rds. Nor do they give an explanation for what they mean by fundamentally cracked.
This is my experience as well; it's very frustrating being locked out of your account when you need to take care of business. Never mind cracked, they are fundamentally broken if a human can't get a nearly perfect success rate.
A captcha is cracked if it becomes economical to try to pass it over and over again. If you have a script that succeeds in spamming a forum 2/3 of the times it tries, you've got a successful spamming system.
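To put a number on it: at a 2/3 success rate the script needs only about 1/(2/3) = 1.5 attempts per successful post on average, so the failures are practically free.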
What they mean by fundamentally cracked is that this method seems to be more robust against minor variations of spacing, font, etc. than CNN-based models.
As good as that one is, and I do rather like the themes, it loses heavily in depth and complexity to Kittens Game. Universal Paperclips will take under a day to beat; Kittens Game will not fall so easily.
Many types of CAPTCHA systems can be defeated with machine learning models and OCR. Google provides its own called Google Vision API. Here is a brief example how this is done in practice: https://blog.websecurify.com/2017/10/cracking-captchas.html
Perhaps this is old news, as the technique has been out for a while, but I find it is still relevant in many of the cases I have encountered.
Furthermore, in my experience, Google's failure to improve the visual appeal of reCAPTCHA's "I am not a robot" widget is one of the key reasons many organisations are simply not using it.
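For a flavour of the plain OCR route (not necessarily exactly what the linked post does), here is a minimal sketch in Python, assuming a simple fixed-font captcha image and the Pillow and pytesseract packages:

    from PIL import Image, ImageFilter
    import pytesseract  # thin wrapper around the Tesseract OCR engine

    def solve_simple_captcha(path):
        img = Image.open(path).convert("L")               # greyscale
        img = img.point(lambda p: 255 if p > 140 else 0)  # crude binarisation
        img = img.filter(ImageFilter.MedianFilter(3))     # drop speckle noise
        # --psm 7 tells Tesseract to treat the image as a single line of text
        return pytesseract.image_to_string(img, config="--psm 7").strip()

    print(solve_simple_captcha("captcha.png"))

Anything with heavy distortion or object recognition needs the trained models the post and the Vision API cover, but a surprising number of home-grown captchas fall to exactly this.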
I think rather than being broken, captcha models are just going to be made more complex. Maybe they'll start asking you to write a poem or play a mini problem solving game.
We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
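For anyone who hasn't seen this in code: the simplest concrete version of "a perturbation found by maximizing the prediction error" is the fast gradient sign method, a cheaper one-step variant from later work (Goodfellow et al.), not the exact optimization the quote describes. A rough PyTorch sketch, assuming you already have a trained classifier model and a correctly classified input x with label y:

    import torch
    import torch.nn.functional as F

    def fgsm_perturb(model, x, y, eps=0.01):
        # One-step adversarial perturbation: nudge the input in the
        # direction that most increases the classification loss.
        x = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        x_adv = x + eps * x.grad.sign()   # imperceptible for small eps
        return x_adv.clamp(0.0, 1.0).detach()

For small eps the perturbed image looks identical to a human, yet is frequently misclassified, and often fools other networks trained on different data as well.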
A lot of services use Facebook to verify that someone is a human. There should be a service that exists only to manage people's identities online: sign up, provide some ID, an address, and the last four digits of your social. Later, maybe a letter is sent to the address and returned with a verification code. Then every other service on the internet could use that service to prevent bots, spam and other things.
Those companies gathered tons of information without even asking people first. And why would this service require the sensitive information to be stored at all, let alone in plain text?
Ask Equifax, they “didn’t need” to store as much as they did, let alone in plain text. Perhaps it’s useful for, oh, making sure two people aren’t signing up with the same information.
Other problems include:
* Social security numbers are explicitly not ID numbers
* Not all Americans have social security numbers
* Even if they did, Americans are about 5% of world population
* Not everyone on the internet gets post (say hi to Nairobi, where addresses of relatively rich locals may be “$Name, third on the left behind the petrol station on the highway”, while poorer people have homes that don't officially exist on roads that don't officially exist. Some of these people rely on mobile phones for payments, as banks don't care.)
* Not everyone has an ID certificate or card or passport or driving license to provide, and if they did, why not just skip the middleman and ask for that directly, like AirBnB (and every international airline I've used) does? Or your bank account, like PayPal does — after all, a bank will certify your ID*
(* they probably don’t all, and even if they did, not all people have bank accounts)
Basically, ID is not a perfectly certifiable thing, so you need to design systems to accommodate failure — even if we actually solved it in theory, all it takes is one criminal finding one implementation flaw and a system that assumed perfection would allow grand exploitation.
I'm talking about storing information. To be sure that someone is real, collect tons of info, then get rid of most of it and tag their account with a validation stamp. Store just enough info so that two people having identical sets of that info is almost impossible. And I'm not saying SSN or any other piece of info is the best solution -- obviously it would be anything and everything tailored to whatever systems the person's country of residence currently maintains. As for exploitation, I think it would be simple to make exploitation rare as long as good ID systems are implemented by the government or other entities. It would certainly be better than having an internet where it's impossible to verify whether someone is a human or a robot...
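The "store just enough info" part at least is straightforward if all you keep is a collision-resistant fingerprint. A minimal sketch in Python, assuming a per-deployment secret and that the raw fields are discarded after verification (which of course does nothing about the deeper problems raised in the replies):

    import hmac, hashlib

    SERVER_SECRET = b"rotate-me-and-keep-out-of-the-db"  # per-deployment secret

    def identity_fingerprint(fields):
        # Reduce a set of identity fields to one keyed hash; store only the
        # fingerprint plus a 'verified' flag, never the fields themselves.
        canonical = "|".join(
            "{}={}".format(k, fields[k].strip().lower()) for k in sorted(fields))
        return hmac.new(SERVER_SECRET, canonical.encode(), hashlib.sha256).hexdigest()

    print(identity_fingerprint({"name": "Jane Example",
                                "dob": "1990-01-01",
                                "country": "US"}))

Two people with identical field sets still collide on the same fingerprint, which is exactly the failure mode the reply below points at.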
“Almost impossible“ isn't good enough. “Almost impossible” is why two women in Florida have the same SSN: born the same day, same state, almost the same name (Joanna and Joannie Rivera), and why they were not even the example I had in mind when I googled for a similar story I'd heard a decade earlier.
And if your goal is just “is this a human or a machine?”, well, let me introduce you to the idea of identity theft, and why people stole all that data from Equifax.
The only way to tell humans and bots apart is some form of automated Turing test, hence Completely Automated Public Turing test to tell Computers and Humans Apart.
In fact, Turing tests will be useless at some point, which is the whole basis of my speculation about alternatives to them, as I originally stated. Don't you agree that captchas will become obsolete at some point? What then?
Here in Norway we use something like that for access to banks, tax, pensions, social services, etc. All of these services allow you to log in with what is called BankID. You apply for BankID and supply an ID like your passport, and then all the other banks and institutions accept that. It uses a two-factor scheme with SMS, code cards, apps on SIM cards, etc.
But of course this isn't available to be used by some random kitten video trading site.
Also why would I want to give up my real identity to a lot of the sites that use a captcha?
I had a chat with my American co-worker about why they don't use something more modern, and he seriously tried to explain to me that requiring ID would be racist.
Now, I knew the theory already, since I spend a lot of time online, but I asked him to explain it to other co-workers, most of them very liberal (as they say in the U.S.), and they all thought he was joking.
Dude, what does that even have to do with this? Identity will need to be taken care of in the absence of captchas, or else we will have an internet that cannot differentiate between a bot and a real person. Think of all the websites, and all the broad categories of websites, that would be broken by that. Is this not a problem in need of a solution?