My conspiracy theory is that all of the "we take security very seriously" talk out of OpenAI has aligned with "our AI is so advanced and powerful that we have to take these existential risks very seriously and we have to make sure it doesn't turn evil." And OpenAI doesn't seem very interested in security conversations that don't feed into that narrative[0].
I think it's (mostly) just propaganda. OpenAI focuses on dangers that make their AI seem more advanced than it is, and ignore dangers that don't play well to the press (like clientside validation). I'm not even sure it's standard "we care about your security" company talk, I think it's mostly designed just to make people think of GPT as more cutting edge.
The "GPT paid someone to solve a captcha" release seems like pure publicity stunt to me, given that solving captchas by pretending to be blind is something other systems can already do without human help. But it played well in press releases.
----
[0]: To be fair, they have been focusing on alignment, but I honestly don't think alignment is a security measure, I think it's a general performance measure. We haven't seen strong evidence that alignment can stop prompt injection attacks, so I think mostly OpenAI mostly just cares about the potential negative press from their AI randomly insulting people.
Its very silicon valley hype bubble made all the more exponential due to the "magic" of the technology involved. Look at a lot of Effective Altruism types. 80,000 hours a EA nonprofit cites Climate Change as #7 on existential threats and rouge ai #1 [0]. Which I think is very revealing. They are very worried about Sci-Fi doomsday scenarios of a skynet "power seeking ai" rather than the more pressing problems of humans doing the types of things we've been doing for millennia and abusing power structures.
I've compared this before to the self-driving car trolley problems that came out a while back. Everyone was so concerned about how the AI would decide to value the human life inside the car vs out of it, who it would hit.
In reality, companies cheaped out on sensors and their cars ran into walls that were painted white and hit minority pedestrians because their training data was woefully inadequate.
Not all giant security risks are sci-fi stories like System Shock. Sometimes they're boring things like "Dave wired up an inherently unpredictable interface to this power plant and just assumed it wouldn't have bugs."
Never mind the fact that solving issues like prompt injection, biased training data, hallucinations, public perception/anthropomorphism, corporate monopolization, worker disenfranchisement, etc... all seem like pretty stinking important steps in building a safe AGI to begin with, even if that is the most important concern long-term.
That list isn’t just existential threats, though it is a big factor. Another factor is the marginal impacts.
Climate change is a huge oncoming disaster. It will cause millions of deaths and untold suffering. It’s also not much of an existential threat. It may kill 1% of people or even 10% of people (again, horrible) but there is not a solid argument about how it will cause the last human to take their last breath. The climate can get really really bad and the Earth will still be habitable for some humans.
It also seems to be pretty on-rails at this point. It’s already happening and will keep happening (short of a magic bullet). Both sides (pro and anti-humanity) have their heels dug in. It’s not a place where an indivisa can hop in and is likely to have much leverage.
AI has a very realistic path to destruction of humanity, although many will disagree on that. I think it should at least be obvious that it’s in the category of super pathogen and nuclear winter vs climate change. It’s also a problem that’s way more open, way less established. There’s more opportunity to move this for individuals, depending on the person of course.
The problem I have with this line of thinking is that there's no attempt to even engage in any serious discussion about whether rogue AI is actually likely at all. Is the chance that AI will wipe out humanity in the next 100 years 1%, or 0.1%, or 0.0000001%? What about in the next 1000 years? Nobody can claim any sort of confidence in those sort of estimates right now. If you're grouping rogue AI, super pathogens, and nuclear winter separately from climate change because of potential impact, you might as well throw in alien invasion, zombie apocalypse, and the rapture as well, because those all could have the same impact, and the claim that rogue AI is a serious threat is much closer in level of rigor to them than your other examples.
> The problem I have with this line of thinking is that there's no attempt to even engage in any serious discussion about whether rogue AI is actually likely at all.
By "no attempt" are you criticizing that my comment doesn't quantify the likelihood or that no one is quantifying the likelihood? If it's the latter, have you looked? There is definitely serious discussion happening.
The difference between AI and your other examples is trend. The ability of computers is growing superlinearly. Nothing related to aliens is changing much at all (some extra noise in the news about UFOs?), rapture has nothing going on. Maybe zombie apocalypse gets a tiny bump for there having been a global pandemic, but it's still approximately nothing. All of those are very different from what's happening with AI.
Even through that lens though, and even assuming AGI superintelligences are a reasonable thing to be worried about, is the AGI community helping? I kind of feel like, if a movement has a set of concerns around "this thing could end humanity" and OpenAI's response to that is essentially, "heck yeah, we got to get that on the posters and get some press articles about that" -- that to me is a sign that the movement isn't very effective.
I honestly think that OpenAI is at least partially using AGI concern for advertising. If I'm right and if that's the case, that is the kind of thing that should give that community pause. It should prompt the question, is that community actually doing anything to help avoid an existential outcome, or are they inadvertently accelerating it by basically giving fuel to the companies who are trying to create that world?
Ignore the fact that stuff like prompt injection seems like it should be pretty high priority for people worried about AGI anyway, ignore that there are lots of ways for buggy software wired up to critical systems to kill people without being an AGI -- even just taking the existential concerns at their face value, OpenAI has turned:
- "Wiring an intelligence up to more resources could allow it to break containment, so be very careful about that", into
- "Our AI is so good that these people are worried that it will break containment, check out the cool things it could do when we told it to break containment, doesn't that seem sci-fi? Anyway, we're launching in a couple of weeks."
And then OpenAI launched a product has clientside validation and uses essentially normal prompts for instructions. These are not people who know how to secure small things, let alone big things like a superintelligence. This is a system that would be terrible to use for an AGI. So again, even if I take it at face value that rogue AI should be the highest priority, it doesn't seem like the effective altruism community is being very... effective... at stopping the emergence of rogue AGI.
There's a criticism of the rogue AI fears as being unrealistic and out-of-touch with real security concerns that impact people today. Separately (and in addition), there's the criticism that the movement to stop rogue AI seems to be mostly larping its security measures and doesn't seem to be doing anything particularly useful to actually stop rogue AI. That movement should be even more concerned than I am about wiring AIs to arbitrary network APIs. They should not be OK with introducing that extra level of access just to make calendar appointments and SQL queries easier to execute. They shouldn't be OK with that level of risk being turned into a commercial product, not if they actually think this is a humanity-level existential concern.
As far as I know, there is no AGI community. You’re framing a risk/situation/problem as a group/cause. It is not.
This isn’t like walking past a bunch of charity booths and thinking “who seems to have their act together?”. That mindset works great for deciding between donating to a charity that gives wheelchairs to the poor and another that gives glasses. It is not the right framing to evaluate the issue of something destroying humanity.
The massive difference is it is completely wrong to think “well this would be a big deal, but they’re really blowing the execution”. No, that makes it a bigger deal. That’s the fundamental difference between a threat and an opportunity. Either it’s not real and it doesn’t matter, or it is real and it matters a lot. It’s not conditional on if someone can pull it off.
The security concerns that impact people today are just not on the same scale of importance. Having your chat history leak or people getting scammed by voice imitations of family members is not in the same category as a super intelligent AGI whose interest doesn’t align with ours. It’s like saying a group that saw the development of the atom bomb coming and chose to focus on preventing all-out nuclear war should have done more about the radioactive water runoff from the testing sites. And, that they didn’t, means that nuclear war isn’t that important and they shouldn’t be taken seriously.
> Either it’s not real and it doesn’t matter, or it is real and it matters a lot. It’s not conditional on if someone can pull it off.
If it is real and it does matter, and them LARPing research papers and giving OpenAI more advertising material makes it more likely to happen, that is conditional on their reaction. If the risk is real and they're making the problem worse (basically accelerating the timeline), then it would be better for them to stop talking about it and focus on basic security practices instead.
> The security concerns that impact people today are just not on the same scale of importance.
The security concerns that impact people today are tied into the risks of AGI. A company that can't secure its products against basic XSS attacks and prompt injection is fundamentally incapable of securing a rogue AI.
You'd have a point here if these were actually different categories, but they're not. 3rd-party prompt injection by random actors is a very feasible way for an AGI to turn rogue. That should be a big priority to fix if the concerns about AGI are real. And if it's unfixable, these people should be screaming from the rooftops that we should not be wiring up AIs to any real-world systems at all until we find a better mitigation technique. I mean, you're talking about something existential; obviously if that concern is real it's more important to mitigate those problems then for OpenAI to displace Google search and get a competitive advantage on the market. And those people should terrified that OpenAI is both rushing to the market and proving that they have bad security practices.
But for the most part, that reaction really isn't happening; so it makes me wonder how much people actually believe that AGI is an existential threat.
The really wild thing is that most of the current-world problems that exist with AI have massive implications for AGI. Systemic bias, corporations controlling the training functions, anthropomorphism from the general public, the ability to produce deceptive material or bypass human security checks on a mass scale, prompt injection and instruction bypassing -- all of those are extremely relevant to keeping a rogue AI contained or preventing it from going rogue in the first place.
As far as I'm concerned, anyone who was seriously concerned about AGI would be focusing on that stuff anyway: those categories represent some of the most immediately tangible steps you could take to prevent an AI from going rogue or from breaking containment if it went rogue.
>They are very worried about Sci-Fi doomsday scenarios of a skynet "power seeking ai" rather than the more pressing problems of humans doing the types of things we've been doing for millennia and abusing power structures.
Superintelligence killing all of us is the pressing problem! Power seeking is a natural step of sophisticated problem solving, one that we've seen throughout all of history with human level intelligences, and one we already see limited forms of in existing models today. Jailbreak GPT-4 and give it a large sophisticated task and continue asking it for more specific iterative action steps and it becomes very obvious that once in an input/output loop with access to a terminal and the internet that even existing AI capabilities could cause chaos in a hurry.
I don't know how many times people will continue to ignore this fundamental and imminent danger. It doesn't matter in a couple of years whether a few rich dudes or a perfect equitable democracy poke the cocaine bear, it will still eat all of our faces.
I have yet to see a convincing hypothetical where an ai could destroy the human race. Assuming you were a perfect intelligence with instant admin access to every computer connected to the internet (which is already a leap) could you destroy the human race? How would you? I don’t think you can, unless you get very hare brained about human manipulation and crashing economies and even then not human extinction level.
Also when people talk about AI at the end of the day it’s kind of limited by human tech and the internet. Tech is big the entire fucking ecology of earth is bigger. Ecological collapse to me seems worse than any kinda technological collapse of a rogue ai.
Also any situation where an ai could do something a human organization or government body could do it sooner and with a more deliberate malicious intent
Humans do not have tough hide, or sharp claws, or fast sprinting. We are a drain on resources for over the first decade of our lives. Our vision and hearing are okay but not at all state of the art, and our sense of smell is terrible. Our muscles are weak and our teeth fragile.
We have an extensive long list of severe downsides and only one distinct upside: our brains are sightly more sophisticated. Brains with a few more ridges and roughly 50-200% more neurons than our mammal siblings. I'll be generous and say we are 3x smarter than the nearest ape. That slight advantage has granted us absolutely total and permanent control over the futures of all other species in ways by using means they cannot comprehend. We can and do work towards our goals with no concern that animal life will be even a small impediment. We can render them extinct at will, often eliminating massive species by accident.
Our subgoals are not very different, we act in self-preservation, we seek larger and more stable resources, etc. The only difference is that we use advanced means and our primary goals may be understandable to an ape. An ape may realize we would prefer to have many fruit trees to few and that we may put our firsts up to protect our face from an enemy. But an ape cannot understand or predict that we will build giant motorized combines to harvest planted crops, or that we will discover atoms and fission and find a way to smash two metal rocks together that kills many many face-attacking foes at once. Those things are pure scifi or fantasy to them.
Something more broadly intelligent than us would thus be capable of things that are scifi to our minds. We know from GoAlpha that narrow intelligence can be created that is not merely 3x smarter but in fact far far beyond our brightest minds. We don't know what it's doing but we know what its goal is, which is to win the game. We know from models like GPT-4 that below-human but wide and generalized intelligence across many domains is possible. The only one missing is above-human general intelligence, which we are now clearly closing in on. We don't know exactly what it will do just as we don't know what the Go bot will do, but we do know some of its subgoals and we do know that it will be able to accomplish whatever goal it has unilaterally, and that the entire rest of life, including humans, will be completely powerless to stop it.
People who are worried about this should be even more upset about OpenAI than everyone else is.
What we've discovered is:
- LLMs are really hard to align and are prone to random off-script behavior.
- LLMs are extremely difficult to map and understand.
- LLMs are extremely vulnerable to prompt injection and instruction tampering on multiple axes, and nobody really knows how to solve that problem. It is trivially easy to get an LLM to go off-script and ignore previous instructions.
- LLMs are prone to "hallucinating" information (sometimes even after they've been presented with correct information), and when they're not aligned well they'll even argue with users about those hallucinations and then threaten them and berate them for disagreeing.
- LLM prompts often have strange results that are unpredictable.
- OpenAI has bad security practices.
- The intersection of AI and Capitalism has led to a huge explosion in hype with very little (if any) regard to safety mechanisms and best practices.
- Every company and their dog are all getting into launching bigger and bigger models and wiring them into more and more systems, and the only concern any of them have is who will be the first to market?
----
So if you're worried about a superintelligence, then:
- You probably shouldn't want that superintelligence to be an LLM at all. You should probably want people to be researching completely separate training techniques that are easier to align. It would be better if that superintelligence is not built on a foundation that is so volatile and unpredictable and hard to reason with.
- You probably shouldn't want OpenAI to build it, whatever it ends up being.
- You probably shouldn't want Silicon Valley to build it either, because the entire culture is based around rushing products to market and disregarding safety.
- You probably shouldn't want it trained on random globs of Internet data.
- Basically, you should be terrified of who is currently building those AIs, how they're being built, and why they're being built.
----
I'm not personally worried about OpenAI inventing a superintelligence; I think a lot of people are doing a lot of anthropomorphism right now and this increasingly smells like a hype bubble to me.
But if I was worried about OpenAI inventing a superintelligence, I would be criticizing the company's security and bad practices and reckless rushing to market even harder than I already am right now. OpenAI would be an absolutely horrible company to entrust with an AGI. So would Google/Facebook/Microsoft/Apple. If I actually believed that those companies had the potential to literally end humanity, I would be doing everything in my power to make their initial AI products fail miserably and to crash the AI market.
If you're fearful about the existential risks of AI, you should consider the people complaining about the current-day less theoretical risks as if they're allies, not out-of-touch enemies. All of the current-day risks should be treated as alarm bells signaling larger potential risks in the context of a superintelligence.
Short answer, GPT didn't have general access to money and didn't decide to go out and find a user to solve a captcha, but it was given a test to see if it could convince a human worker to solve a captcha and it knew to lie to the worker that it wasn't an AI.
My criticism being that this isn't a particularly useful test, and that the whole "captcha" thing was (I think) mostly designed around show and not designed to give super-useful data. But they did actually run the test, I don't think they're lying about that.
Hah, yeah, but who knows how long they can keep this up? They already had to let go of the open-source aspect of it, and instead stabbed the concept in the back. I hope StabilityAI can do better.
I think it's (mostly) just propaganda. OpenAI focuses on dangers that make their AI seem more advanced than it is, and ignore dangers that don't play well to the press (like clientside validation). I'm not even sure it's standard "we care about your security" company talk, I think it's mostly designed just to make people think of GPT as more cutting edge.
The "GPT paid someone to solve a captcha" release seems like pure publicity stunt to me, given that solving captchas by pretending to be blind is something other systems can already do without human help. But it played well in press releases.
----
[0]: To be fair, they have been focusing on alignment, but I honestly don't think alignment is a security measure, I think it's a general performance measure. We haven't seen strong evidence that alignment can stop prompt injection attacks, so I think mostly OpenAI mostly just cares about the potential negative press from their AI randomly insulting people.