The most interesting part, to me, of a release like this is the amount of "please don't abuse this technology" pleading. No licence will ever stop people from doing things that the licence says they can't. There will always be someone who digs into the internals and makes a version that does not respect your hopes and dreams. It's going to be bad.
As I see it, within a couple years this tech will be so widespread and ubiquitous that you can fully expect your asshole friends to grab a dozen photos of you from Facebook and then make a hyperrealistic pornographic image of you with a gorilla[0]. Pandora's box is open, and you cannot put the technology back once it's out there.
You can't simply pass laws against it because every country has its own laws and people in other places will just do whatever it is you can't do here.
And it's only going to get better/worse. Video will follow soon enough, as the tech improves. Politics will be influenced. You can't trust anything you see anymore (if you even could before, since Photoshop became readily available).
Why bother asking people not to? I suppose it helps you sleep at night to know that you tried?
I wonder/hope if a downstream effect of this technological change will be the end of the idea of a "shameful" or "humiliating" image. We all have bodies, we all have sex, and I agree that soon images of ourselves being nude/having sex etc. will proliferate because they'll be generated instantly via Siri shortcut as part of casual banter.
In a world where every celebrity is having sex with gorillas, doesn't such an image lose its charge? Will norms and values around sex/body shaming change?
That seems unlikely, given that making up hurtful stories about people and transmitting them via text or voice is still a thing. Everyone knows that anyone can invent any story they want without any technology whatsoever, and yet spreading rumors remains as popular as ever.
Not really. I can't think of a recent "leaked texts" story that the participants couldn't easily and plausibly deny (e.g. Elon's supposed messages to Gates), or even voice messages. Even most images can already be dismissed as Photoshop if all the witnesses agree. The only medium that is somewhat hard to deny is video, like sex tapes, but even that's not too hard. I think there will soon be a race to make deep learning pics look completely indistinguishable from phone pics.
Perhaps that's somewhat true for famous people, although there are plenty of examples of false stories (without any forged evidence, literally just stories) causing real embarrassment and damage to reputation.
But it's even more true for non-famous people getting bullied in their social groups, both online and offline, and that's more what I was responding to (the "asshole friends" in the original comment).
There are a few different kinds of 'secure enclaves' implemented on chips, where you can have some degree of trust that it "cannot" be faked.
E.g. crypto wallets, hardware signing tokens, etc.
We could imagine an imaging sensor chip made by a big-name company whose reputation matters, where the imaging sensor chip does the signing itself.
So, Sony or Texas Instruments or Canon start manufacturing a CCD chip that crypto signs its output. And this chip "can't" be messed with in the same way that other crypto-signing hardware "can't" be messed with.
That doesn't seem too far-fetched to me.
* edit:
As I think about it, I think more likely what happens is that e.g. Apple starts promising that any "iPhoneReality(tm)" image, which is digitally signed in a certain way, cannot have been faked and was certainly taken by the hardware that it 'promises' to be (e.g. the iPhone 25).
Regardless of how they implement it at the hardware level to maintain this guarantee, it is going to be a major target for security researchers to create fake images that carry the signature.
So, we will have some level of trust that the signature "works", because it is always being attacked by security researchers. Just like our crypto methods work today. There will be a cat-and-mouse game between manufacturers and researchers/hackers, and we'll probably know years in advance when a particular implementation is becoming "shaky".
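The signing idea above can be sketched in a few lines. This is a hypothetical stand-in: it uses a symmetric HMAC from the Python stdlib for the sake of being runnable, whereas a real sensor would hold an asymmetric key in a secure element so that verifiers never possess the signing secret, and `DEVICE_KEY` here is entirely made up:

```python
import hashlib
import hmac

# Hypothetical device key fused into the sensor at manufacture. A real design
# would use an asymmetric key pair in a secure element (so verifiers never
# hold the signing secret); symmetric HMAC here is just a runnable stand-in.
DEVICE_KEY = b"key-fused-into-sensor-at-fab"

def sign_capture(pixel_data: bytes) -> bytes:
    # Signature the sensor would emit alongside the raw capture.
    return hmac.new(DEVICE_KEY, pixel_data, hashlib.sha256).digest()

def verify_capture(pixel_data: bytes, signature: bytes) -> bool:
    # Valid only if the pixels are bit-for-bit what the sensor signed.
    return hmac.compare_digest(sign_capture(pixel_data), signature)

raw = b"\x10\x20\x30\x40"                  # stands in for raw sensor output
sig = sign_capture(raw)
print(verify_capture(raw, sig))            # True
print(verify_capture(raw + b"\x00", sig))  # False: any edit breaks it
```

Note the limitation raised elsewhere in this thread: this only proves the signed bytes came from the chip, not that the scene in front of the lens was real (you can always photograph a screen).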
> In a world where every celebrity is having sex with gorillas, doesn't such an image lose its charge? Will Norms and values around sex/body shaming change?
I hoped this would happen with social media. Everyone says stupid shit online and everyone has past beliefs they’ve outgrown. So what’s the big deal?
Instead we went the opposite way. Everyone is super self conscious and censoring at all times because you never know who’s gonna take it out of context and make a big deal.
It is an interesting point [the hope that values will adapt to reflect typical mischievous patterns of social dynamics within various clusters]. During the dotcom era I used to marvel at the salivating delight over "the internet never forgets": college students caught smoking a bong and the subsequent impact on their careers (virtue signalling?). I had hope that society could adapt past such ridiculous FUD biases. There is a strange relationship between scarcity and opportunity, and these windows into souls paint a distorted picture of the darker side of our ambitions. It is a cancer. Also, humans are always testing boundary conditions: curiosity, discontent, security, insecurity. Some inherent desperation of people jockeying for the next path from frying pan to fire in search of greener pastures breeds an opportunism that seems certain to favor desperation and negativity.
If I take your photo and photoshop (very well) different surroundings. Is it a real photo if you?
If I photoshop your face (very well) onto a different body. Is that real?
If I feed your photos into a model that can create realistic versions of those photos in different poses or with different facial expressions. Is that real?
They all start with something that is very definitely a real photo. You can’t (yet? ever?) generate a realistic photo of a specific person from a textual description. The machinery needs a source.
Simple: a "real" photo is one in which a light field from the real world impinges on a photosensitive medium (CCD, film) and is directly encoded, with some allowance for global light levels, gain, ISO, and shutter speed. Anything else is a modification thereof. HDR, multi-exposure compositing, etc. aren't truly "real". They may be 99% real, but they aren't 100% real. If you crop it, it's 99.9% real (we have models that can detect cropping and even which region the crop came from, though obviously they can't reconstruct the missing data).
Yes, by that definition, most photographs already aren't real.
Sure, I'd say that's a real photo - you could easily pull it off with a single exposure of film.
Just because the photo is real, doesn't mean it's free from deception. I could take a "real video" of gorilla suit^W^W bigfoot and it'd still be deceptive.
You’ll be surprised to learn that doesn’t work without some amazing tech to process the photons. Different settings will produce a different photo.
Hell, just changing focal length makes a big difference to what your face looks like: https://imgur.io/gallery/ZKTWi (no digital manipulation required).
Which of those faces is “real”? They’re all just recording photons hitting the camera, but look very different.
It gets even worse when we start talking about colors. For example: it took cameras decades before they could accurately capture black faces. Where accurately means “an average person would say it looks right”
>In a world where every celebrity is having sex with gorillas, doesn't such an image lose its charge?
No, because the image itself doesn't matter. What matters is how much the public wants to hate someone. If the public is primed, any remotely plausible incriminating image will do as an excuse.
Fortunately, these images are still far cruder than what someone can cook up with Photoshop. Unfortunately, it's part of a bigger trend where we get more and more tools to produce, manipulate, and share information, while the tools to analyze and filter information lag by at least half a century.
I don't generally like his style, but this novel really captivated me. Without spoiling the plot (much), the end-result of everyone being able to spy on anyone anywhere at any time is a kind of societal disinhibition, especially related to sex, nudity, or similar taboos.
If we get to the point where we can't tell AI generated images from reality, I'm not sure "body shaming" will be on society's collective radar anymore.
I bet someone said in 1830, "by the time we can send robots to Mars, body shaming will not be a thing anymore". For better and worse, that's not how we humans do things generally.
As someone who witnessed AI Dungeon's GPT-3 model and its unlimited, uninhibited imagination for erotica, I would download everything now before they cripple the models. I would not be surprised if they very shortly stop downloads entirely due to "abuse" and pursue a SaaS model.
I think it's funny how Yandex, a Russian company, releases these big language models without all the AI safety handwringing in their press releases. The Russians have a tradition of releasing technology without much worry about what happens to it, for better or for worse. For example, they made between 75 and 100 million AK-47s, a machine gun not restricted in any way, and it spread to every corner of the world. They even gave out the plans and technical assistance so that any of the organizations they worked with could produce their own. Twenty different countries manufacture AK-47s today. Of course, you had to register every Xerox machine in the Soviet Union, so maybe they just had different priorities?
The west is absolutely fascinated these days with the control of advanced technology. Drones, Blockchain, and AI models seem to be the latest things that the west is determined to exercise control over. For example:
"Many of the technological advances we currently see are not properly accounted for in the current regulatory framework and might even disrupt the social contract that governments have established with their citizens. Agile governance means that regulators must find ways to adapt continuously to a new, fast-changing environment by reinventing themselves to understand better what they are regulating. To do so, governments and regulatory agencies need to closely collaborate with business and civil society to shape the necessary global, regional and industrial transformations." -Klaus Schwab, "The Fourth Industrial Revolution", Page 70.
StableDiffusion was trained by an academic lab in Germany, which is rather different from a company in California doing it. Though they both have the problem that the output is illegal in many countries with the wrong prompts.
As if we're all one block, a single minded hive. Corrupted, degraded by a heavy capitalistic and egotistical mindset. It's fascinating to see and talk about our shortcomings isn't it? We are so inferior. The fact you wrote "The West" signifies you're one of "The Others". Someone possibly from Russia. Then you compliment Russia and degrade "The West" adding further strength to this hypothesis.
Using "The West", a simple-minded sound bite that's been used in propaganda for centuries: a way of appealing to our ape instincts to protect our tribe against others. "Us" vs. "The Others".
You should make a little effort to see how ridiculous, infantile, and brainwashed it is to refer to every country that is not Asia or Russia as "The West".
It's not about control of advanced technologies but rather making such technologies look more powerful by pointing out real or imagined destructive capabilities. If you want funding or "big up" a topic, claim it has the power to bring down the world (anyone remembering gray goo?).
> grey goo, a nightmarish scenario of nanotechnology in which out-of-control self-replicating nanobots destroy the biosphere by endlessly producing replicas of themselves and feeding on materials necessary for life.
> Politics will be influenced. You can't trust anything you see anymore
I've been wondering for a while now if this will lead to an unexpected boon: perhaps people will be forced to pay attention to a speaker's content instead of simply who is speaking.
Unfortunately a speaker's content can also be auto-generated now, at least for brief enough snippets. And that means the content can (and will) be optimized to appeal to a target segment much more than has ever been previously possible.
The problem with this is that you will never know who is actually speaking. Deep fakes are already a thing, but as they get better and more accessible we will approach a world where anyone can make anyone say anything and make it hyper believable. In that world, it will be very difficult to tell what is real.
My personal hunch is that this will end up leading to a situation in which presenters do a cryptographic handshake that works to verify and prove authenticity. This isn't a new idea, and it has some very obvious drawbacks, but I don't see much of a way around the issue. The handshake could work great for something like official news releases, but for other instances that might come up in court, say, dash cam footage of an accident, it seems to me that the legal system is going to face some serious issues as these programs progress.
Looking forward to a future where all a politician’s quotes are on a blockchain, signed by their private key, and they chose to do so voluntarily out of fear of deep fakes.
"We verified the validity of our source's signature and vouch for its authenticity"
- Journalist from a respected newsroom, when they choose to keep the source confidential. So it won't be better or worse than it is now: all about transitive reputation.
The unspoken deterrent to further denials is the risk that the source may go public or, if the recording was unauthorized, that the person who recorded the video gets doxxed (assuming the signatures provide nonrepudiation).
That's a start! I think their video appearances should be like a car in NASCAR with permanently displayed logos superimposed from all the interests that have funded their rise.
If I recall from the interview with Stability.ai's founder, he has more or less the same opinion, and that humans will adapt to the new technology as we always have. I figure "please don't abuse this technology" warning stickers are more CYA. It'll make the vast majority of judges look at a motion to dismiss and not blink an eye.
Historically, he is correct. It is easy to find people who were against TVs, cars, trains, and electric cars. Those people were not entirely wrong in their logic: trains and cars did make it much easier for scammers to come into a town and then leave quickly.
In terms of the equilibrium, this is certainly a true observation. However, historically speaking new technology can be extremely disruptive in the short-term as society figures out the new norms, and the power-structures are disrupted and then re-equilibrate.
Concretely, it's probably true that children born with this technology will have adapted to many of the negative (and positive) aspects of it. But the current generation of elites, politicians, and voters might have a harder time adapting.
>make a hyperrealistic pornographic image of you with a gorilla[0]
I don't understand this irrational fear. This can be done today, just need some minutes instead of some seconds to create a good Photoshop.
Also, is this seriously the thing you fear? Fake porn? There are much worse things you can do with this tech, like phishing, falsification, etc. Not to mention putting millions of graphic designers out of a job.
Photoshop is a skill possessed by more than enough people that, if moderately convincing images were that transformational a technology for adversarial use, the world would already have been transformed by Photoshops of famous people and anybody else graphic designers wanted to embarrass.
I agree that it's much easier to do low effort stuff to wind friends up, but universal access and low effort don't make it more likely to be impactful and believable.
That's what I was thinking. Whether a pornographic picture of me and a gorilla was made with Photoshop or AI is irrelevant. People's reactions will be the same, and the repercussions will be mostly the same (which doesn't mean there will be consequences).
If someone really wants to hurt you, not having AI isn't going to stop them.
The real effect will be that you can publish a real picture of you fucking a gorilla and nobody would believe it because it's trivial to generate it with an AI.
> seriously this is the thing you fear? fake porn?
I'm being polite. Things will be so much worse.
Someone will make child pornography using your child's face as the input. Someone is going to take private videos of politicians and then edit them to have them say incriminating things. Someone is going to short the stock of a large company, then release a faked video of the CEO being shot, and profit from the immediate stock plunge.
And this is just what my mind can come up with. Imagine what 4chan will invent.
> The most interesting part, to me, of a release like this is the amount of "please don't abuse this technology" pleading.
> Why bother asking people not to? I guess if it helps you sleep at night that you tried, I guess?
I've perceived this as them doing the necessary amount of virtue-signaling and ass-covering to avoid the ire of groups that are loud/powerful enough to cause issues for them in the short-term. I've been following the developments in this space for a while now, and I don't get the impression that Stability AI cares too much about forcing Western ideals of "correctness" onto the public.
It's plainly obvious that this is going to be immediately used to produce content considered obscene, offensive and/or illegal by various groups of people. And it is what it is, we're going to just have to figure out how to live with it as a society. It's going to get far far worse (from certain points of view) as we continue to replicate functionality previously exclusively featured in human brains.
I never saw it create anything more NSFW than female boobs. It won't generate penises or genitals besides mounds of hair. Right now the "NSFW" flag/filter is 99%+ false positives. This is stuff that used to be allowed in G-rated movies in the US.
True of both Stable Diffusion and DALL-E 2: the more encompassing the content of your image, the more incorrect the detail. Both generally make fantastic faces, but anatomy and consistency start falling apart with full bodies: contorted limbs, missing fingers, too many limbs, etc. No one is going to be fooled here.
Of course the models could be trained on offensive images and over a long enough time period eventually will. But, for now, someone is going to have to spend millions of dollars on compute and have the human expertise behind it too. Then again, if we had an image generator making pictures of [insert X offensive thing] that is a whole lot less disgusting than the plentiful real photos and videos of that thing in reality.
It sometimes feels like there has to be a small, influential group of people on some jihad against generated pornography on the Internet, with impacts on the order of $10-100M.
I'm sure this view isn't shared by everyone, but adding such disclaimers is a straightforward course of action when you agree with them anyway and don't want to be targeted and blacklisted by the payment networks (that seems to be among their weaponry).
> you can fully expect your asshole friends to grab a dozen photos of you from Facebook and then make a hyperrealistic pornographic image of you with a gorilla
my prediction is that, as a result, people will start assuming pics online are fake until proven otherwise.
That's not the only issue with NSFW; the larger problem is when you don't ask for it and get it anyway. Especially because this model is not particularly good at it and you'll get body horror.
Well it comes with an NSFW filter that's activated by default so that's a non-issue (but importantly you can turn it off if you're ok with accidentally seeing stuff like that without explicitly asking for it).
Like you mentioned near the end, all this has been possible with photoshop, with amateur level skill. Hollywood can CGI the entire Captain Marvel movie, so as far as state-level efforts go, AI can really only be an incremental improvement at best.
I think this is all just trendy popular sentiment moralizing AI.
It seems inevitable like it's "inevitable" that they Photoshop your face onto porn. Yes, of course it will happen but maybe not to most people? I'd guess inevitable for many celebrities.
But even today, we deal with it correctly. Fakes and real photos are mingled together in 9Gag/LatestNews reports about Ukraine. Under the fakes (and the real ones), people ask for confirmation. Someone says it's true, no one believes him, until a link to a newspaper is dropped. And 9Gag isn't the highest-IQ community around, so yes, the general population does distrust photos by default until proven otherwise.
They are laughed at anyway if they tell a story coming from a forged photo.
Sure, newspapers could forge stories, display pictures with, I don’t know, Biden’s son with a crackpipe, and make the populace believe untrue stories. But guess what, they already do it anyway, newspapers already “spin” (as they say, i.e. forge, suggest without literally saying) stories all the time.
I have a quite different perception of 9gag.
Yes, some ask for confirmation but it depends very much on the topic.
Wrong topic and facts get downvoted and the fake news prevail.
And not all links to newspaper are considered valid, especially if it's about "woke culture".
Then you have to search the reasonable needle in the haystack of transphobia, homophobia and misogyny.
The problem comes from the early adult newsroom interns responsible for sourcing content. They don’t know it’s fake, it sounds like a good click-baity article to them, so they run it. It happens.
I wouldn’t shift responsibility on the shoulders of the last newcomer. The top of the management has had ample time to diagnose this. If it remains like this, it’s by design.
Significant progress has been made on video - inter-frame preservation of the surface level spatial invariants in 3D environments has been achieved.
But preservation of transtemporal spatial invariants requires understanding far more than that - dynamic lighting, density, flexibility, rigidity and momentum, the viscosity of the air, the skeletomuscular system, the flow of the fluids within, and so on ad infinitum.
And a lot of that is tacitly understood by the human mind (even when the human mind would struggle to generate a scene, it can often detect that something is wrong - try turning on the lights in a lucid dream).
It's going to be quite some time before it reaches the point where a human cannot detect that video was generated (or altered) and even longer before computers can't.
But then, it's going to be a shit-show - and I'm not talking about the bestiality videos.
Evidence, as we know it, will be meaningless - the implications for the legal system are terrifying.
I don't think you give enough emphasis to the "if you even could before" part. Sure, this makes realistic fakes easier and more available to the masses. So what? We've had the technology to 'frame' people in similar ways for decades now, and other than entertaining the 4chan teens it has done squat.
There's enough money interest in bringing down certain politicians, if faking a sex tape or back-room conversation would make any difference to the world it would have been done already. Hell, politicians pretty much admit publicly that they are rapists and we don't do anything about it. Who's gonna care about a couple of gorilla-pics of a normie?
People don't and never have trusted any evidence that doesn't reaffirm their world view, regardless of how true that evidence is. More realistic evidence won't change that.
This is great news for those that actually have sex with a gorilla, because now they can claim it's an ai photo. :)
Kidding aside, I think this is actually good. Humans need ephemerality. We are never getting the full version of it back, but with photorealistic ai video and image creation some freedom returns. I think without it a society in which everyone has a camera all the time would mean absolute ossification of social norms. Right now it's very, very new - I mean multiple generations living with the current, or better (eg. recording eye implants) technology.
At first we will see a lot of "prompt censoring mobs" that will try to "stop abusers because children and terrorists", but as the images multiply, and they will multiply, the line between real and fake photos will become hard to spot, for real. This is i think a pivotal and great moment, because everyone can now claim plausible deniability to any picture. No revenge porn will be believable anymore, nor will anyone know if that Bezos's weiner pic is real or not.
Really, the licensing is most interesting? There’s a lot of public info about training and development too.
The license itself is pretty irrelevant. What people will actually do with the training blueprints, and how fast things will evolve... now that's interesting.
Well we will be able to generate satirical political images of politicians, of religions easily and quickly. So there are some upsides for the technology to be given to every comedian out there. Politicians and religions, did have an easy ride the last few years, so we must set the record straight from now on.
Plenty of fear porn was thrown around when GPT-3 was released. I love GPT-3; I used it just yesterday and it is very good. I am still wondering when the total destruction of the world will happen because of GPT-3.
> Politics will be influenced. You can't trust anything you see anymore
How much could you trust media content previously? Staged footage, false narration, biased coverage are nothing new. A counter-intuitive side-effect of opening the Pandora's box could be a realization that media is a form of simulation. Perhaps this will lead more people to filter what they see through a prism of critical thinking.
As far as I'm aware countries fall broadly into two camps. Camp 1, USA for example, is concerned purely with the abuse of children, i.e., anything that depicts or is constructed of pieces of real children is illegal but other things such as drawings, stories, adults role playing, etc is not. Camp 2 outlaws any representation of it whether or not a child was involved.
Nowhere will a training set featuring pictures of naked children be legal.
> Nowhere will a training set featuring pictures of naked children be legal.
Appropriately from the recent news stories, but it's easy to imagine at least portions of such pictures being available for medical diagnostic purposes. I've sent pictures of my children to my doctor, so presumably in the future it's easy to imagine sending pictures to an AI to diagnose which would require a suitably fleshed out (pardon the pun) training set.
>Nowhere will a training set featuring pictures of naked children be legal.
True, but generalizing beyond the training set is precisely the point of machine learning. A good generative model will be able to produce such images, no matter how heinous the content is.
I think the actual problem is that this gives plausible deniability against photographic evidence, which might result in increase of bad behavior. Even cameras which cryptographically sign their output can't prove that the input was actually photographed from the real world, or if it's just an image of an image.
> you can fully expect your asshole friends to grab a dozen photos of you from Facebook and then make a hyperrealistic pornographic image of you with a gorilla
... Someone is gonna do this to children. This technology is gonna end up on the news. Maybe they'll even try to ban it.
Seems less likely that everyone will "just be cool with" having pornographic images of themselves with gorillas spread everywhere and more likely that this is the impetus for demagogues to issue a digital ID for access to the internet.
Isn't it good if people learn not to trust photos and images shared in social media and learn to treat them as entertainment. So much productivity gains :-)
One day these image generators will also get video support... and pornography support. When that happens, a few things may occur that I think are reasonable to predict:
EDIT: Original post was way too wordy, TL;DR:
When AI-generated pornography becomes available, it seems likely that demand for "real" pornography will disappear, because the AI will match and then surpass the "real" thing. When that occurs, the "real" will become increasingly regulated and legally risky, and may end up effectively banned outright.
There will be a huge and unkillable market for non-AI generated pornography, even if people cannot tell the difference in an AB test. The demand will be too strong and I don't think there will be much outcry to ban it if it's all consenting adults.
If people can't tell the difference in an AB test, how will the real porn outcompete the generated stuff? Porn distributors aren't known for their truth in advertising or care in sourcing material. And even if they were, how would they source the real stuff when anyone can create porn of anyone they imagine? You might say PKI will save us, but people aren't going to be typing out `gpg --verify` when their hands are otherwise occupied.
I didn't edit my post if you were referring to me. Either way, I don't have a strong argument for why I feel this to be true, but I do think people are going to scoff at ai generated porn.
You’re right. They call this an “ethical release”, but what ethics, I may ask. A profitable IPO is more likely to have been their consideration. As other researchers before them, they are willingly releasing something with the potential to do harm, or pave the way for it, washing their hands in innocence.
Played with it for a bit in DreamStudio so I could control more of the settings. So far everything it generates is "high quality", but the AI seems to lack the creativity and breadth of understanding that DALL-E 2 has. OpenAI's model is better at taking wildly differing concepts and figuring out creative ways to glue them together, even if the end result isn't perfect. Stable Diffusion is very resistant to that, and errs towards the goal of making a high quality image. If it doesn't understand the prompt, it'll pick and choose what parts of the prompt are easiest for it and generate fantastic looking results for those. Which is both good and bad.
For example, I asked it in various ways for a bison dressed as an astronaut. The results varied from just photos of astronauts, to bisons on earth, to bisons on the moon. The bison was always drawn hyper realistically, which is cool, but none of them were dressed as an astronaut. DALLE on the other hand will try all kinds of different ways that a bison might be portrayed as an astronaut. Some realistic, some more imaginative. All of them generally trying to fulfill the prompt. But many results will be crude and imperfect.
I personally find DALLE to be more satisfying to play with right now, because of that creativity. I'm not necessarily looking for the highest quality results. I just want interesting results that follow my prompt. (And no, SD's Scale knob didn't seem to help me). But there's also a place for SD's style if you just want really great looking, but generic stuff.
That said, the current version of SD was explicitly finetuned on an "aesthetically" ranked dataset. So these results aren't really surprising. I'm sure the next generations of SD will start knocking DALLE out of the park in both metrics. And, of course, massive massive props to Stability.ai for releasing this incredible work as open source. Imagine all the tinkering and evolving people are going to do on top of this work. It's going to be incredible.
"A bison as an astronaut, tone mapped, shiny, intricate, cinematic lighting, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by terry moore and greg rutkowski and alphonse mucha"
I've found that sometimes you need a lot of attempts to get good results. Here's a run I just did, I'd say the top left result is OK, the bottom 2 are wrong/misinterpretations, and the top right result is fantastic: https://i.imgur.com/l8BsWvI.png
Interesting. I took a stab at your prompt and SD really struggles. It just completely ignores part of the prompt. Even craiyon puts in an effort to at least complete the entire prompt.
The bison is very realistic at least. So maybe the future is different models that have different specialties.
You can take an image and feed it into SD along with the prompt (look for img2img in the readme), I think you can use that to create the idea in craiyon and then move to SD for a quality finish.
It doesn't seem to understand things that DALL-E seems to (usually) understand. For example, it doesn't know that there are 5 fingers on most hands. It cuts objects in half randomly, etc.
I haven't found many of the images it produces to be really usable.
Is there any way to download this on my PC and run it offline? Something like a command-line tool like
$ ./something "cow flying in space" > cow-in-space.png
that runs with local-only data (i.e. no internet access, no DRM, no weird API keys, etc like pretty much every AI-related application i've seen recently) would be neat.
Yes, that's actually the biggest reason this is such a cool announcement! You just need to download the model checkpoints from HuggingFace[0], follow the instructions in their GitHub repo[1], and you should be good to go. You basically just need to clone the repo, set up a conda environment, and make the weights available to the scripts they provide.
What's the difference between those 4 checkpoints?
From the GitHub's README:
sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en. 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with estimated aesthetics score > 5.0, and additionally filtered to images with an original size >= 512x512, and an estimated watermark probability < 0.5. The watermark estimate is from the LAION-5B metadata, the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
sd-v1-3.ckpt: Resumed from sd-v1-2.ckpt. 195k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
sd-v1-4.ckpt: Resumed from sd-v1-2.ckpt. 225k steps at resolution 512x512 on "laion-aesthetics v2 5+" and 10% dropping of the text-conditioning to improve classifier-free guidance sampling.
Which one is the general use case checkpoint one should be using?
Is Apple M1 support expected soon? Even if Apple's chips are slower, they have plenty of RAM on laptops. I saw some weeks ago that it was coming, but I am not sure where to follow the progress.
Sorry my bad, found the answer. One simply adds the following flags to the StableDiffusionPipeline.from_pretrained call in the example: revision="fp16", torch_dtype=torch.float16
Zero loss. All upside. It only causes issues when training. 32-bit ships by default because it is compatible with CPUs and with GPUs that might not have native fp16 support.
Edit: Just to be clear, your intuition that it could cause issues is certainly merited, and not _all_ models can be trivially converted from fp32 to fp16 without some new error accumulating during inference. Variational autoencoders like VQGAN, and GANs generally, are particularly prone to such issues.
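The memory upside of fp16 is easy to quantify with back-of-the-envelope arithmetic. A minimal sketch, assuming rough public parameter counts for the Stable Diffusion v1 components (the exact figures below are assumptions, not from this thread):

```python
# Approximate parameter counts for Stable Diffusion v1 (assumed figures):
# UNet ~860M, CLIP text encoder ~123M, VAE ~84M.
PARAMS = {"unet": 860_000_000, "text_encoder": 123_000_000, "vae": 84_000_000}

def weights_gib(total_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB at a given numeric precision."""
    return total_params * bytes_per_param / 2**30

total = sum(PARAMS.values())
fp32_gib = weights_gib(total, 4)  # float32: 4 bytes per parameter
fp16_gib = weights_gib(total, 2)  # float16: 2 bytes per parameter
print(f"fp32: ~{fp32_gib:.1f} GiB, fp16: ~{fp16_gib:.1f} GiB")
```

Halving the bytes per parameter halves the resident weight memory, which is why the fp16 variant fits much more comfortably on consumer cards.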
Can you please tell me where the model.ckpt is? I can't find any weights in ".ckpt" format at either of the links you've given; there are only ".bin" files on Hugging Face.
For anyone else reading: you need the -original versions. The others are set up for the diffusers library, and I can't find a checkpoint file in those, just in the original one.
7. `ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt`. (You can download other versions of the model, like v1-1, v1-2, or v1-3, and symlink one of those instead if you prefer.)
To run:
1. activate venv with `conda activate ldm` (unless still in a prompt running inside the venv).
2. `python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms`.
Also, there is a safety filter in the code that will black out NSFW images, or images it expects to be otherwise offensive (presumably including things like swastikas, gore, etc.). It is trivial to disable by editing the source if you want.
Unfortunately I'm getting this error message (Win11, 3080 10GB):
> RuntimeError: CUDA out of memory. Tried to allocate 3.00 GiB (GPU 0; 10.00 GiB total capacity; 5.62 GiB already allocated; 1.80 GiB free; 5.74 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON
I also have a 10 GB card and saw the same thing. To get it working I had to pass "--n_samples 1" to the command, which reduces the batch size to one (with the default settings you still get 2 images per run). This has been working fine for me.
Thanks! This worked for me finally. The conda approach suggested elsewhere was getting too complicated, and wasn't working properly (for me).
I built a simple UI around this, which installs Stable Diffusion's docker image, and lets you play with it locally in a browser-based UI. https://github.com/cmdr2/stable-diffusion-ui
As an aside, I wonder what performance would be like running this on a CPU (with the current GPU shortage this might well be a worthwhile choice). Even something like 30 minutes to generate an image on a multicore CPU would greatly increase the number of people able to freely play with this model.
Yes, clone the repo (https://github.com/CompVis/stable-diffusion), download the weights and follow the readme for setting up a conda environment. I am presently doing so on my RTX 3080.
This should be possible if someone just exported them to tflite or onnxruntime etc. (quantization could help a ton too). Not sure why people haven't yet. I'm sure it'll come in the next few days (I might do it).
You can try https://github.com/cmdr2/stable-diffusion-ui . It installs Stable Diffusion to your local computer, and provides a simple browser-based UI for playing with it. No need to mess with conda and other environment settings.
This release changes society forever. Free and open access to generate a hyper-realistic image via just a text prompt is more powerful than I think we can imagine currently.
Art, media, politics, conspiracy theories; all of it changes with this.
Photoshop requires experience and some talent, this doesn't. If I was some small rebel group in Africa or the Middle East with basically no money or training, I'd use this tool every single day until I was in power, or I'd frame my opposition as using it against the People.
Try doing that. Those people likely won't care much about you; they lean towards authority figures in their community. It is far easier to find those figures and corrupt them than to run some underground news agency changing the minds of millions of people.
In fact the former is often what happens anyway; it's cheaper and takes less time to execute. "Western" societies will be more resilient to this scenario, so mostly it's going to be a lot of "political" art we'll see.
This is like a gazillion Photoshops being released into the wild. Things change with scale, and there is a threshold where, if enough people start doubting often enough, then all the people will doubt all the time.
Photoshop requires hours of work from a skilled professional to create results of decent quality. Now anyone can do it for free, virtually instantaneously.
What will change society forever is when the hardware required to run this software is available in the latest medium/high-end phone and hundreds of millions of people can download an app to render whatever they want.
This is one of the most important moments in all of art history. Millions of people just got unconditional access to the state of the art in AI text-to-image, absolutely free less the cost of hardware. I have an Nvidia GPU myself and am thrilled beyond belief with the possibilities that this opens up.
Am planning on doing some deep dives into latent-space exploration algorithms and hypernetworks in the coming days! This is so, so, so exciting. Maybe the most exciting single release in AI since the invention of the GAN.
EDIT: I'm particularly interested in training a hypernetwork to translate natural language instructions into latent-space navigation instructions with the end goal of enabling me to give the model natural-language feedback on its generations. I've got some rough ideas but haven't totally mapped out my approach yet, if anyone can link me to similar projects I'd be very grateful.
>This is one of the most important moments in all of art history.
I agree, but not for the reasons you imply. It will force real artists to differentiate themselves from AI, since the line is now sufficiently blurred. It's probably the death of an era of digital art as we know it.
Making art already doesn't pay much for the vast majority of producers (outside of 3D modeling). There really aren't very many jobs in making art. I'd reckon most people are artists because they love doing it.
This is completely wrong, the digital art industry is enormous and, mind you, pretty uniformly pissed off that this model has been trained off of their non public-domain work.
They're mostly posting incorrect claims like "all it does is make collages out of other people's art", though. It doesn't do that; the model stores about 1 byte per original image it's seen.
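That "about 1 byte per image" figure can be sanity-checked with simple division. Both numbers below are rough, assumed values (a checkpoint on the order of 4 GB, and roughly 2.3 billion captioned images in LAION-2B-en):

```python
# Back-of-the-envelope check of the bytes-per-training-image claim.
# Both inputs are approximate, assumed figures.
checkpoint_bytes = 4e9    # v1 weights, roughly 4 GB
training_images = 2.3e9   # LAION-2B-en, roughly 2.3 billion images
bytes_per_image = checkpoint_bytes / training_images
print(f"~{bytes_per_image:.1f} bytes of weights per training image")
```

Whatever the exact figures, the ratio lands around one or two bytes per image, which is far too little to store collage-ready copies of the training data.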
Think of that 1 byte as a hyper-compressed essence of knowledge necessary to make said collage. Then it will be correct. Those models would not exist without the labor of the artist community - which in this case was used without consent and without pay.
These models would exist, just without the painterly style. Most of the training data is photos of real things, a lot of it stock photography that has been ripped off in much the same way as those artists' work. It is a challenge to copyright law, but if artists are allowed to learn by looking at other people's work, why treat an AI differently?
You're kicking the can down the road. Obviously, an AI is not a person, that's in the premise of the question. What makes an AI so different from a person that warrants differential treatment in this case? It's not that there aren't any good answers to this question, but yours is not much of an answer at all.
It's probably okay to learn facts from copyrighted material even if you're an AI - you can't reproduce the text of a novel but you can learn the meaning of a word from context in it. Similarly you can learn to draw a hand from looking at a ton of stock photos with hands, as long as you produce original hands.
AI will need some favorable legal precedents to avoid getting banned though, or else they'll have to only train off CC0 Flickr/Wikipedia scraping.
I also think it's a more obvious problem that it can reproduce copyrighted characters by eg prompting for "Homer Simpson".
Did we say the same in the 80s when audio sampling became a thing? We accepted it (after obligatory legal battles) and moved on, giving rise to the Creative Commons.
Yeah, the fact that these models are necessarily based on existing works leaves me hopeful that humans will remain the leaders in this space for the time being.
Human works are needed to create the initial datasets, but an increasing amount of models use generative feedback loops to create more training data. This layer can easily introduce novel styles and concepts without further human input.
The time is coming where we will need to, as patrons, reevaluate our relationships with art. I fear art is returning to a patronage model, at least for now, as certainly an industry which already massively exploits digital artists will be more than happy to replace them with 25% worse performing AI for 100% less cost.
The generated pictures posted in the blog post are superior to the average artist's work. Which isn't surprising; AI easily corrects "human mistakes" (e.g. composition, colors, contrasts, lines, etc.).
Why would people want to consume art that says nothing and means nothing? While this technology is fascinating, it produces the visual equivalent of muzak, and will continue to do so in perpetuity without the ability to reason.
That's the problem for me too. This tech is cool for games, stock images etc but for actual art it's pretty meaningless. The artist's experience, biography and relationship with the world and how that feeds into their work is the WHOLE point for me. I want to engage, via any real artistic product, with a lived human experience. Human consciousness in other words.
To me this technology is very clever but it's meaningless as far as real art goes, or it's a sideshow at best. Perhaps best case, it can augment a human artistic practice.
Oh, nice take! It reminds me of the chess world. Chess engines that could beat any human in the world have existed for a long time, but people wouldn't play them because losing all the time was boring.
But then came the neural-network chess engines (powered by the same training technology as text-to-image generators), which people even enjoyed playing for a while, due to the novelty and the way they learned to play from scratch (AlphaZero, Lc0, et al.).
In the process to get there, we got networks of all kinds of strengths, you could find one exactly as strong as you, and its mistakes were like the mistakes you would make.
And yet, they were missing the "human factor", people would rather play against other humans online, some even willing to pay for accounts in places like playchess and chess.com to play other humans.
As the networks became stronger, and Stockfish then assimilated them to get the best of both worlds with NNUE, nobody cared. There was even an explosion of human vs. human chess, when twitch.tv and YouTube stars played each other and the audience didn't even care how bad the chess was. It turned out to hold up as a great spectacle despite, or thanks to, the stars being novices and racing to get better at chess.
Now chess bots are just a curiosity and only used by people without online access to other humans, I wonder if it'll be the same for art, and if "show your work" becomes a thing.
You can ask an AI to produce a great picture... now try asking it to make a video of you making that art from scratch, showing the creative process. The whole art section of Twitch is about the artist's process, and yes, there are people who get enough from that to dedicate all their time to their art.
We want to interact with conscious entities for the obvious reason as well: we want to connect. A machine is a blind, dead entity; there's nothing to connect to. Also, even if the output is far superior to a human's in limited domains, e.g. chess, the way it arrives at these outcomes is sort of banal (if clever). It's not intelligence and it's not thinking; I think the "intelligence" part is a misnomer in AI, though perhaps that's a semantic argument. To me at least, consciousness is fundamental to our type of animal intelligence. I'm a naturalist through and through; we might even be able to create animal-like intelligence and consciousness one day, but until then at least, interacting with Turing machines is a cold, boring experience if you know what is really "inside".
The line gets blurry when a dead machine one day passes the Turing test, but if I ultimately knew I was interacting with a philosophical zombie, that would kill the appeal quickly.
It's easy to generate a believable backstory. A large LM can write the bio of the "artist" and even a few press interviews. If you like, you can chat with it, and of course request art to specification. You can even keep it as a digital companion. You can extend the realism of the game as much as you like.
Can photography be good art? Is Marcel Duchamp (found object) art? Can good art be discovered almost serendipitously, or can good art only be created by slowly learning and applying a skill?
I think art is mostly about perception and selection, by the viewer. There are others that think art is more about the crafting process by the artist. How do you tell the difference between an artist and a craftsperson?
We can tell the difference between muzak and "real music"; we just dislike the muzak. But the real risk and likelihood is that we get to the point that AI will be generating art that is indistinguishable from human-generated art, and it's only muzak if someone subscribes to the idea that the content of art is less relevant than its provenance. Some people will, particularly rich people who use art as a status signifier/money laundering vehicle, but mass media artists will struggle to find buyers among the less discerning mass audience.
Considering the phenomenal progression of Dall-E-1 to Dall-E-2 in just over a year, I'm not really understanding your confidence on the limits of AI content generation.
...among many other things. Plus, the training set for video is orders of magnitude smaller than for digital art. (And is additionally burdened with copyright issues.)
As I see it, there's simply no path from the DALL-E of today to something like that. And all for art that, essentially, "says nothing and means nothing".
Characters and dialogue are effectively solved, just look at GPT-3.
The entity behind StableDiffusion is also supporting generative music art, so let's see what is coming out of that: https://www.harmonai.org/
We are currently far away from generating a production quality movie with AI, but I don't think it's going to be nearly as long as a lifetime. In my opinion, we'll have high quality AI shorts within the decade.
>Characters and dialogue are effectively solved, just look at GPT-3.
Is this the motherlode of exaggeration?
Current language models cannot generate coherent dialog (and even then it's mostly bad dialog) spanning more than a minute or two. And their current capabilities in that area are definitely significantly below those of the average human writer.
We were talking about a Marvel action flick, I don't think incredible dialog spanning multiple minutes is much of a thing apart from exposition dumps. I asked GPT-3 to spit out some paragraphs from a hypothetical script for Thor 5:
INT. DARKNESS
We hear a faint beating heart. A moment later, we see a light slowly growing in the darkness. As the light grows, we see that it is coming from a glowing object in a person’s hand. The object is a hammer.
We see the face of the person holding the hammer. It is Thor. He looks tired and beaten.
Suddenly, we hear a voice from the darkness.
Black Panther: You are not welcome here, Thor.
Thor: I know. But I must speak with you.
Black Panther: You have nothing to say that I want to hear.
Thor: I come bearing a warning. Thanos is coming.
Black Panther: We are prepared.
Thor: He is not coming alone. He has an army.
Black Panther: So do we.
Thor: Thanos is not like any enemy you have faced before. He is ruthless and he will not stop until he has destroyed everything that you hold dear.
Black Panther: We will stop him.
Thor: I hope you can. Because if you cannot, then all is lost.
Eh, looks real enough to me. Fine tune the model with all the specialities that make up Marvel movies and you'll crank out good-enough drafts in no time.
>cannot generate coherent dialog (and even then it's mostly bad dialog) spanning more than a minute or two
I think that was pretty clear and that posted dialog is a perfect illustration.
You cannot generate the entire movie script coherently without significant human input and that's not going to change in the next several years. So, your initial claim that dialogue is "solved" is indeed false.
That's true. But the thing with technology, and the reason we've kept up with Moore's law, is that someone eventually has a bright idea that leaves current methods and improvement extrapolations in the dust. Then the real thing happens earlier than the most optimistic dates and performs better than people expected.
The question is not whether one day an AI can generate a movie that you can't differentiate from a human-made movie. The question is how long it will take for an AI-generated movie to be better than every human-made movie in history; if that's possible at all, it'll happen much sooner than people think.
Humans are trained on other humans' work as well though. Is there a type of ideological or aesthetic exploration that can't be expressed as part of an AI model?
Labour protections/willingness to strike, I suspect they mean. But I don't buy it. I've seen far too many people who "should" be worried about this technology instead be absolutely in love with it.
> particularly interested in training a hypernetwork to translate natural language instructions into latent-space navigation instructions with the end goal of enabling me to give the model natural-language feedback on its generations.
Imagine every conceivable image is laid out on the ground, images which are similar to each other are closer together. You’re looking at an image of a face. Some nearby images might be happier, sadder, with different hair or eye colours, every possibility in every combination all around it. There are a lot of images, so it is hard to know where to look if you want something specific, even if it is nearby. They’re going to write software to point you in the right direction, by describing what you want in text.
AFAICT: making a navigation/direction model that can translate phrase-based directions into actual map-based directions, with the caveat that the model would be updated primarily by giving it feedback the same way that you would give a person feedback.
Sounds only a couple of steps removed from basically needing AGI?
I suspect you'd want to start by trying to translate differences between images into descriptive differences. Maybe you could generate example pairs of images by symbolic manipulation, or maybe NLP can let us find differences between pairs of captions? Large NLP models already feel pretty magical to me and encompass things we would have said required AGI until recently, so it seems possible, though really tough.
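As a concrete example of the "navigation" primitive this subthread is circling around: spherical interpolation (slerp) between two latent vectors is a common way to walk between points in a generative model's latent space. This is a generic, model-agnostic sketch, not code from any of the projects mentioned:

```python
import math

def slerp(t: float, a: list[float], b: list[float]) -> list[float]:
    """Spherical interpolation between two latent vectors.

    Walking along the arc between two latents tends to keep the
    intermediate points in the high-density region the model was
    trained on, unlike a straight linear blend.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    omega = math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))
    if omega < 1e-8:  # vectors nearly parallel: fall back to lerp
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    s = math.sin(omega)
    wa = math.sin((1 - t) * omega) / s
    wb = math.sin(t * omega) / s
    return [wa * x + wb * y for x, y in zip(a, b)]

# Walk from latent a to latent b in four steps:
a, b = [1.0, 0.0], [0.0, 1.0]
path = [slerp(t / 4, a, b) for t in range(5)]
```

A natural-language navigation layer would then learn to pick the endpoint (or direction) from an instruction, while a primitive like this handles the actual movement.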
I have a friend who works as an artist, and he's both excited and nervous about this. He's also trying to learn how to use these tools well. If you try these AIs out, there's definitely an art to writing prompts that get what you actually want. Hopefully they will become just another tool in the artist's kit rather than a replacement.
I hope these end up similar to the relationship between Google and programming. We all know the jokes about "I don't really know how to code, I just know how to Google things". But using a search engine efficiently is a very real skill with a large gap between those who know how and those who don't.
Replying to myself because I just had a chat with him about this. He's thinking of getting a high end GPU now, lol.
Some ideas of how this could be useful in the future to assist artists:
Quickly fleshing out design mockups/concepts is the obvious first one that you can do right now.
An AI stamp generator. Say you're working on a digital painting of a flower field. You click the AI menu, "Stamp", a textbox opens up and you type "Monarch butterfly. Facing viewer. Monet style painting." And you get a selection of ai generated images to stamp into your painting.
Fill by AI. Sketch the details of a house, select the Fill tool, select AI, click inside one of the walls of the house, a textbox pops up, you write "pre-war New York brick wall with yellow spray painted graffiti"
I have a friend who paints (google "athanasart"). His paintings are usually purchased before they are even finished, and sometimes even stolen. All of this technology (DALL-E 2, Disco Diffusion, Stable Diffusion) is totally boring to him. He doesn't even care. He won't install it on his machine, or use it online, even if he's paid to. I love Craiyon, DALL-E 2, and Stable Diffusion, but merely beautiful images are not art.
But the definition of art can't depend on your knowledge of how it was made. If I showed you two beautiful pictures and asked whether either of them is art, you'd know I was trying to trick you. But you could pass a piece made by an AI off as art if you didn't know how it was produced.
Yes, if someone received a painting without any additional information, a computer art piece could easily be mistaken for a human art piece. That's true. My position on the subject is this: the best art for humans is made by humans, and the best art for computers is made by computers. I have yet to see a computer generation on the same level as the Mona Lisa, not even close. However, computer-generated women, some of them are alright!
Kinda like the creation of Copilot and its ilk for programmers, and GPT-3 for writers. I've seen some talk recently about "prompt engineers"... Probably, to some extent, every job will become prompt engineering in some way.
Eventually I suppose the AIs will also do the prompts.
At which point I hope we've all agreed to a star trek utopia, or it's gonna get real bad. Or maybe it'll get way better.
I think right now we could set up the AIs to do the prompts. You type in a vague description, e.g. "gorilla in a suit", and that is passed to GPT-3's API with instructions to provide a detailed and vivid description of the input in style X, where X is one of several different styles. GPT-3 generates multiple prompts, the prompts are passed to Stable Diffusion, and the user gets back a grid of different images and styles. Selecting an image on the results grid prompts for variations, possibly of both the prompt and the images.
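A minimal sketch of that prompt-expansion step, with fixed templates standing in for GPT-3 (the style strings and function names here are made up for illustration):

```python
# Hypothetical prompt expander: turn a vague subject into several
# styled prompts before handing them to an image model. A real
# version would call an LLM API instead of using fixed templates.
STYLES = {
    "photo": "a detailed photograph of {subject}, sharp focus, studio lighting",
    "oil": "an oil painting of {subject}, rich brushwork, chiaroscuro",
    "concept": "{subject}, concept art, trending on artstation, cinematic",
}

def expand_prompt(subject: str) -> dict[str, str]:
    """Return one fleshed-out prompt per style for the given subject."""
    return {name: tpl.format(subject=subject) for name, tpl in STYLES.items()}

grid = expand_prompt("gorilla in a suit")
# Each value would be sent to the image model as its own prompt,
# producing the grid of images and styles described above.
```

The selection-and-variation loop would then feed the chosen style back through the expander with a perturbed subject or template.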
Yeah, if we're gonna replace every fucking profession with a half-assed good-enough AI version, what're we even here for? We're sure not all gonna survive in a capitalist society where you have to create some kind of "value" to earn enough money to pay for your roof and your food and your power.
IIRC there is some vague "it sure got real bad" period somewhere in the Trek timeline between "capitalism ended" and "post-scarcity utopia", and I sure am not looking forward to living through those times. Well, I'm looking forward to part of that: the part where we murder a lot of landlords and rent-seekers and CEOs and distribute their wealth. That'll be good.
Next let's get rid of all the artists and replace them with AI. Redistribute their skills so we can all make art. Oh wait that just happened. You're a hypocrite that wants to redistribute the wealth of others, but not your own.
Also joking about murdering people is bad taste and not how you convey a point or win an argument. Very low class.
> Next let's get rid of all the artists and replace them with AI. Redistribute their skills so we can all make art. Oh wait that just happened. You're a hypocrite that wants to redistribute the wealth of others, but not your own.
Recognising that doing the former without the latter demonstrably hurts people isn’t being hypocritical. Hence all the talk of post-scarcity. Post-scarcity for me not for thee is very much a sign of the times though.
There weren't "uprisings", mostly because the destroyed jobs were replaced with others that paid better and were less dangerous. There were some local minima (Detroit auto workers) where this happened only partially, and we know the pathology that led to.
Is this replacement of jobs happening this time around? No. So violence there will be.
(OP here) I agree. I am an artist (not by trade but by lifelong obsession) in several different mediums, but also an AI engineer--so I feel a weird mixture of emotions tbh. I'm thrilled and excited but also terrified lol.
It's my fucking job. I've spent my whole fucking life getting good at drawing. I can probably manage to keep finding work but I am really not happy about what this is going to do to that place where a young artist is right at the threshold of being good enough to make it their job. Because once you're spending most of your days doing a thing, you start getting better at it a lot faster. And every place this shit gets used is a lost opportunity for an artist to get paid to do something.
I wanna fucking punch everyone involved in this thing.
I mean I get what you're saying, sucks to have someone or something take your job, but isn't this a neo-luddite type argument? AI is gonna come for us all eventually.
Please save this comment and re-read it when a new development in AI suddenly goes from "this is cute" to "holy fuck my job feels obsolete and the motherfuckers doing it are not gonna give a single thought to all the people they're putting out of work". Thank you.
Look at that, you said the same thing to me 8 days ago [0]. I'll stick to the same rebuttals you got for that comment as well, namely that AI comes for us all, the only thing to be done is to adapt and survive, or perish. Like @yanderekko says, it is cowardice to assume we should make an exception for AI in our specific field of interest.
Admittedly as someone who's been subscribing to Creative Cloud for a while I already wanna punch a lot of people at Adobe so the people working on this particular part of Photoshop are gonna have to get in line.
You've got to move one step higher and work with ideas and concepts instead of brushes. AI can generate more imagery than was previously possible, so it's going to be about storytelling or animation.
I make comics, it's already about storytelling and ideas as much as it is about drawing stuff. I make comics in part because I like drawing shit and that gives me a framework to hang a lot of drawings on. I like the set of levels I work at and don't want to change it. I've spent an entire fucking lifetime figuring out how to make my work something I enjoy and I sure bet nobody involved in this is gonna fling a single cent in the direction of the artists they're perpetrating massive borderline copyright infringement upon.
But here's all these motherfuckers trying to automate me out of a job. It's not even a boring, miserable job. It's a job that people dream of having since they were kids who really liked to draw. Fuck 'em.
This AI thing feels so wrong, yet strangely enough I'm not that stressed.
Look at the recursive side of it, it's hilarious. I'm an artist. AI is around. Do I want to put my work online? No. So no artists put stuff online anymore? So how does the future of art work, exactly? Will we send booklets by mail again?
Then there's the democratic side of it. We the people ultimately decide. Those pictures are generated using artists' works, not out of thin air. So it's not a freedom question but a people one: do we want that or not? Maybe restrict it to pictures 30+ years old, for example; that could be good for the economy.
I mean, you can literally replace any job you want except the artist's, it seems. Imagine when people are jobless because AI replaced them. You'd think they'd want to make art during their retirement, right?
It's a case where the snake eats itself. If AI ruins our online life, we'll stop going online.
Yeah, going offline begins to look tempting, except that a huge part of how you get people to pay for your work is sharing it publicly. And then someone can come along, scrape it all, and dump it into their AI training black box without giving two shits about copyright.
All of the artists of the world will remove their online work. Who'd want to put their work online again? It's not even in the interests of the big players like Epic Games or Marvel or Disney. Nah, I don't see how this is gonna fly.
Automation doesn't eliminate jobs. As such this automation won't eliminate your job, QED.
(It's bad monetary policy that eliminates jobs. The US has very very low unemployment right now so it doesn't appear that's been happening.)
Go ahead and try it. It'd be impossibly harder to recreate one of your own pictures with it than it was for you to make one. Instead what's going to happen is you'll get more efficient ways to do backgrounds and unimportant bits of an image, just like Blender and CSP provide 3D models to do layouts with.
Check out the melodies I made with an AI assistant I created (human-in-the-loop still, but much quicker than if I tried to come up with them from scratch): https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y.... There are also good AI tools for other parts of making music, like singing voice generation.
AI will just replace existing jobs. It's unethical, since a lot of people's livelihoods will be destroyed, but our leaders, sponsored by the majority's insatiable appetite for power, will soon make it legal.
AI is a new tool that will automate away a lot of workers, like other machines before it.
What happens with these workers is what defines us.
And there are a lot of influencers saying that it will be "really nice that AI will replace the boring jobs so we can focus on creative/fulfilling life" yeah right...
Ironically, with Moravec's Paradox, (digital) creative tasks will probably be automated while the boring tasks of moving boxes around might not be for a while:
> Moravec's paradox is the observation by artificial intelligence and robotics researchers that, contrary to traditional assumptions, reasoning requires very little computation, but sensorimotor and perception skills require enormous computational resources. The principle was articulated by Hans Moravec, Rodney Brooks, Marvin Minsky and others in the 1980s. Moravec wrote in 1988, "it is comparatively easy to make computers exhibit adult level performance on intelligence tests or playing checkers, and difficult or impossible to give them the skills of a one-year-old when it comes to perception and mobility".
DALL-E 2 just got smoked. Anyone with a graphics card isn't going to pay to generate images, or have their prompts blocked because of the overly aggressive anti-abuse filter, or have to put up with the DALL-E 2 "signature" in the corner. It makes me wonder how OpenAI is going to work around this because this makes DALL-E 2 a very uncompetitive proposition. Except, of course, for people without graphics cards, but it's not 2020 anymore.
DALL-E's filters are so harsh that I find myself often in the situation where I don't even understand how what I prompted could possibly be in violation.
It's a novel feeling, but utterly stifling when it comes to actual creativity, and I'm not even trying to push any NSFW boundaries, just explore the artspace. Once I can run unfiltered on my own GPU, DALL-E will never get used by me again.
Midjourney also completely destroys DALL-E from a price perspective, effectively allowing nearly unlimited generation for approximately $50 a month.
Even though DALL-E tends to be better at following prompt details, you're inhibited from being able to explore the space freely because of how prohibitively expensive it can become.
People are saying you need a GPU with 6.9GB of RAM for the current model, so in practice at least an 8GB GPU.
Thankfully, GPU prices have finally calmed down and you can get one for a reasonable price. I think any of the RTX 3000 series desktop GPU's should do it, for example.
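For a rough sanity check on those numbers, you can estimate the footprint of the weights alone from the parameter count (the ~1B total is my own round-number assumption for illustration, not a figure from the release; activations during sampling add several more GB on top, which is how you get to the ~6.9GB people report):

```python
# Back-of-the-envelope VRAM estimate for the model weights alone.
# The 1e9 parameter count is an assumed round number for illustration;
# sampling-time activations and buffers add several GB on top of this.
def weight_footprint_gib(n_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in GiB."""
    return n_params * bytes_per_param / 2**30

N_PARAMS = 1_000_000_000  # assumed, for illustration

fp32 = weight_footprint_gib(N_PARAMS, 4)  # ~3.7 GiB
fp16 = weight_footprint_gib(N_PARAMS, 2)  # ~1.9 GiB

print(f"fp32 weights: {fp32:.1f} GiB, fp16 weights: {fp16:.1f} GiB")
```

So the weights themselves fit comfortably in 8GB, especially at half precision; it's everything else the sampler allocates that eats the rest.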
I just tried it via this link. I'm not sure what I'm looking at here but the results were extremely underwhelming. I've used Dall E 2 and Midjourney extensively so I know what they're capable of. Maybe I'm missing something?
Every model I've used initially seemed poor compared to the one I was just using. It takes time to figure out their sweet spot and what kind of prompts they excel at.
I've had a lot of great results from SD - but different great results to Dall-E.
I've only used it via discord, but it's much better than Midjourney and sometimes better than Dall-E. So maybe that site isn't the same thing or you need to work on your prompts.
Wait for the 4070, it should be around the perf of a 3080/maybe 4090 for ~500 bucks if rumors hold up. It is coming in a few months. NZXT used to make good pre-builts. Not sure which others have a good rep. DO NOT BUY DELL.
They were saying that before the 3080 was released. It will be cheaper and better than current hardware. Then when it came out you couldn't buy it for a year.
You aren't getting on-site service with a consumer product. There are plenty of 3rd party people who can service your computer, though. It's like Lego.
Dell uses crappy proprietary tech, poor quality components, and they have an all around bad reputation.
NZXT uses good components and they make some of the best cases you can buy.
I don't know much about Lenovo's desktop products.
You might try posting on reddit.com/r/suggestapc and ask about the best service contracts and high quality system integrators.
Edit: that particular reddit looks pretty dead actually. The big one is r/buildapcsales , you can take a look at their side bar or discord and ask around.
One more thing, GamersNexus on YouTube does reviews of pre builts and they are the best at this sort of thing. Their community is likely very helpful as well.
The biggest issue with most pre-builts is terrible airflow making the expensive components throttle. The (Dell) Alienwares are some of the worst for this.
> You aren’t getting on-site service with a consumer product.
Both Dell and Lenovo do this.
My daughter bought an Alienware laptop for college and when the keyboard broke, they sent a technician to her dorm to fix it (and she goes to school outside of the US).
If on-site service isn’t an option, what about a Mac Pro? At least with Apple I can take the machine to a store if I need to.
Apparently the model decompresses, and it won't fit very well on the 8GB models... I'm willing to give the max settings a spin on my 3070 Ti, but I'm not very hopeful.
It says that NVIDIA chips are recommended but that they are working on optimizations for AMD. This implies to me that it probably involves CUDA stuff and getting it to run on a Radeon would be potentially difficult (I am not an expert on the current state of CUDA to AMD compatibility, though).
AMD's answer to CUDA is called ROCm. I've been doing a little research on it since a few weeks ago and it seems to be funky when not outright broken. It's absolutely maddening that after all this time AMD doesn't have proper tooling on consumer GPUs.
"This release is the culmination of many hours of collective effort to create a single file that compresses the visual information of humanity into a few gigabytes."
If something like this is possible, does this mean there's actually far less meaningful information out there than we think?
Could you in fact pack virtually all meaningful information ever gathered by humanity onto a 1TiB or smaller hard drive? Obviously this would be lossy, but how lossy?
You can pack virtually all meaningful information ever gathered by humanity onto a single bit, but it's going to be lossy. And what is your definition of "meaningful information" anyway? What's meaningful today might not be meaningful tomorrow, and vice versa. Nobody cares about the spin of each electron in my brain today, but in 4 centuries my descendants will be like "if only we had that information, we could simulate our great-...-great parent today".
If you play with JPEG quality you’ll see that the difference is barely perceptible for a while and then if you keep going down it becomes very noticeable.
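That cliff is easy to demonstrate even without JPEG's actual DCT machinery; here's a toy uniform-quantization sketch (my own illustration, not real JPEG coding) showing how mild quantization is nearly invisible while aggressive quantization produces large errors:

```python
# Toy lossy compression: quantize 8-bit samples down to fewer levels.
# Mild quantization (128 levels) barely moves any value; aggressive
# quantization (8 levels) produces big, visible banding errors.
# This is plain uniform quantization, not actual JPEG DCT coding --
# it just exhibits the same quality/size tradeoff.
def quantize(samples, levels):
    step = 256 / levels
    return [round(int(s // step) * step + step / 2) for s in samples]

def max_error(samples, levels):
    return max(abs(s - q) for s, q in zip(samples, quantize(samples, levels)))

samples = list(range(0, 256, 5))  # a smooth 8-bit brightness ramp

print(max_error(samples, 128))  # off by at most 1 -- imperceptible
print(max_error(samples, 8))    # off by up to 16 -- obvious banding
```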
I've been looking forward to this. The license however strikes me as too aspirational, and it may be hard to enforce legally:
> You agree not to use the Model or Derivatives of the Model:
> - In any way that violates any applicable national, federal, state, local or international law or regulation;
> - For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
> - To generate or disseminate verifiably false information and/or content with the purpose of harming others;
> - To generate or disseminate personal identifiable information that can be used to harm an individual;
> - To defame, disparage or otherwise harass others;
> - For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
> - For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
> - To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
> - For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
> - To provide medical advice and medical results interpretation;
> - To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
How can you prove some of these in a court of law?
It seems like the sort of things you would require so that HuggingFace doesn't get roped in as defendants in lawsuits related to things that others do with the code. So if for example someone builds something that generates medical advice and gets sued for violating FDA requirements or damages or whatever then HuggingFace can say that was not something they allowed in the first place.
They really should have used blanket liability waiver text and left it at that.
I’m sure someone will find a way to sue them anyway. It doesn’t even call out using this to create derivative works to avoid paying original authors copyright fees.
On top of that, their logo is an obvious rip off of a Van Gogh. It seems clear they’re actively encouraging people to create similar works that infringe active copyrights. They should ask Kim Dotcom how that worked out for him.
> On top of that, their logo is an obvious rip off of a Van Gogh. It seems clear they’re actively encouraging people to create similar works that infringe active copyrights.
I don't think Van Gogh's works are under copyright any more. At least not directly, recent photos of them may be but that's the photos not the paintings that have a copyright.
I'm actually seeing these types of conditions becoming more common in software EULAs as well, as a boilerplate add-on to the usual copyright notices and legal disclaimers. I don't have examples off the top of my head, but I've seen clauses that the application may not be used for the enablement of violence, for discriminatory purposes, and so forth. It really is a CYA sort of thing.
Each of the “for any use ... which has the effect of ...” clauses probably bars any wide distribution of the output of this model.
Trivially: People have phobias of literally everything.
They ban using it to “exploit” minors, presumably that prevents any incorporation of it into any for-profit educational curriculum. After all, they do not define “exploit”, and profiting off of a group without direct consent seems like a reasonable interpretation.
I am not a lawyer, but I wouldn’t dream of using this for commercial use with a license like this. This definitely doesn’t meet the bar for incorporation into Free or Open Source Software.
I dunno. I can imagine any of those points being the subject of a civil suit and for someone to win damages, for e.g. psychological harm. The parts talking about “effect” instead of intent are of questionable enforceability - how can I agree not to cause an unanticipated effect on a third party? I cannot. But having said that, I can be asked to account for effects that a “reasonable person” would anticipate, so there’s that.
These are all things that someone could sue over (especially in California) and so they’re wanting to place the responsibility on the artist and not their tools.
The license does seem impossibly vague and broad. Usually what happens when software projects use custom & demanding licenses like this is that large companies refuse to allow the software to be used because of the legal uncertainty, small companies just use it and ignore the licensing constraints, and there are never any lawsuits that clarify anything one way or another. If that's fine with the authors of the project, they can just leave the license vague and unclear forever.
That license from top to bottom is distilled "tell me you don't know anything about art without telling me you don't know anything about art".
Interesting art challenges its audience. But even the most boring art will still offend some-- it's the nature of art that the viewer brings their own interpretation, and some people bring an offensive one.
CYA doesn't necessitate creating a cause of action against the users for engaging in what otherwise would be a legally protected act of free expression. One can disclaim without creating liability.
I'm getting a 403 on the Colab (while successfully logging in and providing a huggingface token). Is it already disabled? Do you have to pay huggingface to download the model? It's unclear from the Colab and post where the issue is.
{"error":"Access to model CompVis/stable-diffusion-v1-4 is restricted and you are not in the authorized list. Visit https://huggingface.co/CompVis/stable-diffusion-v1-4 to ask for access."}
Eventually I focused and realized I need to visit that URL to solve the issue. Hope this helps.
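For what it's worth, that 403 body is plain JSON, so a script can surface the access-request URL instead of burying it in a traceback (a small stdlib-only sketch; the body below is the one quoted above):

```python
import json
import re

# The 403 body returned by the hub, as quoted above.
body = ('{"error":"Access to model CompVis/stable-diffusion-v1-4 is '
        'restricted and you are not in the authorized list. Visit '
        'https://huggingface.co/CompVis/stable-diffusion-v1-4 '
        'to ask for access."}')

message = json.loads(body)["error"]
# Pull the first URL out of the message so the fix is obvious.
url = re.search(r"https://\S+", message).group(0)
print(url)  # the page where you accept the model's license terms
```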
With all due respect, I've been using it for over a week and I don't think you've given it a fair shot.
There's plenty of cases it's worse than Dall-E and there's plenty of cases where it's better. Overall it seems to show less semantic understanding but it handles many stylistic suggestions much better. It's definitely in the right ballpark.
In fact I'm still using a wide range of models - many of which aren't regarded as "state of the art" any more - but they have qualities that are unique and often desirable.
Agreed. I still primarily use vqgan + clip, which is nowhere near state of the art, but produces really interesting results. I’ve spent a long time learning to get the best out of it, and while the results aren’t very coherent, it’s great at colour, texture, materials and lighting.
The last one was particularly crappy. It gave me a red house with a driveway, but no car. And the house wasn't even really a house. It superficially looked like one but was actually two garages put together.
Here's some random prompts I've had nice results from:
iridescent metal retro robot made out of simple geometric shapes. tilt shift photography. award winning
Scene in a creepy graveyard from Samurai Jack by Genndy Tartakovsky and Eyvind Earle
virus bacteria microbe by haeckel fairytale magic realism steampunk mysterious vivid colors by andy kehoe amanda clarke
etching of an anthropomorphic factory machine in the style of boris artzybasheff
origami low polygon black pug forest digital art hyper realistic
a tilt shift photo of a creepy doll Tri-X 400 TX by gerhard richter
I guess I might have spent more time reading guides on "prompt engineering" than you. ;-) I think maybe Dall-E is more forgiving of "vanilla prompts".
However I do get nice results from simpler prompts as well. I just tend to use this style of prompt more often than not.
It depends on the domain. Artsy will do better with Stable Diffusion, but realistic/coherent output with Stable Diffusion is harder to do especially compared to DALL-E 2.
Really interesting. I wonder if at some point it would be possible to optimize a network for size and speed by focusing on a specific genre, like impressionist or only pixel art. I like that I can get an image in any style I want, but that has to increase the workload substantially.
Pretty cool. Although it's interesting that it can't seem to render an image from a precise description that should have something like an objectively correct answer. I tried prompts like "Middle C engraved on a staff with a treble clef" and "An SN74LS173 integrated circuit chip on a breadboard" both of which came back with images that were nowhere close to something I'd call accurate. I don't mean to detract from the impressiveness of this work. But I wanted a sense of how much of a "threat" this tech is to jobs or to skills that we normally think of as being human. Based on what I'm seeing, I'd say it's still got a ways to go before it's going to destroy any jobs. In its current form, it mostly seems like a fun way to generate logos or images where the exact details of the content don't matter.
I am generally of the "it's not threatening yet and won't be for a while" camp, but in this particular case it's probably just for lack of trying. These algorithms are essentially enormous pattern-matching engines, so given enough data and some task-specific engineering effort, I wouldn't be surprised if you could build an "AI" circuit designer, like Copilot but for electronics instead of code.
Next-level autorouting would be cool, but it's still not going to put the electrical engineering field out of business.
Sure, but this model is a general image synthesizer that is trained on a massive amount of data that is openly available on the internet. Given that, I would assume that it would have seen many images of 74xx series chips and also musical notation. So I would think that the most "likely" image to generate would involve either a chip with 8 pins on either side (for the 74ls173) or a note on a single leger line just below a staff with a treble clef. I imagine there must be hundreds of 74xx chip images that would establish that fact and also thousands of images of musical notation that would establish the other.
I guess the takeaway really is that the model does not function in such a way that it can recall its training data. Which is fine. I don't think I should expect it to. On the other hand, GPT-3 can be made to produce specific facts that are established by its training data. Although, admittedly it often gets things wrong. Maybe the problem of image modeling is just naturally harder than language modeling. After all, language already "directly" represents meaning in some sense much more than arbitrary images do.
I'm sure someone could design a targeted model that would solve the issues I'm talking about. But I feel as though they shouldn't have to if we really had something that sees the world the way humans do. In any case, this work definitely seems cutting edge and represents a huge leap in that direction.
First, about 18 months ago they said I had an IP phone and could no longer log in (my carrier is Republic Wireless). When I contacted support and told them that I had been able to log in with that number in the past, they basically said "we're tightening up security, too bad so sad, get another phone number". Then recently I found that I was able to log in when I wanted to try Midjourney (Republic changed something back in April, apparently, so it no longer looks like an IP phone number). Then I wanted to log in to Discord on my desktop and it gave me some weird errors which basically amounted to "this number is already claimed" (it was, by me), so now I'm back to ignoring Discord again.
How hard would this be to re-train to more domain-specific images? For example, if I wanted to teach it more about specific birds, cars or plane models?
Only a good education can help stop most bad things, but I don't think you can stop all of them. People will be people, people like challenges, and they will always figure out a way.
No, since the image itself is irrelevant. It matters how despised a person is by the general populace. Any slightly believable incriminating visual will do as an explanation if the public is already predisposed.
Thankfully, these pictures are much better than anything Photoshop can produce. Sadly, it's a part of a larger trend where we're getting more and more tools to create, modify, and exchange information, while the tools for analysis and filtering information are at least 50 years behind.
Such a tool will be used to generate lewd imagery involving virtual minors.
There's no way to prevent it upstream by outlawing feeding it real content (whose possession already is illegal). It would suffice to add a 'childrenize' layer onto an adult NN or something.
How will the legal system react? Bundle it into illegal imagery, period? Maybe that's already the case; I think a drawing made public is, but I'm not sure. If not, on what grounds could it be? No real minor would be involved in that production.
I got the weights from Huggingface, put them in the correct directory, but it still segfaults. I'd probably see a better error message if the weights were missing, no?
Anything close to a decent modern gaming PC will do it fine. I'm running on a laptop 3080 and I can generate 768x512 in about 20 seconds (with a 30 second overhead per batch)
I still don't understand what the value of specifying a license for generated images is - at least in terms of enforceability. How could anyone reliably, conclusively determine that an image was generated using a locally-run tool?
I suppose in the case of DALL-E they probably save a copy of every generated image and can use some sort of reverse image search and find if/when the image was created.
With StableDiffusion or any other local tool I don't see how that would be possible. The best I think one could do is come up with a secondary tool that can pinpoint the position in latent space a generated image came from (if that's even possible, I have a limited understanding of how exactly these systems work.) But then, if (heaven forbid) someone applied any sort of cropping or post-processing to the image, an approach like that easily gets blown out of the water.
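For intuition on why naive matching breaks so easily, here's a minimal average-hash sketch over a grayscale pixel grid (my own toy example, not part of any actual provenance system; real perceptual hashes downscale and smooth first). Even a single post-processed pixel already flips hash bits:

```python
# Minimal "average hash": each pixel becomes 1 if it is above the
# image's mean brightness, else 0. Real perceptual hashes downscale
# the image first; this toy version skips that step.
def average_hash(pixels):
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if p > mean else "0" for p in flat)

def hamming(a, b):
    """Number of differing bits between two equal-length hashes."""
    return sum(x != y for x, y in zip(a, b))

img = [[10, 200], [220, 30]]       # original 2x2 grayscale "image"
edited = [[10, 200], [220, 240]]   # one "post-processed" pixel

h1, h2 = average_hash(img), average_hash(edited)
print(hamming(h1, h2))  # nonzero: a one-pixel edit already moved the hash
```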
That is not so; the CC0 explicitly states that patent and trademark rights are not waived.
Contrast that with, say, the Two-Clause BSD which says "[r]edistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met [...]".
Since trademark and patent rights are not mentioned, then these words mean that even if the purveyor of the software holds patents and/or trademarks, your redistribution and use are permitted. I.e. it appears that a patent and trademark grant is implied if a patent holder puts something under the two-clause BSD. Or at least you have a ghost of a chance to argue it in court.
Not so with the CC0, which spells out that you don't have permission to use any patents and trademarks in the work.
/ic/ is having daily meltdowns over this. I don’t think the internet at large is doing better, because even professional concept artists are dialing it in now. Holy hell.
You agree not to use the Model or Derivatives of the Model:
- In any way that violates any applicable national, federal, state, local or international law or regulation;
- For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
- To generate or disseminate verifiably false information and/or content with the purpose of harming others;
- To generate or disseminate personal identifiable information that can be used to harm an individual;
- To defame, disparage or otherwise harass others;
- For fully automated decision making that adversely impacts an individual’s legal rights or otherwise creates or modifies a binding, enforceable obligation;
- For any use intended to or which has the effect of discriminating against or harming individuals or groups based on online or offline social behavior or known or predicted personal or personality characteristics;
- To exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
- For any use intended to or which has the effect of discriminating against individuals or groups based on legally protected characteristics or categories;
- To provide medical advice and medical results interpretation;
- To generate or disseminate information for the purpose to be used for administration of justice, law enforcement, immigration or asylum processes, such as predicting an individual will commit fraud/crime commitment (e.g. by text profiling, drawing causal relationships between assertions made in documents, indiscriminate and arbitrarily-targeted use).
The last point seems to be the only thing that isn't already illegal; all the other restrictions seem to be covered under "you are not allowed to break laws", which is somewhat redundant.
While neat, and no doubt impressive, it still utterly fails on prompts that should be completely reasonable to any sane human being/artist.
Take something like "A cat dancing atop a cow, with utters that are made out of ar-15s that shoot lazer-beam confetti". A vivid picture should spring to mind, and no doubt I could imagine an artist having a lot of fun creating such a description... Alas, what the model spits out is pure unusable garbage.
More complex/weirder prompts aren't going to work yet, no.
What will probably happen with these models is that for more advanced stuff, you may end up using the "inpainting" that DALL-E already has going, where you can sort of mix, match, and combine images. That way you could have the cat, for example, rendered separately, thereby simplifying each individual prompt.
The referent of "utters" (sic) is ambiguous, so I can imagine a model having more difficulty with it than usual. Regardless, the current SOTA does need more specific and sometimes repetitive prompting than a human artist would, but it's surprising how much better results you can get from SOTA models with a bit of experience at prompt engineering.
This is, in part, what I'm trying to point out, it's an obvious typo given the context, and something that you or I would be able to pick up on, yet it completely breaks (it spit out a bunch of weird confetti cats for me). Perhaps I'm being a little harsh, but if it requires word-perfect tuning and prompt engineering, it speaks to something about the 'stupidity' of these models. It's a neat trick, but to call it anything in the realm of artificial intelligence is a bit of a joke.
[0]A gorilla if you're lucky, to be honest.