I gave this and other available applications a try, and I don’t understand what people see in AI image generation.
A simple prompt generated a person with 3 nostrils, 7 squished fingers, and deformities everywhere I look. It just mashes a bunch of photographs together and generates abominations.
Pay close attention to the generated images and you will find details which are simply wrong.
What is the use case that I’m missing?
Early cars were terrible too, but here we are. The promise is that future versions of the technology will be able to draw anatomically correct people and scenes: a computer program that can do in mere minutes what takes a person hours. If you've never in your life wanted a picture of something you can describe but aren't able to draw, then there is no use case for you. For anyone else who has interacted with the world of artists and graphic designers, or used stock photos, this goes an order of magnitude faster and is basically free compared to hiring a skilled professional for hours. It's a game changer for an industry that it sounds like you've just never interacted with.
They were. They were loud and stinky, unsuitable for dirt roads, and spooked horses, which led the UK to basically ban them. Some were powered by steam or coal, but those powered by gas had a different problem: there were no gas stations. You had to hand-crank them to start. And moving goods and people around was already a solved problem with horses, trains, and boats.
Cars back then took enormous energy to move very little, and slowly. The main use case was as a rich person's toy (entertainment). Surely they'd never replace work horses.
It's easy, in hindsight, to see cars as inevitable. But you had to see past the shortcomings of the earliest cars to "get it", much like you have to see past the 3-armed monstrosities that current image generation techniques produce and see the promise of the technology. There were undoubtedly those who dismissed cars as hype, much like image generation is dismissed today; I'm sure buggy whip manufacturers refused to get on what looked to them like a hype train.
I can't speak for others, but I've personally been quite impressed by the DALL-E output. It creates things that would take me hours (if not days) to create, which no other tech I've tried has been able to generate. It feels like it can absolutely replace at least the stock photo industry. It's also terrific for things like blog photos if you don't have the time or talent to create something yourself but want some creative control.
Extensions like DreamBooth, which let you fine-tune the system on your own submitted images, are also quite amazing. You can give it just a few photos of yourself, say something like "show me surfing in the ocean", and get a reasonable image back.
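As a rough sketch (not the exact workflow), here is what prompting a DreamBooth-tuned model can look like with Hugging Face's diffusers library. The checkpoint path and the "sks" subject token are placeholders for whatever you used during fine-tuning:

    # Minimal sketch: prompting a DreamBooth fine-tuned Stable Diffusion model.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "./my-dreambooth-model",   # hypothetical fine-tuned checkpoint
        torch_dtype=torch.float16,
    ).to("cuda")

    # The prompt references the rare token the model was fine-tuned on.
    image = pipe("a photo of sks person surfing in the ocean").images[0]
    image.save("surfing.png")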
_Much_ more broadly, this space in AI/ML with GPT-3/DALL-E is exciting because it feels kind of like what the internet was made for. There's too much data on the internet for any one person to ever meaningfully process. But a machine can. And you can ask that machine questions. And instead of getting just a list of references back, you get an "answer". Image generation is the "image answer" part of this system. It's an exciting space because it feels like these systems will affect large chunks of how we use computers.
The 3 nostrils and 7 squished fingers are not that big of a problem: you can run other image-enhancing AIs on top of the generated images to fix them, or just use inpainting and give it a few more tries to get it right. The models are also slowly getting better at it.
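A rough sketch of the inpainting approach, using the diffusers inpainting pipeline and one public checkpoint (the file names here are placeholders):

    # Minimal sketch: regenerate just the malformed region (e.g. a hand).
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("portrait.png").convert("RGB")  # the flawed output
    mask = Image.open("hand_mask.png").convert("RGB")       # white = regenerate

    fixed = pipe(
        prompt="a realistic human hand with five fingers",
        image=init_image,
        mask_image=mask,
    ).images[0]
    fixed.save("portrait_fixed.png")

Rerunning just the masked region a few times is usually enough to get a usable result.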
> What is the use case that I’m missing?
It's generating images from nothing more than a text description; a year ago that was something you'd only see on Star Trek. Now it's real, and we have barely scratched the surface of what is possible.
The images still need some manual work, but try to generate images of that quality and complexity by hand and you might have more appreciation for how mind-blowing it is that AI can not only do it, but do it in seconds.
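To see how little input is involved, text-to-image generation with the open Stable Diffusion weights looks roughly like this (a sketch; the model name is one publicly released checkpoint):

    # Minimal sketch: an image from nothing but a text description.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    image = pipe("an astronaut riding a horse, photorealistic").images[0]
    image.save("astronaut.png")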
On some of the homegrown models (https://rentry.co/sdmodels) these things are already fixed. For the Stable Diffusion "enthusiasts", the tools and models have improved at least 100% since the original release.
It's more of a cool technology that is rapidly advancing. A couple years ago, it couldn't do this much. A couple years from now, it will be much better. It does much more than mash images together, which you would know if you dug into it a bit. That's it, that's the whole thing.
There needs to be some sort of component or filter that understands body geometry and inverse kinematics, to prevent things like generating people with 3 limbs or joints in positions that would not normally be feasible without injury =). It'll come.
Nothing. It's IMHO just hype among the younger nerdy generation. The real-world applications of NN-based (there is no "I" in this "AI") image generation are limited.
One hype comes, another hype goes. IMHO this one did not come to stay ;-)
I haven't been able to get any good results with Stable Diffusion (via DiffusionBee on my M1 MacBook Air), but I've seen really good images from other AI generators like Midjourney.