> Insisting on the bare minimum of human involvement as a feature is just a non-starter for me if something is presented as art
You can make the guidance as superficial or detailed as you like. Input detailed descriptions, use real images as references; you can spend a minute or a day on it. If you prompt "cute dog" you should expect generic outputs. If you write half a screen of detailed instructions, you can expect the result to be mostly your contribution. It's the old "you're holding it wrong" problem.
BTW, try inputting an image into ChatGPT or Claude and asking for a description -- you will be amazed how detailed it can get.
You need an image for an ad. You write a brief and send it to an artist who follows your brief and makes the image for you. You make more detailed briefs, or you make generic briefs. You receive an image. Regardless, did you make that image or just get a response to your brief?
You want a painting of your dog. You send the painter dozens of photos of your dog. You describe your dog in rapturous, incredible detail. You receive a painting in response. Did you make that painting? Were you the artist in any normal parlance?
When you use ChatGPT or Claude, you're signing up to receive the image generated in response to your prompt, not to create that image. Your involvement is always lessened.
You might claim you made that image, but then you would be like a company claiming they made the response to their brief, or the dog owner insisting they were the painter -- which everyone would consider nonsensical, if not plain wrong. Are they collaborators? Maybe. But the degree of collaboration in making the image is very, very small.
> Did you make that painting? Were you the artist in any normal parlance?
The symphony conductor just waves her hands while reading the score -- does she make music? The orchestra makes all the sounds. She just prompts them. Same for a movie director.
The analogy isn't quite right. The conductor and director spend days collaborating with the symphony and the actors/crew. The parent's example is literally prompting the artist or agency via a creative brief.
The symphony conductor gets credit for being the conductor -- not for being Beethoven. A film director has a thousand times more influence on their final product than a conductor has on theirs, and they still don't try to take credit for the writing, costumes, set design, acting, score, special effects, etc. etc. etc. I've yet to see Stable Diffusion spit out a list of credits after generating an image.
It's still very different. What you describe is exactly what an art director does, which is creative and difficult -- there's a good reason many commercial artists end their careers as art directors but none start there. Anybody who says making things that look good and interesting using generative AI is easy, or doesn't require genuine creativity, is just being a naysayer. However, at most, the art director is credited with the compilation of other people's work. In no situation would they claim authorship over any of the pieces that other people made, no matter how much influence they had on them.

This distinction might seem like a paperwork difference to people outside the process, but it's not. Every stroke of the pen or stylus or brush, every scissor snip or pixel pushed, is specifically informed by that artist's unique perspective, based on their experience, internal state, minute physical differences, and any number of other non-quantifiable factors; there's no way even an identical twin who went to the same school and had the same work experience would have done it exactly the same way with the same outcome. That holds even when using tools like Photoshop, which in a professional blank-canvas art creation context involves little to no automation (compared to finishing work for photography and the like, which uses more of it). Furthermore, you can almost guarantee there's enough consistency in those distinctions that a knowledgeable observer could consistently tell which artist made which piece.

That's an artistic perspective -- it's what makes a piece that artist's own. It's what makes something someone's take on the Mona Lisa rather than a forgery (or copy, I guess, if they weren't trying to hide it) of the Mona Lisa. It's also what NN image generators take from artists. Artists don't learn how to do that -- they learn broad techniques -- their perspective is their humanity showing through in that process.
That's what makes NN image generators' learning process different from humans', and why one can make a Polaroid look like a Picasso in his synthetic cubist phase but gets confused about the upper limit for human limb counts.

I think generative AI could be used to make statements with visual language, closer to design than art. I definitely think it could be used to make art by generating images and then physically or digitally cutting pieces out and assembling them. But no matter how detailed you get in those prompts, there aren't enough words to express real artistic perspective, and no matter what, you're still working with other people's borrowed humanity, usefully pureed and reformed by a machine.

These tools are fundamentally different from tools like Photoshop. In art school I worked with both physical media and electronic media, and the fundamental processes are exactly the same. Things like typography in graphic design are much easier, but you're still doing the same exact process and reasoning about the same exact things on a computer that you did working on paper and sending it to a "paste-up man," as they did until the 80s/90s.

People aren't just being sourpusses about this amazing new art tool -- it's taking and reselling their humanity. I actually think these image generators are super neat; I use them to make mood boards and references all the time. But no matter how specific I get with those prompts, I didn't make any of that. I asked a computer, and that computer made it for me out of other people's art. A lot of people who are taken with their newfound ability to make polished images on command refuse to believe it, but it's true. It's a fundamentally different activity.
> you're still working with other people's borrowed humanity usefully pureed and reformed by a machine
Exactly, isn't it amazing? You can travel the latent space of human culture in any direction. It's an endless mirror house where you can explore. I find it an inspiring experience, it's like a microscope that allows zooming into anything.
Sure, it's a lot of fun. I also find it very useful for some things, like references and mood boards. But no matter how granular you get with ControlNets or LoRAs, and however good the models get, you just can't get the specificity needed for professional work, and the forms it gives you are too onerous to mold into a useful shape with professional tools. It's still, fundamentally, asking another thing to make it for you, like work for hire or a commission.

Software like Nuke's CopyCat tool, or Adobe's background remover and Content-Aware Fill, was professionally useful right off the bat because it was designed for professional use cases. Even then, text-prompt image generators are more useful than not only in low-effort, high-volume use cases where extremely granular per-pixel nuance doesn't really matter. I doubt they'll ever be useful enough for anything higher-level than that. It's just fundamentally the wrong interface for this work. It's like saying a bus driver on a fixed route is as useful as a cab driver with a cab. There are obviously instances where that's true, but no matter how many great things are on that bus route, and no matter how many people it's perfectly suited for, there's just no way a FedEx driver could use it to replace their van.
Just keying on one comment here, which perhaps no one will read:
I was, in fact, a paste-up man in the early 1990s, slapping together copy and ads for a magazine. As such, I was a ping-pong ball in the battle between account management and the creative department -- each wanted to be the originator of the big, clever ideas. (This is pretty widespread in the industry, and was even a recurring theme in "Mad Men.")
The takeaway here is, people like to be creative. People need to be creative. There will always be an implacable drive to create, one which DALL-E can never satisfy. Gen AI is the artificial sweetener that might temporarily satisfy those cravings, but ultimately artists want to create something from nothing. There's some hope to be found in that, amid the tsunami of AI slop.
Well, I really hope you were able to transition out of paste-up easily, because it blows me away how quickly that whole craft got clobbered. Just like my uncle who specialized in atlas publishing -- luckily he was able to hang on long enough to retire.
I agree that people do want to be creative, and I don't think they're going to let gen AI supplant that. However, the lower end of the creative market doing low-end, high-volume work -- think folks shotgunning out template-based logos on Fiverr -- has already been displaced in large numbers, and there are far more of those workers. They generally don't have the right skillset for the higher-end work, but many see moving into it as their only viable career move, which is majorly fucking up companies' ability to find workers and vice versa. And employers that don't know any better think the market is saturated, which is bringing down wages.
Also, clueless executives just don't realize that having a neural network generate an "80% right" version of your work as a flat PNG will take more effort to mold into shape for higher-end work than starting from scratch, so they've been making big cuts.

A coworker on a contract also works at an animation house that fired its entire concept art department and replaced them with prompt monkeys making half as much money. The problem was that standard art-director changes -- e.g., "I want this same exact image and garment, just make those lapels look a little fuller and softer but with sharper angles at the end, and change the piping on that jacket from green to purple" -- might have been half an afternoon for a professional concept artist but would be DAYS of work to get art-director right using neural network tools... if for no other reason than that the prompt writers just don't have the traditional visual art sophistication to even realize when they've got an appropriate solution, because learning that is a lot harder than learning to draw, and you learn it while learning how to draw. So all the time saved on the initial illustration was totally sucked up by art directors not being able to iterate even a tenth as quickly as they used to -- and fast iteration was the major selling point for gen AI to begin with. It simply does not do the task if you absolutely require specificity, and a raster, non-layered PNG that looks like it already went through post is a beast to edit, even for a skilled post-production person.

Well, three months later, they canned the prompt engineers and were begging their concept artists to come back and work for them again. What a waste of everything.
Why do I even bother torturing myself in forums like this by offering a real-world creative-industry counterpoint to the tech-crowd perspective, despite many of its most vocal members being smug, patronizing, and self-aggrandizing? Maybe one executive out there will read this and say, "Hmm... maybe I should actually talk to people I trust who work in this field to see if it's really beneficial to replace our [insert creative department], rather than relying on what software execs and their marketing people say is feasible."