
I’m confused why there is so much focus on text-to-image models. If you spent five minutes talking to anyone with artistic ability, they would tell you that this is not how they generate their work. Making images involves entirely different modes of reasoning than speech and language do. We seem to be building an entirely faulty model of image generation (outside of things like ControlNet) on the premise that text and images are equivalent, solely because that’s the training data we have.


Can you share some of what you have found about the creative process by talking to people with artistic ability?

What are your ideas about the differences between a human's and an AI's creative process?

Are there any similarities, or analogous processes?

Do you think creators have a kind of latent space where different concepts are inspired by multi-modal inputs (what sparks inspiration? e.g. sometimes music or a mood inspires a picture), and that creators then make different versions of their idea by combining different amounts of different concepts?

I am not being snarky; I am genuinely interested in views comparing human and AI creative processes.


I used to work as an illustrator. Most images appeared to me as image concepts, somewhere between fuzzy and clear, unaccompanied by any words. I then had to take these concepts and translate them using principles of design, color, composition, abstraction, etc., such that they’re coherent and understandable to others.

Most illustration briefs are also not rote descriptions of images, because people are remarkably bad at describing what they want in an image beyond the most general sense of its subject. This is why you see DALL-E doing all kinds of prompt elaboration on user inputs to generate “good” images. Typically, the illustrator is given the work to be illustrated (e.g. an editorial), distills key concepts from the work, and translates these into various visual analogues, such as archetypes, metaphors and themes. Depending on the subject, one may have to include reference images or other work in a particular style, if the client has something specific in mind.


Project briefs to an artist typically contain both text and reference images. Image diffusion models and the like likewise typically use a text prompt together with optional reference images.


Project briefs are generally not descriptions of images, and reference tends to be more style- than content-focused. Source: I used to be an illustrator for major media outlets like the NYTimes, etc.


So how is the desired content communicated to the artist?

(Also, reference images can absolutely be used to communicate style to a diffusion model)
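
A rough sketch, purely to illustrate that point (not anything from this thread): in Hugging Face diffusers, an IP-Adapter lets a reference image condition generation alongside, or instead of, the text prompt. Checkpoint names and file paths below are placeholders.

    # Sketch: steering style with a reference image via an IP-Adapter (diffusers).
    # Checkpoints and file names are illustrative, not a recommendation.
    import torch
    from diffusers import StableDiffusionPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Attach an image-prompt adapter so a reference image conditions generation.
    pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                         weight_name="ip-adapter_sd15.bin")
    pipe.set_ip_adapter_scale(0.6)  # how strongly the reference steers the output

    style_ref = load_image("style_reference.png")  # hypothetical local file
    image = pipe(
        prompt="a quiet harbor at dusk",  # the content
        ip_adapter_image=style_ref,       # the style/feel, taken from the reference
        num_inference_steps=30,
    ).images[0]
    image.save("out.png")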


The artist is given the content to be illustrated, extrapolates themes and overarching rhetorical or narrative aspects of the work, creates visual representations or metaphors corresponding to these aspects, generates 3-5 interpretations, and shows them to the AD (art director), who provides feedback on what has been extrapolated as well as various design considerations.


"The artist is given the content to be illustrated" and the content is in what form?


Not even wrong, in the Pauli sense: to engage requires conceding the incorrect premises that image models only accept text as input and that the generation process relies on this text.


Text prompts aren't an essential part of this technology. They're being used as the interface to generation APIs because they're easy to build and easy to moderate, and because, for Discord-based models like Midjourney, they make it easy for people to copy your work.

With a local model you can find latent-space coordinates any way you want, and patch the pixel-generation model any way you want too (these techniques are usually called textual inversion and LoRAs).
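
A minimal sketch of what that looks like in practice, assuming Hugging Face diffusers; the embedding and LoRA files below are placeholders for whatever you have trained or downloaded:

    # Two non-text ways to steer a local model (diffusers; paths are placeholders).
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Textual inversion: a learned embedding acts as a new "word" that points at
    # coordinates found from example images rather than from a description.
    pipe.load_textual_inversion("./my_concept.safetensors", token="<my-concept>")

    # LoRA: small learned patches to the model's weights that change how it paints.
    pipe.load_lora_weights("./my_style_lora.safetensors")

    image = pipe("a portrait of <my-concept>", num_inference_steps=30).images[0]
    image.save("out.png")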

I would personally like to see a system that can input and output layers instead of a single combined image.


It’s good for stock images.

And for in-painting I think you’ll find text-to-image is still useful to artists. It’s extra metadata to guide the generation of a small portion of the final image.
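
A minimal sketch of that workflow, assuming Hugging Face diffusers (file names are placeholders): the prompt guides only the masked region, and the rest of the artist's image is left untouched.

    # Text-guided inpainting: the prompt shapes only the masked patch.
    import torch
    from diffusers import StableDiffusionInpaintPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
    ).to("cuda")

    base = load_image("artwork.png")  # the artist's existing image
    mask = load_image("mask.png")     # white = region to regenerate

    result = pipe(
        prompt="a small red lighthouse on the headland",  # guidance for the patch
        image=base,
        mask_image=mask,
        num_inference_steps=30,
    ).images[0]
    result.save("patched.png")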


Not sure what these cars are all about. Everyone travels by horse and buggy…

We’re building a model optimized for the machine, not people.

Artists can go collect clay to sculpt and flowers to convert to paint. Computers are their own context and should not be romantically anthropomorphized.

In the same way that fewer and fewer people go to church, fewer and fewer will feel nostalgia for being a data entry worker all day. Society didn’t stop when we all got our first beige box.


This is an incredibly dull, unthinking regurgitation of the nonsense people say online to feel better about their own lack of creative ability. My point wasn’t that computers can’t do the same thing as artists (they already can), it’s that computers won’t achieve the same result by having people describe the images they want to see because that’s fundamentally not how images are made or even perceived.


I play three instruments, draw, sculpt, and used to build houses.

No one ever set a goal for AI to achieve the same result; the goal is just to replace labor.

Your post is the same dull strawman non-engineers (I also have a BSc in engineering and an MSc in math) repeat about AI.

Find me a formal proof of how “images are made” and I’ll show you one possible model out of an infinite number of possible models that explain it, given a few axiomatically correct twists to the math, since all of our symbolic logic is a leaky abstraction that fails to capture how anything is “fundamentally made”.

Pretentious semantic wank is all you’re shipping


Check out invoke.ai for an example of something much closer to a professional tool.


Nah, that has absolutely nothing to do with what they're saying. I've used it for over a year, and this is a weird way for it to appear in conversation. I hope you're not astroturfing.



