Part of this is that the art generators tend to use CLIP, which is not a particularly good text model, often only slightly better than a bag of words; that makes many interactions and relationships difficult to represent. Some of the newer ones have better text frontends, which improves the situation, though.
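To make the bag-of-words point concrete, here's a rough sketch (mine, not anything the generators actually run) using the open_clip library. The specific checkpoint tag is just an example. If CLIP's text encoder behaved like a true bag of words, two prompts that differ only in word order would embed identically; in practice their cosine similarity is often very close to 1, which is the problem:

  # Compare CLIP text embeddings for two prompts that differ only in
  # which object carries which color.
  import torch
  import open_clip

  model, _, _ = open_clip.create_model_and_transforms(
      "ViT-B-32", pretrained="laion2b_s34b_b79k")
  tokenizer = open_clip.get_tokenizer("ViT-B-32")

  prompts = ["a red cube on a blue sphere",
             "a blue cube on a red sphere"]
  with torch.no_grad():
      feats = model.encode_text(tokenizer(prompts))
      feats = feats / feats.norm(dim=-1, keepdim=True)

  # Cosine similarity near 1.0 means the encoder barely distinguishes
  # the two relationships.
  print((feats[0] @ feats[1]).item())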

I think color is fairly well abstracted, but most image generators are not good at edits, because the generator more or less starts from scratch, with a new random seed each time. Even if the seed is fixed, the initial stages of generation, where things like the rough image composition form, are quite chaotic, so they're sensitive to small changes in the prompt. There are tools that can make far more controlled adjustments to an image, but they tend to be a bit less user-friendly.
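Here's what the seed issue looks like in practice, as a minimal sketch with Hugging Face diffusers (the stable-diffusion-v1-5 checkpoint is just an example). Even with the generator re-seeded identically so both runs start from the same noise, a one-word prompt change can shuffle the whole composition, because the earliest denoising steps decide the layout:

  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
  ).to("cuda")

  for prompt in ["a cat wearing red boots",
                 "a cat wearing brown boots"]:
      # Re-create the generator so both runs start from identical noise.
      gen = torch.Generator("cuda").manual_seed(42)
      image = pipe(prompt, generator=gen).images[0]
      image.save(prompt.replace(" ", "_") + ".png")

Compare the two outputs: the boots change color, but so, usually, does everything else.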



> I think color is fairly well abstracted, but most image generators are not good for edits, because the generator more or less starts from scratch

It’s unlikely that the models have been trained on “similarity”. Ask it to swap red boots for brown boots and it will happily generate an entirely different image because it was never trained on the concept of images being similar.

That doesn’t mean it’s impossible to train a model on the concept of similarity.
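There are already models trained on roughly that. InstructPix2Pix, for instance, was trained on (image, edit instruction, edited image) triples, so it conditions on the original picture instead of regenerating from scratch. A rough sketch with diffusers (the input filename here is made up):

  import torch
  from diffusers import StableDiffusionInstructPix2PixPipeline
  from diffusers.utils import load_image

  pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
      "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
  ).to("cuda")

  # Hypothetical input image; any local path or URL works here.
  image = load_image("cat_with_red_boots.png")

  # image_guidance_scale controls how closely the output sticks to
  # the original image; higher values preserve more of it.
  edited = pipe("swap the red boots for brown boots",
                image=image,
                image_guidance_scale=1.5).images[0]
  edited.save("cat_with_brown_boots.png")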


I just asked Midjourney to do precisely that, and it swapped the boots with no issue, although it didn't seem to quite understand what it meant for a cat to _wear_ boots.



