4o still exhibits the "pink elephant effect"; it's just... subtler, and tends to reveal itself on complex or confusing prompts. Negations are also still not handled properly: they tend to slightly confuse the model and decrease the accuracy of the answer or the generated picture. The same is true for any other LLM. Moreover, the author is asking the model to rationalize a decision he has already made ("tell me why there can't be any elephants"), which could work as an equivalent of a CoT step.
It's "just" a much bigger and much better trained model. Which is a quality on its own, absolutely no doubt about that. Fundamentally the issue is still there though, just less prominent. Which kind of makes sense - imagine the prompt "not green", what even is that? It's likely slightly out of distribution and requires representing a more complex abstraction, so the accuracy will necessarily be worse than stating the range of colors directly. The result might be accurate, until the model is confused/misdirected by something else, and suddenly it's not.
I think in the end none of the architectural differences will matter beyond the scaling. What will matter a lot more is data diversity and training quality.
But it's literally a different architecture (auto-regressive, presumably sequence-based, vs. diffusion). In my experiments it is significantly, overwhelmingly better at consistency, coherence, and prompt adherence. Things I needed ControlNets for before, it just... does. And even zooming into fine details, they make sense.
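For context on the architectural distinction being drawn, a deliberately toy sketch (nothing like the real models, just the shape of the two loops): an autoregressive model commits to one token at a time conditioned on the prefix, while a diffusion model starts from noise and refines the whole output in parallel over many denoising steps.

    import numpy as np

    rng = np.random.default_rng(0)

    def autoregressive_generate(steps=8, vocab=16):
        """Toy autoregressive loop: each 'token' is sampled conditioned on the
        prefix (here, crudely, just the last token), and once emitted it is final."""
        tokens = []
        for _ in range(steps):
            logits = rng.normal(size=vocab)   # stand-in for a learned p(x_t | x_<t)
            if tokens:
                logits[tokens[-1]] += 2.0     # crude "conditioning" on the prefix
            probs = np.exp(logits) / np.exp(logits).sum()
            tokens.append(int(rng.choice(vocab, p=probs)))
        return tokens

    def diffusion_generate(steps=8, size=16):
        """Toy diffusion-style loop: start from noise and iteratively nudge the
        *entire* sample toward a target; every step revisits every element."""
        target = np.linspace(0, 1, size)      # stand-in for the learned data manifold
        x = rng.normal(size=size)             # pure noise
        for _ in range(steps):
            x = x + 0.3 * (target - x)        # stand-in for one denoising step
        return x

    print(autoregressive_generate())
    print(np.round(diffusion_generate(), 2))

Whether the consistency and prompt adherence gains actually come from the sequential commitment in the first loop is exactly the open question; this toy only shows the structural difference, not its consequences.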
Of course, it's a tiny specialized model vs. a big generalist model. They're absolutely not comparable in size/quality though, especially the text part. How much of this is due to the poor text encoder and worse training in other models, and how much to the architectural differences? I'm not saying it isn't somehow better than the existing image-gen models, but it's pretty hard to separate the two when both are present. All current SotA LLMs, including 4o itself, show negation inaccuracy in text (you need a really complex prompt, or a long one with thousands of tokens, not a toy one), and I don't see why this one should behave differently under similar conditions. Especially considering that it suffers from pretty much the same artifacts as other image models, just much less (fingers, extra limbs, perspective/lighting issues, overfitting, struggles with out-of-distribution generation, etc.).
I'm not convinced. I tried it and it showed me a swimming hippopotamus, which is even more elephant-like than the turtle. I tried again and it gave me a pelican, which is not generally very elephantish, but this particular one has a gray body with a texture that looks a lot like elephant skin.
It's "just" a much bigger and much better trained model. Which is a quality on its own, absolutely no doubt about that. Fundamentally the issue is still there though, just less prominent. Which kind of makes sense - imagine the prompt "not green", what even is that? It's likely slightly out of distribution and requires representing a more complex abstraction, so the accuracy will necessarily be worse than stating the range of colors directly. The result might be accurate, until the model is confused/misdirected by something else, and suddenly it's not.
I think in the end none of the architectural differences will matter beyond the scaling. What will matter a lot more is data diversity and training quality.