The Ghibli trend completely missed the real breakthrough, which is this: the ability to closely follow the text prompt, understand the input image, and maintain the context of what’s already there is a massive leap in image generation. Midjourney delivered visually stunning results, but I constantly struggled to get anything specific out of it, which made it pretty much useless for actual workflows.
4o is the first image generation model that feels genuinely useful, not just good for making pretty things. It can produce comics, app designs, UI mockups, storyboards, marketing assets, and so on. I saw someone use it to make a multi-panel comic with consistent characters. Obviously, it's not perfect. But just getting 90% of the way there is a game changer.
I had ChatGPT generate a flow chart with Mermaid.js for something at work and then write a Scott McCloud-style comic explaining it in detail. It looked so convincing, even though it got some of the details a bit wrong. It's _so close_ to making completely usable graphics out of the box.