
I’ve seen a few YouTube thumbnail generation examples on Reddit (I’m on vacation, so I’m not going to search for a link) that show multimodal prompting with inline text giving specific instructions. It’s impressed me in a way that LLMs haven’t in two years; i.e., it’s not just getting better at what it already does, but a totally new and intuitive way of working with generative AI.

My understanding is that it’s a meta-LLM approach, using multiple models and having them interact. I feel like it’s also evidence that OpenAI is not seriously pursuing AGI (just my opinion, I know there are some on here who would aggressively disagree), but rather market use cases. It feels like an acceptance that any given model, at least for now, has its own limitations but becomes more useful in combination.
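
To make that concrete, here’s a minimal sketch of what such a multi-model loop might look like. This is just my guess at the pattern, not OpenAI’s actual pipeline; the model interfaces (text_model, image_model, critic_model) and their methods are entirely hypothetical:

    import dataclasses

    @dataclasses.dataclass
    class Feedback:
        ok: bool
        notes: str

    def generate_thumbnail(instructions, text_model, image_model,
                           critic_model, max_rounds=3):
        # Planner LLM turns loose user instructions into a detailed image prompt.
        prompt = text_model.complete(
            "Write a detailed image prompt for: " + instructions)
        image = image_model.render(prompt)
        for _ in range(max_rounds):
            # A separate critic model checks the render against the instructions,
            # e.g. whether the inline text came out legible and correctly spelled.
            feedback: Feedback = critic_model.review(image, instructions)
            if feedback.ok:
                break
            # Feed the critique back to the planner and re-render.
            prompt = text_model.complete(
                "Revise this prompt:\n" + prompt + "\nto fix:\n" + feedback.notes)
            image = image_model.render(prompt)
        return image

The point is less the specific loop than the division of labor: each model covers for another’s weaknesses, which fits the “useful in combination” framing above.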


