It's really nice, but I don't understand why they keep pushing the idea of text-to-image - text is not a great medium for describing visual scenes, and hardly anyone in the real world doing real content authoring works from textual descriptions.
Why not allow for more Photoshop, freehand-art (or 3D-editor) style controls, which are much simpler to parse than textual descriptions?
Nvidia Canvas existed before text-to-image models, but it didn't gain as much popularity with the masses.
The other part is training data - there are masses of (text description, image) pairs, whereas if you want to do something more novel you may struggle to find a big enough dataset.
Image/video generation could possibly be used to advance LLMs in quite a substantial way:
If the LLM, during its "thinking" phase, encountered a scenario where it had to imagine a particular scene (say, a pink elephant in a hotel lobby), it could internally generate that image and use it to aid world simulation and understanding.
All of this already exists in various forms: inpainting lets you make changes by masking over sections of an image, ControlNets let you guide the generation of an image through many different inputs ranging from depth maps to posable figures, etc.
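For example, here's roughly what the ControlNet path looks like with Hugging Face's diffusers library - a minimal sketch, not a recipe: the checkpoint IDs are real models on the Hub, but the depth-map file is just a placeholder you'd normally get from a depth estimator or export from a 3D editor.

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Depth-conditioned ControlNet: the depth map fixes the scene layout,
# while the text prompt only fills in content and style.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder conditioning image (hypothetical file): a grayscale depth map
# of the lobby, e.g. rendered from a rough 3D blockout.
depth_map = load_image("hotel_lobby_depth.png")

image = pipe(
    "a pink elephant standing in a hotel lobby",
    image=depth_map,
    num_inference_steps=30,
).images[0]
image.save("elephant_lobby.png")
```

So the "Photoshop/3D-editor style" controls the parent comment asks for mostly reduce to swapping in a different conditioning image (scribbles, poses, segmentation maps) instead of the depth map above.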