There are multiple open-source GPTs, but GPT-3 is absolutely massive - larger than the image models, actually! So, unfortunately, text generation is probably even more complex and resource-intensive than image generation (especially to train). Additionally, in image generation we appreciate creative solutions, but in text generation the creative solutions seem like utter nonsense.
My intuition was based on the file size of text being so much smaller than that of images, but I guess that doesn't really map to the complexity of generating it. Fascinating!
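A back-of-the-envelope sketch of why file size is misleading here (the sample string and image dimensions are just illustrative placeholders):

    # A paragraph of text is tiny on disk...
    paragraph = "GPT-3 is a large autoregressive language model. " * 10
    print(f"text: {len(paragraph.encode('utf-8'))} bytes")  # 480 bytes

    # ...while even a small generated image is orders of magnitude larger.
    width, height, channels = 512, 512, 3  # a typical SD output
    print(f"raw image: {width * height * channels} bytes")  # 786,432 bytes

    # But storage size says little about generation difficulty: each token
    # choice depends on long-range meaning, while many pixels are locally
    # predictable from their neighbours.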
I think large language models are still in their infancy. The models are extremely sparse, but we don't have the tooling yet to deal with these kinds of structures efficiently. Your intuition might be right in the future, maybe.
If you think about the space both models are covering from a rate-of-failure perspective, it kind of makes sense that images end up being a bit easier than text: both text and image models can output results that look plausible at first glance, but when you analyze the outputs further, there are a lot more gotchas in parsing meaning within language than there are in pixel placement within an image.
Actually, I'd argue that images generated by SD have far more flaws than text produced by GPT-3. GPT-3 (at least the full model) is quite capable of writing stuff that "makes sense", but most of the eye-candy results generated by SD are cherry-picked; the rest are simply crap.
That's kinda the point here, I think. GPT-3 is trained on much more data than SD and contains much more knowledge. SD is actually similar in size to GPT-2.
An image model the same size as GPT-3 should be much more impressive; the difference will probably be as large as the one between GPT-2 and GPT-3.
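For scale, here are the approximate published parameter counts (fp16 footprint is just params × 2 bytes, ignoring activations and optimizer state; exact SD component sizes vary a bit by version):

    # Approximate published parameter counts.
    params = {
        "GPT-2 XL": 1.5e9,
        "Stable Diffusion v1 (UNet + text encoder + VAE)": 1.1e9,
        "GPT-3 (davinci)": 175e9,
    }
    for name, n in params.items():
        # fp16 weights take 2 bytes per parameter.
        print(f"{name}: {n / 1e9:.1f}B params, ~{n * 2 / 1e9:.0f} GB in fp16")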
Ask SD to write a step-by-step guide to do something and it will create an image that looks kinda like some instructions, but the contents will be nonsense.
An image model the size of GPT-3 could probably do this task quite well in many cases.
Image models also need much better language understanding to get to the next level, though, so multimodal models probably make more sense. Maybe feeding web pages rendered as images to an image model could give interesting results; the rendering step, at least, is easy to sketch.
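A minimal sketch of that rendering step, assuming Playwright for the browser automation (the URL, output path, and viewport size are placeholders; the actual dataset and training pipeline are out of scope):

    # Render a web page to a fixed-size image, e.g. to build a
    # (screenshot, text) corpus for a multimodal model.
    from playwright.sync_api import sync_playwright

    def render_page(url: str, out_path: str) -> None:
        with sync_playwright() as p:
            browser = p.chromium.launch()
            page = browser.new_page(viewport={"width": 1024, "height": 1024})
            page.goto(url)
            page.screenshot(path=out_path)  # captures the viewport
            browser.close()

    render_page("https://example.com", "page.png")  # placeholder URL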