IIRC it’s a debate as to the difference between “99% of the time it predicts the next pixel will be fleshy and the pixel next to it will be background, thus making something that looks fingery (and so when presented with an odd angle, that 99% drops crazily)” versus somehow an executive function has evolved that gets a concept of a finger, with movement, musculature, etc.
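(Purely to make the first horn of that debate concrete, here is a minimal PyTorch sketch of “predict the next pixel” in the PixelRNN/PixelCNN style. Note that Stable Diffusion itself is a diffusion model that denoises whole images rather than emitting pixels one at a time, so this is an illustration of the framing, not of that model; all names and sizes below are made up for the example.)

    # Toy "next pixel" objective: each pixel is a 256-way
    # classification problem conditioned on the pixels before it.
    import torch
    import torch.nn as nn

    N_PIXELS = 28 * 28   # a flattened 28x28 grayscale image
    N_LEVELS = 256       # pixel intensities 0..255 as classes

    # An LSTM read in raster order is causal by construction:
    # pixel i only sees pixels 0..i-1.
    model = nn.LSTM(input_size=N_LEVELS, hidden_size=128, batch_first=True)
    head = nn.Linear(128, N_LEVELS)

    def next_pixel_loss(images):
        # images: (batch, N_PIXELS) ints in [0, 256)
        x = nn.functional.one_hot(images[:, :-1], N_LEVELS).float()
        hidden, _ = model(x)           # (batch, N_PIXELS-1, 128)
        logits = head(hidden)          # (batch, N_PIXELS-1, 256)
        # Train to assign high probability to the true next pixel;
        # "fleshy next to fleshy" is just a high-probability continuation.
        return nn.functional.cross_entropy(
            logits.reshape(-1, N_LEVELS), images[:, 1:].reshape(-1))

    batch = torch.randint(0, N_LEVELS, (4, N_PIXELS))
    print(next_pixel_loss(batch))

Everything such a model “knows” about fingers lives in those learned conditional probabilities, which is exactly why the odd-angle case is the interesting one.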
It’s the “somehow evolved” part where I have my concerns.
Predictive ability based on billions of images sounds good. Executive function - how does that work? But at some point we are playing “what is consciousness” games.
Would love to hear more rigorous thought than mine - any links gratefully received :-)
I actually agree with you; I was being a bit sarcastic. If I understand correctly, there isn't a fundamental difference between text output and pixel output in this context. If so, then it suddenly sounds like much more of a stretch (intuitively) to claim that Stable Diffusion somehow understands the real world (as people claim is the case with language models).