IIRC it’s a debate as to the difference between “99% of the time it predicts the next pixel will be fleshy and the pixel next to it will be background, thus making something that looks fingery (and so when presented with an odd angle, that 99% drops crazily)” versus somehow an executive function has evolved that gets a concept of a finger, with movement, musculature, etc.
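(Purely to make the first horn of that debate concrete, here is a minimal PyTorch sketch of “predict the next pixel” in the PixelRNN/PixelCNN style. Note that Stable Diffusion itself is a diffusion model that denoises whole images rather than emitting pixels one at a time, so this is an illustration of the framing, not of that model; all names and sizes below are made up for the example.)

    # Toy "next pixel" objective: each pixel is a 256-way
    # classification problem conditioned on the pixels before it.
    import torch
    import torch.nn as nn

    N_PIXELS = 28 * 28   # a flattened 28x28 grayscale image
    N_LEVELS = 256       # pixel intensities 0..255 as classes

    # An LSTM read in raster order is causal by construction:
    # pixel i only sees pixels 0..i-1.
    model = nn.LSTM(input_size=N_LEVELS, hidden_size=128, batch_first=True)
    head = nn.Linear(128, N_LEVELS)

    def next_pixel_loss(images):
        # images: (batch, N_PIXELS) ints in [0, 256)
        x = nn.functional.one_hot(images[:, :-1], N_LEVELS).float()
        hidden, _ = model(x)           # (batch, N_PIXELS-1, 128)
        logits = head(hidden)          # (batch, N_PIXELS-1, 256)
        # Train to assign high probability to the true next pixel;
        # "fleshy next to fleshy" is just a high-probability continuation.
        return nn.functional.cross_entropy(
            logits.reshape(-1, N_LEVELS), images[:, 1:].reshape(-1))

    batch = torch.randint(0, N_LEVELS, (4, N_PIXELS))
    print(next_pixel_loss(batch))

Everything such a model “knows” about fingers lives in those learned conditional probabilities, which is exactly why the odd-angle case is the interesting one.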
It’s the “somehow evolved” part where I have my concerns.
Predictive ability based on billions of images sounds good. Executive function - how does that work? But at some point we are playing “what is consciousness” games.
Would love to hear more rigorous thought than mine - any links gratefully received :-)
I actually agree with you; I was being a bit sarcastic. If I understand correctly, there isn't a fundamental difference between text output and pixel output in this context. If so, then it suddenly sounds like much more of a stretch (intuitively) to claim that Stable Diffusion somehow understands the real world (as people claim is the case with language models).