Models never "see" anything to begin with; it's all matrices. And since we ditched convnets, locality barely even matters anymore. RGB is just convenient for humans; nothing says it's optimal for deep learning.
Yeah, real humans see with a Fourier transform in a highly optimized basis for projecting 3D down to the 2D retina. Not cold, soulless math like those machines!
Humans see more than 8-bit RGB. We can perceive light polarization and stereo disparity, but more importantly, we can interact with the things we look at.
CIE XYZ is arguably the "most optimal" 3-channel colorspace, but it's a simple linear transformation from RGB, so it doesn't matter: the model can learn it if it wants to.
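For the record, that transformation really is trivial for a network to learn: linear sRGB to CIE XYZ is a single 3x3 matrix multiply. A minimal sketch using the published sRGB/D65 coefficients (gamma decoding of encoded sRGB is omitted for brevity):

```python
import numpy as np

# Published linear-sRGB -> CIE XYZ matrix (D65 white point, IEC 61966-2-1).
RGB_TO_XYZ = np.array([
    [0.4124, 0.3576, 0.1805],  # X
    [0.2126, 0.7152, 0.0722],  # Y (luminance)
    [0.0193, 0.1192, 0.9505],  # Z
])

def rgb_to_xyz(rgb: np.ndarray) -> np.ndarray:
    """Convert linear-light sRGB values (shape [..., 3]) to CIE XYZ."""
    return rgb @ RGB_TO_XYZ.T

# A pure-white pixel lands on the D65 white point, roughly (0.9505, 1.0, 1.089).
white = np.array([1.0, 1.0, 1.0])
print(rgb_to_xyz(white))
```

Since this is just one bias-free linear layer, the first projection of any vision model can absorb it, which is why the choice between RGB and XYZ inputs shouldn't matter much in practice.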