I'm not sure why 4 was chosen (maybe just because it's a power of two?) but a while ago a SD user found that RGB can be approximated fairly well with a simple linear combination of the latents: https://discuss.huggingface.co/t/decoding-latents-to-rgb-wit...