You seem to be confused. As you said yourself, steganographic concealment would, by its very nature, not change the perceptual hash of the visible image. If the visible image doesn't match a known hash, the steganographically modified version isn't going to either.
First you generate an innocuous image that collides with a flagged hash. (This is easy because perceptual hashes are not cryptographically secure.) Then, in a second step, you hide some offensive content in it via steganography without changing the hash. Then you send the image to the target.
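A sketch of why the second step works: flipping least-significant bits perturbs each pixel by at most 1, which a block-averaging perceptual hash effectively ignores. Below is a minimal, library-free illustration using a toy aHash-style hash; the image, payload, and hash implementation are all invented for the demo and are not any real system's scheme.

```python
# Toy average-hash: downsample to 8x8 by block-averaging, then threshold
# each cell against the global mean to get a 64-bit hash.
def average_hash(pixels, w, h, hash_size=8):
    bw, bh = w // hash_size, h // hash_size
    cells = []
    for cy in range(hash_size):
        for cx in range(hash_size):
            total = 0
            for y in range(cy * bh, (cy + 1) * bh):
                for x in range(cx * bw, (cx + 1) * bw):
                    total += pixels[y * w + x]
            cells.append(total / (bw * bh))
    mean = sum(cells) / len(cells)
    return sum(1 << i for i, c in enumerate(cells) if c > mean)

def embed_lsb(pixels, payload_bits):
    # Overwrite the least significant bit of the first len(payload_bits) pixels.
    out = list(pixels)
    for i, bit in enumerate(payload_bits):
        out[i] = (out[i] & ~1) | bit
    return out

# Synthetic 64x64 gradient "image" and a made-up 256-bit payload.
w = h = 64
img = [(x + y) % 256 for y in range(h) for x in range(w)]
secret = [1, 0, 1, 1, 0, 0, 1, 0] * 32

stego = embed_lsb(img, secret)
assert stego != img                                        # pixels did change
assert average_hash(img, w, h) == average_hash(stego, w, h)  # hash did not
```

The ±1 pixel perturbations average out inside each 8x8 block, so the thresholded hash bits are untouched even though the pixel data carries a hidden payload.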
He stores it in his cloud, it gets flagged because of the hash collision, so it gets a manual review. The manual review runs the image through some forensic software, which will catch the steganography (because the attacker will have chosen a weak scheme), reveal the hidden offensive content, and get the target reported.
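To make the "forensic software will catch a weak scheme" step concrete, here is a toy steganalysis heuristic; it is purely illustrative and not any real tool's method. Naive LSB replacement fills the least-significant-bit plane with random-looking data, whereas smooth image content keeps neighbouring pixels' LSBs correlated, so a large drop in LSB agreement is a red flag.

```python
import random

def lsb_agreement(pixels):
    # Fraction of adjacent pixel pairs whose least significant bits match.
    pairs = list(zip(pixels, pixels[1:]))
    return sum((a & 1) == (b & 1) for a, b in pairs) / len(pairs)

random.seed(0)
# Smooth synthetic image row: long runs of equal values, so LSBs mostly agree.
clean = [(i // 8) % 256 for i in range(4096)]
# Naive LSB embedding of a random-looking payload into every pixel.
stego = [(p & ~1) | random.getrandbits(1) for p in clean]

print(f"clean: {lsb_agreement(clean):.3f}")  # well above 0.5
print(f"stego: {lsb_agreement(stego):.3f}")  # near 0.5: flag for closer review
```

Real steganalysis uses far more sophisticated statistics (chi-square tests, sample-pair analysis), but the underlying idea is the same: weak embedding schemes leave measurable statistical fingerprints.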
The manual review process only involves a severely transformed (low-resolution, greyscale) version of the image, which is attached to the safety token. The ability to decrypt any original files arises only if the human review process confirms the presence of CSAM.
I don't have a lot of info on the quality of the visual derivative.
But since a human is supposed to make a judgment from it, it should have enough detail to distinguish subtle cases like the age of the people in the picture; otherwise the scheme is even more concerning.
If a human has enough info to make that call, then the low-res greyscale visual derivative should still raise some flags when run through forensic software, as steganography tools usually offer some resistance against common compression artifacts.
I don't know exactly what's in the safety token, but we do know that it's grayscale and low resolution.
Allow me to be hypothetical for a moment: assume the image has all chroma data stripped, is downsampled to 1 megapixel, and is then compressed to around 100 kilobytes using JPEG or HEIC. That would be sufficient for careful human review but would completely demolish any steganography.
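A rough sketch of that last claim, using coarse scalar quantization as a stand-in for JPEG/HEIC recompression (the step size, cover data, and payload are invented for the demo): snapping pixel values to multiples of 8 zeroes out essentially every least significant bit, so a naive LSB payload cannot survive the transform.

```python
def embed_lsb(pixels, bits):
    # Hide one payload bit in the LSB of each of the first len(bits) pixels.
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

def extract_lsb(pixels, n):
    # Read back the LSBs of the first n pixels.
    return [p & 1 for p in pixels[:n]]

def coarse_quantize(pixels, step=8):
    # Crude stand-in for lossy recompression: snap each value to the nearest
    # multiple of `step`, clamped to the 8-bit range.
    return [min(255, round(p / step) * step) for p in pixels]

secret = [1, 0, 1, 1, 0, 0, 1, 0] * 8          # 64 made-up payload bits
cover = [(17 * i) % 256 for i in range(256)]   # synthetic pixel data

stego = embed_lsb(cover, secret)
assert extract_lsb(stego, len(secret)) == secret           # survives storage as-is
recompressed = coarse_quantize(stego)
assert extract_lsb(recompressed, len(secret)) != secret    # wrecked by quantization
```

Real JPEG compression operates on DCT coefficients rather than raw pixels, but the effect on a naive spatial-domain LSB payload is the same: the sub-quantization-step detail carrying the hidden bits is exactly what gets thrown away.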