> e.g. the audio-preview model when given instruction to speak "What is the capital of Italy" would often speak "Rome". This model should be much better in that regard
"Much better" doesn't sound like it can't happen at all though.