Is there any chance that gpt-4o-transcribe might get confused and accidentally f...

simonw · 2025-03-20T19:46:36

Here's a partial answer to my own question: https://news.ycombinator.com/item?id=43427525

> e.g. the audio-preview model when given instruction to speak "What is the capital of Italy" would often speak "Rome". This model should be much better in that regard

"Much better" doesn't sound like "can't happen at all", though.