
Is there any chance that gpt-4o-transcribe might get confused and accidentally follow instructions in the audio stream instead of transcribing them?


Here's a partial answer to my own question: https://news.ycombinator.com/item?id=43427525

> e.g. the audio-preview model when given instruction to speak "What is the capital of Italy" would often speak "Rome". This model should be much better in that regard

"Much better" still doesn't sound like a guarantee that it can't happen, though.



