
I like how ChatGPT-4 will stammer, stutter and pause. This would be even better with a little "uhm" right when the speaker finishes talking, or even a chatbot that interrupts you a little bit, predicting when you're about to finish, even if it guesses wrong.

Like an engaged but not-the-most-polite person would.




Knowing when to speak is actually a prediction task in itself. See, e.g., https://arxiv.org/abs/2010.10874

It would indeed be great to get something like this integrated with Whisper, an LLM and TTS.


Hard for me to imagine that this could be solved in text space. I think the prediction task needs to be done on the audio.
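For contrast, here is a minimal sketch of the naive audio-space baseline that a learned turn-taking predictor would improve on: declare the turn over once the signal has stayed quiet for a few consecutive frames. The function names, the RMS-energy threshold, and the frame counts are all hypothetical choices for illustration, not anything from the linked paper or from Whisper. Note that it can only react after the silence has already happened, which is exactly the latency a predictive model would try to eliminate.

```python
import math
from typing import List, Optional


def frame_rms(samples: List[float]) -> float:
    """Root-mean-square energy of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_end_of_turn(frames: List[List[float]],
                       energy_threshold: float = 0.01,
                       min_silence_frames: int = 3) -> Optional[int]:
    """Hypothetical silence-based endpointer.

    Returns the index of the first frame of a run of `min_silence_frames`
    quiet frames that follows speech, or None if the speaker never
    pauses long enough. Thresholds are illustrative assumptions.
    """
    heard_speech = False
    silent_run = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) >= energy_threshold:
            # Loud frame: the speaker is (still) talking.
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            # Quiet frame after speech: count toward a possible end of turn.
            silent_run += 1
            if silent_run == min_silence_frames:
                return i - min_silence_frames + 1
    return None


if __name__ == "__main__":
    quiet, loud = [0.0] * 160, [0.5] * 160
    frames = [quiet] * 2 + [loud] * 4 + [quiet] * 5
    print(detect_end_of_turn(frames))  # turn judged to end at frame 6
```

A model that "interrupts you a little bit" would instead estimate the probability of an upcoming turn end from the audio itself (prosody, pitch, rhythm) before the silence run completes.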


We thought about doing this in Whisper itself, since it's already working in the audio space.


Yes, this is something we want to look into in more detail. Really appreciate you sharing the research.



