
Whisper is simply not designed for this, in many ways. It's impressive engineering to try to overcome its limitations, but I can't help feeling that it would be easier to just use an architecture that is designed for the problem.

I was impressed by Kaldi's models for streaming ASR: https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index... I suspect that the Nvidia/Suno Parakeet models will also be pretty good for streaming: https://huggingface.co/nvidia/parakeet-ctc-0.6b



