One caveat here is that whisper.cpp does not offer any CUDA support at all; acceleration is only available for Apple Silicon.

If you have Nvidia hardware, the ctranslate2-based faster-whisper is very, very fast: https://github.com/guillaumekln/faster-whisper
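For reference, a minimal sketch of faster-whisper's Python API on an Nvidia GPU; the model size, audio file name, and beam size below are illustrative, not from this thread:

    # Minimal sketch, assuming faster-whisper is installed (pip install faster-whisper).
    # "large-v2" and "audio.wav" are placeholders.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    segments, info = model.transcribe("audio.wav", beam_size=5)

    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")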




ctranslate2 is amazing; I don’t know why it doesn’t get more attention.

We use it for our Willow Inference Server, which has an API that can be used directly (like the OP's project) and supports all Whisper models, TTS, etc.:

https://github.com/toverainc/willow-inference-server

The benchmarks are pretty incredible (largely thanks to ctranslate2).


Obligatory hooking up of Willow to ChatGPT, for the best virtual assistant currently available:

https://twitter.com/Stavros/status/1693204822042739124


I haven’t used faster-whisper so I can’t compare performance, but whisper.cpp does support CUDA via cuBLAS, and it’s noticeably faster than the CPU version. I used it earlier this year to generate subtitles for six seasons of an old TV show I backed up from DVD that didn’t include subtitles on the discs.
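(whisper.cpp can write .srt files directly with its -osrt flag.) For anyone doing the same subtitle workflow from Python instead, a rough sketch using faster-whisper; the file names, model size, and srt_time() helper are mine, not from the thread:

    # Rough sketch: write an .srt from Whisper segments via faster-whisper.
    # "medium" and the file names are placeholders; srt_time() is a hypothetical helper.
    from faster_whisper import WhisperModel

    def srt_time(t: float) -> str:
        # SRT timestamps look like 00:01:23,456
        ms = int(t * 1000)
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    model = WhisperModel("medium", device="cuda", compute_type="float16")
    segments, _ = model.transcribe("episode01.wav")

    with open("episode01.srt", "w", encoding="utf-8") as f:
        for i, seg in enumerate(segments, start=1):
            f.write(f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{seg.text.strip()}\n\n")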


Thanks for the Nvidia-based implementation!

FWIW, decent acceleration works on any AVX2-compatible chipset. I get real-time speed for everything but the large models on a recent Ryzen system. Apple Silicon is good, but not as special as folks think!
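(whisper.cpp picks up AVX2 automatically at compile time on supporting hardware. For comparison, the equivalent CPU path in faster-whisper is just a constructor change; this is a sketch, with the model size and file name as placeholders:)

    # Sketch of CPU inference in faster-whisper with int8 quantization.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, _ = model.transcribe("audio.wav")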



