One caveat here is that whisper.cpp does not offer any CUDA support at all; acceleration is only available for Apple Silicon.

If you have Nvidia hardware, the ctranslate2-based faster-whisper is very, very fast: https://github.com/guillaumekln/faster-whisper
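For reference, a minimal sketch of faster-whisper's Python API on an Nvidia GPU; the model size, audio file name, and beam size below are illustrative, not from this thread:

    # Minimal sketch, assuming faster-whisper is installed (pip install faster-whisper).
    # "large-v2" and "audio.wav" are placeholders.
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v2", device="cuda", compute_type="float16")
    segments, info = model.transcribe("audio.wav", beam_size=5)

    for segment in segments:
        print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")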




ctranslate2 is amazing; I don’t know why it doesn’t get more attention.

We use it for our Willow Inference Server, which has an API that can be used directly (like the OP's project) and supports all Whisper models, TTS, etc.:

https://github.com/toverainc/willow-inference-server

The benchmarks are pretty incredible (largely thanks to ctranslate2).


Obligatory hooking up of Willow to ChatGPT, for the best virtual assistant currently available:

https://twitter.com/Stavros/status/1693204822042739124


I haven’t used faster-whisper so I can’t compare performance, but whisper.cpp does support CUDA via cuBLAS, and it’s noticeably faster than the CPU version. I used it earlier this year to generate subtitles for six seasons of an old TV show I backed up from DVD that didn’t include subtitles on the discs.
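(whisper.cpp can write .srt files directly with its -osrt flag.) For anyone doing the same subtitle workflow from Python instead, a rough sketch using faster-whisper; the file names, model size, and srt_time() helper are mine, not from the thread:

    # Rough sketch: write an .srt from Whisper segments via faster-whisper.
    # "medium" and the file names are placeholders; srt_time() is a hypothetical helper.
    from faster_whisper import WhisperModel

    def srt_time(t: float) -> str:
        # SRT timestamps look like 00:01:23,456
        ms = int(t * 1000)
        h, ms = divmod(ms, 3_600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1_000)
        return f"{h:02}:{m:02}:{s:02},{ms:03}"

    model = WhisperModel("medium", device="cuda", compute_type="float16")
    segments, _ = model.transcribe("episode01.wav")

    with open("episode01.srt", "w", encoding="utf-8") as f:
        for i, seg in enumerate(segments, start=1):
            f.write(f"{i}\n{srt_time(seg.start)} --> {srt_time(seg.end)}\n{seg.text.strip()}\n\n")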


Thanks for the Nvidia-based implementation!

FWIW, decent acceleration works on any AVX2-compatible chipset. I get real-time speed for everything but the large models on a recent Ryzen system. Apple Silicon is good, but not as special as folks think!
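(whisper.cpp picks up AVX2 automatically at compile time on supporting hardware. For comparison, the equivalent CPU path in faster-whisper is just a constructor change; this is a sketch, with the model size and file name as placeholders:)

    # Sketch of CPU inference in faster-whisper with int8 quantization.
    from faster_whisper import WhisperModel

    model = WhisperModel("small", device="cpu", compute_type="int8")
    segments, _ = model.transcribe("audio.wav")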



