> Cutting out DeepSpeech seems sensible to me, it’s out of place in the general portfolio of products.

I disagree precisely because of the point you make later: "I'm somewhat concerned that Firefox will be irrelevant in five years".

Functionality provided by deep learning is going to be an important component of many kinds of software interaction going forward. The logistics of this are quite different from what we are used to in open source: the need to fund and coordinate compute, and to collect and handle data, matters far more than it did in the past.

There is other STT software, some of it mentioned in this thread, that matches or even beats DeepSpeech, but none of it is as ergonomic. Once you account for the value of time, that means it will be more cost effective to outsource such capabilities to the cloud, which comes with trade-offs that are difficult to appreciate in the short term: https://news.ycombinator.com/item?id=24236489
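To give a sense of what "ergonomic" means here, a minimal transcription sketch with DeepSpeech's Python bindings looks roughly like this (the model, scorer, and audio file names are assumptions; any released acoustic model and scorer pair should work):

    import wave

    import numpy as np
    import deepspeech

    # Assumed file names for the released acoustic model and scorer.
    model = deepspeech.Model("deepspeech-0.9.3-models.pbmm")
    model.enableExternalScorer("deepspeech-0.9.3-models.scorer")

    # DeepSpeech expects 16 kHz, 16-bit mono PCM audio.
    with wave.open("audio.wav", "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    audio = np.frombuffer(frames, dtype=np.int16)

    print(model.stt(audio))

A few lines, no account signup, no per-request billing, and the audio never leaves the machine.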

I'd say DeepSpeech fits the mold of Mozilla as a company providing solutions to complicated software problems while respecting the user and their privacy.

In the old days, the most accurate TTS and STT models were built into the OS. These days, you need to call into the cloud to get the best stuff. In [1], the Internet Archive complains about the quality of its OCR software. It's not that open OCR is so bad; it's that the best OCR runs on Google's and Microsoft's computers. It's possible to cobble something together using open source tools like EasyOCR or Tesseract+OpenCV, but that only gets you part of the way there. What makes the cloud offerings so good is that they have the resources to devote to pre-processing pipelines, architecture tweaks, and settings that handle edge cases better. Most of the mass resides in the edge cases.
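For illustration, a bare-bones version of such a pre-processing pipeline, with OpenCV feeding Tesseract, might look like the sketch below. The file name and the blur kernel size are assumptions, and real pipelines do far more (deskewing, layout analysis, per-page tuning), which is exactly where the cloud offerings pull ahead:

    import cv2
    import pytesseract

    # Hypothetical input scan.
    img = cv2.imread("scan.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Otsu binarization to even out lighting, then light denoising.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    clean = cv2.medianBlur(binary, 3)

    print(pytesseract.image_to_string(clean))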

From my vantage point, the future looks like one of software as thin layers built atop APIs that call into programs running on the servers of a handful of companies. You might not think this is a big deal, but that software will be what scans the environment, writes the emails, completes the thoughts, and plans the calendars for the majority of humans.

[1] https://blog.archive.org/2020/08/21/can-you-help-us-make-the...




Based on the testing I just did with Vosk, Mozilla DeepSpeech, Google Speech-to-Text, and Microsoft Azure, I disagree with your argument that SaaS has the best quality results.

Mozilla DeepSpeech was definitely trailing the bleeding edge, but Vosk with the vosk-model-en-us-daanzu-20200328 model produces very accurate results even on uncommon words, similar in performance to Google and Microsoft (whose output is generally better formatted than Google's).

Try it yourself:

Google: https://cloud.google.com/speech-to-text/ See "Put Speech-to-Text into action" header

Microsoft: https://azure.microsoft.com/en-us/services/cognitive-service... See "Upload File"

Vosk: https://alphacephei.com/vosk/
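If you'd rather test Vosk locally than through a web demo, a minimal sketch with its Python bindings looks roughly like this (the WAV file name is an assumption; the model directory is the one mentioned above, unpacked next to the script):

    import json
    import wave

    from vosk import Model, KaldiRecognizer

    # Assumes the model from above has been downloaded and unpacked here.
    model = Model("vosk-model-en-us-daanzu-20200328")

    # Assumed input: a 16-bit mono PCM WAV file.
    with wave.open("test.wav", "rb") as wav:
        rec = KaldiRecognizer(model, wav.getframerate())
        while True:
            data = wav.readframes(4000)
            if len(data) == 0:
                break
            rec.AcceptWaveform(data)

    print(json.loads(rec.FinalResult())["text"])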

Had Mozilla provided 4x to 8x more GPU resources and more staff, their STT would likely be competitive. Other small STT developers can iterate and test much faster because they have more hardware at their disposal.


Even Google is trying to offload as many of these computations as possible to on-device chips nowadays, though.

Their new Pixel has voice control backed entirely by on-device models, for example.

I think SaaS is a stopgap for good ML, and that eventually enough of this will be open source that basic tasks such as vision and speech will be cheap to solve for any company with high technical competency.



