I tried using this one time but unfortunately found it didn't work when I wore my reading glasses. The screen reflection off my glasses just whites out both eyes.
Is Spotify offering video hosting as well as streaming?
I ask because of Mixcloud - I can _live_ stream video which gets broadcast, but only the audio currently gets stored for future playback.
Although the first release is not officially out yet, the NodeJS code is working and you can install the development version of the app server and try out the hello world app locally.
The solution involves running Mozilla DeepSpeech inside an Electron desktop application with a websocket server and client API that NodeJS scripts can interact with, to receive speech recognition results, use "Alexa"-style hotword commands, and trigger text-to-speech. The Electron app handles all the heavy stuff, and you just use a simple API.
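To make the "simple API" idea concrete, here is a rough sketch of what a NodeJS script on the client side of that websocket might do with incoming messages. The message shapes (`type`, `hotword`, `text`) and the `createVoiceClient` helper are made up for illustration and are not the actual protocol; the sketch only shows the general pattern of waiting for a hotword and then treating the next recognition result as a command.

```javascript
// Hypothetical message shapes (assumptions, not the real protocol):
//   { "type": "hotword",   "hotword": "bumblebee" }
//   { "type": "recognize", "text": "what time is it" }

function createVoiceClient(handlers) {
  let awake = false; // becomes true once a hotword frame has been seen

  return {
    // Pass in one raw text frame, e.g. from a websocket "message" event.
    // Returns true if the frame was understood and dispatched.
    handleFrame(raw) {
      let msg;
      try {
        msg = JSON.parse(raw);
      } catch (err) {
        return false; // not JSON, ignore it
      }
      if (msg === null || typeof msg !== "object") return false;

      if (msg.type === "hotword" && typeof msg.hotword === "string") {
        awake = true;
        if (handlers.onHotword) handlers.onHotword(msg.hotword);
        return true;
      }

      if (msg.type === "recognize" && typeof msg.text === "string") {
        if (!awake) return false; // ignore speech until the hotword is heard
        awake = false; // one command per hotword
        if (handlers.onCommand) handlers.onCommand(msg.text);
        return true;
      }

      return false; // unknown message type
    },
  };
}
```

In a real script the frames would come from whatever websocket client you use, and `onCommand` is where your app logic would go. Keeping the parsing separate from the socket makes it easy to try out without the Electron app running.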
A web browser extension can also make use of this API to bring these capabilities to web sites, but that part isn't finished yet.
The web browser extension would communicate with the Electron app server, so NodeJS would not be needed in that scenario (the Electron app includes the NodeJS server code). You can write your web voice app with static client-side JavaScript which communicates with the Electron server through the browser extension.
Web Page <-> Bumblebee JS API <-> Bumblebee Extension <-> Bumblebee Electron App (DeepSpeech)
DeepSpeech with the pretrained English model is enormous (1.4GB), so it's not feasible to load it into a web worker. It can run on a server, but then every website would have to run its own server-side speech recognition servers, which is difficult and expensive to scale.
This is turning out to be a bad decision because there have been 5 major version bumps in the past year, yet the functionality in Electron hasn't materially changed; it's mostly been bug fixes and minor changes.
Interested in why you think this was a bad decision. For a multitude of reasons surrounding security, performance and wanting the Latest And Greatest JS features we want to stay as close to upstream Chromium as possible. Curious what you feel the negative impact of major-versioning is?
The Electron version numbers are essentially meaningless now. I have no idea what even changed between Electron 4 and 8; the changelogs are all just bug fixes that didn't necessitate so many major version releases.
Also there are some NPM packages that have to create builds for specific versions of Electron, and those builds come out after Electron does, so I'm always 1 or 2 versions behind on Electron which leads into dependency hell situations.
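For anyone wondering why those packages need per-version builds: native add-ons are compiled against a specific ABI, which Node exposes as `process.versions.modules`, and Electron additionally sets `process.versions.electron`. Below is a rough sketch of the kind of check involved. The `prebuilts` list and its fields are hypothetical, and the ABI numbers you would pass in are whatever your runtime reports, not values I'm asserting for any particular Electron release.

```javascript
// Summarize the runtime a native add-on would be loaded into.
// Pass a versions object for testing; defaults to the current process.
function describeRuntime(versions) {
  const v = versions || process.versions;
  return {
    runtime: v.electron ? "electron" : "node",
    version: v.electron || v.node,
    abi: String(v.modules), // the native module ABI the binary must match
  };
}

// Pick a prebuilt binary that matches the runtime and ABI, or null if the
// package has not published one yet (the "1 or 2 versions behind" case).
// The { runtime, abi, file } shape is made up for this example.
function findPrebuilt(prebuilts, runtime) {
  const match = prebuilts.find(
    (p) => p.runtime === runtime.runtime && p.abi === runtime.abi
  );
  return match || null;
}
```

When `findPrebuilt` comes back empty, the options are basically to compile the add-on yourself against the Electron headers or to stay on an older Electron until the package catches up.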
Anyone recall how fast the G4 chips got phased out once Apple switched to Intel? It was pretty fast. Granted, they were a much smaller company at that time.
I've been using DeepSpeech to learn to build voice controls for all sorts of things in JavaScript. And I've got a way to connect it to the web so you'll be able to write speech recognition enabled web pages using client-side JavaScript.
It reminds me of Avid renaming a module they have that was called ISIS Client Manager (now called NEXIS) when the Islamic State thing was a major thing in the news. Maybe it was already planned, but the timing was suspicious.
I have been working on getting Mozilla's DeepSpeech and some additional JS libraries up to a level where it can be used (among other things) as a voice keyboard.
It can type numbers and symbols reasonably well, but I need to do some additional work, like building a custom language model, to be able to type letters and plug some other gaps in Mozilla's CommonVoice model.
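To give an idea of the voice keyboard part, here is a simplified sketch of turning a recognized transcript into typed characters. This is not the actual code; the `SPOKEN_KEYS` table and the `transcriptToKeys` function are invented for the example, and a real version needs a much bigger vocabulary.

```javascript
// Illustrative mapping from spoken words to characters (not exhaustive).
const SPOKEN_KEYS = {
  zero: "0", one: "1", two: "2", three: "3", four: "4",
  five: "5", six: "6", seven: "7", eight: "8", nine: "9",
  comma: ",", period: ".", dash: "-", plus: "+", slash: "/", space: " ",
};

// Convert a transcript such as "one two comma three" into typed text.
// Words with no mapping are collected so the caller can decide what to do.
function transcriptToKeys(transcript) {
  const typed = [];
  const unknown = [];
  const words = transcript.toLowerCase().split(/\s+/).filter(Boolean);
  for (const word of words) {
    if (Object.prototype.hasOwnProperty.call(SPOKEN_KEYS, word)) {
      typed.push(SPOKEN_KEYS[word]);
    } else {
      unknown.push(word);
    }
  }
  return { text: typed.join(""), unknown };
}
```

The `unknown` list is where the gaps show up: a spoken letter tends to come back from a general-purpose model as some ordinary word, so it ends up there instead of being typed.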
from my experience, when trained with the same data, kaldi is slightly better and with custom recipes adaptable to changing conditions.
deepspeech has way better documentation and is more developer friendly.
wav2letter seems to be the quickest.
i guess there is no real winner here ... depends what criteria are applied.
just an example, kaldi is a weird mixture of c++, python2, python3, shell scripts, java, perl. hard to get an overview of. deepspeech is python. wav2letter is an exe file.
I was under the impression DeepSpeech was native (C++), with bindings for Python and others. Personally I've used it with Node.js so far, and I couldn't see any dependencies on Python.
edit: I was talking client, you're talking training I guess.