Hacker News: dsteinman's comments

I tried using this one time but unfortunately found it didn't work when I wore my reading glasses. The screen reflection off my glasses just whites out both eyes.


With Joe Rogan moving to Spotify, including video streaming, I wonder if Spotify is seriously contemplating building out a true YouTube competitor.


Is Spotify offering video hosting as well as streaming? I ask because of Mixcloud: I can _live_ stream video which gets broadcast, but only the audio currently gets stored for future playback.


Haven't seen anything public, but it seems like they are building this for when Joe Rogan launches.


I am attempting to bring 100% client-side speech recognition to the web:

https://github.com/jaxcore/bumblebee

Although the first release is not officially out yet, the NodeJS code is working and you can install the development version of the app server and try out the hello world app locally.

The solution involves running Mozilla DeepSpeech inside an Electron desktop application with a websocket server and a client API that NodeJS scripts can interact with to receive speech recognition results, use "Alexa"-style hotword commands, and do text-to-speech. The Electron app handles all the heavy stuff, and you just use a simple API.

A web browser extension can also make use of this API to bring these capabilities to web sites, but that part isn't finished yet.
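
To give a rough idea of what a hotword command means on the script side, here is a minimal sketch. It is not the Bumblebee API: `extractCommand` and the default hotword are hypothetical names made up for illustration, and it assumes the recognizer hands your script a plain text transcript.

```javascript
// Hypothetical helper for illustration only, not part of the Bumblebee API.
// Returns the words spoken after the hotword, or null if the hotword
// does not appear in the transcript.
function extractCommand(transcript, hotword = "bumblebee") {
  const words = transcript.toLowerCase().trim().split(/\s+/).filter(Boolean);
  const index = words.indexOf(hotword.toLowerCase());
  if (index === -1) return null;
  return words.slice(index + 1).join(" ");
}

console.log(extractCommand("hey bumblebee play some music")); // "play some music"
console.log(extractCommand("play some music")); // null
```

Anything said without the hotword is ignored, which is the main point of the "Alexa"-style approach.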


It's not really "the web" if you have to use Electron and Node surely? Wouldn't it make more sense to do it with web workers and wasm?


The web browser extension would communicate with the Electron app server, so NodeJS would not be needed in that scenario (the Electron app includes the NodeJS server code). You can write your web voice app with static client-side JavaScript which communicates with the Electron server through the browser extension.

Web Page <-> Bumblebee JS API <-> Bumblebee Extension <-> Bumblebee Electron App (DeepSpeech)

DeepSpeech with the pretrained English model is enormous (1.4GB), so it's not feasible to load it into a web worker. It can run on a server, but then every website would have to run its own server-side speech recognition servers, which is difficult and expensive to scale.


This is turning out to be a bad decision because there have been 5 major version bumps in the past year, yet the functionality in Electron hasn't materially changed; it's mostly bug fixes and minor changes.


Interested in why you think this was a bad decision. For a multitude of reasons surrounding security, performance and wanting the Latest And Greatest JS features, we want to stay as close to upstream Chromium as possible. Curious what you feel the negative impact of major-versioning is?

For more info on our release cadence: https://www.electronjs.org/blog/12-week-cadence


The Electron version numbers are essentially meaningless now. I have no idea what even changed between Electron 4 and 8; the changelogs are all just bug fixes that didn't necessitate so many major version releases.

Also, there are some NPM packages that have to create builds for specific versions of Electron, and those builds come out after Electron does, so I'm always 1 or 2 versions behind on Electron, which leads to dependency hell.

Trying to stay up to date is exhausting.


With semantic versioning, you can tell the magnitude of the release (and if backwards compatibility is broken) by looking at the major version number.
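
As a minimal sketch of that idea, assuming plain MAJOR.MINOR.PATCH strings (no pre-release tags) and using hypothetical helper names and made-up version strings:

```javascript
// Parse "MAJOR.MINOR.PATCH" into numbers; throws on anything else.
function parseVersion(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) throw new Error(`not a MAJOR.MINOR.PATCH version: ${version}`);
  const [major, minor, patch] = match.slice(1).map(Number);
  return { major, minor, patch };
}

// Classify the jump between two versions by the first component that differs.
function releaseType(from, to) {
  const a = parseVersion(from);
  const b = parseVersion(to);
  if (a.major !== b.major) return "major"; // may break backwards compatibility
  if (a.minor !== b.minor) return "minor"; // new features, backwards compatible
  if (a.patch !== b.patch) return "patch"; // bug fixes only
  return "none";
}

console.log(releaseType("4.2.0", "8.0.0")); // "major"
console.log(releaseType("8.0.0", "8.0.1")); // "patch"
```

This only tells you what the version number promises, not what actually changed; for that you still have to read the release notes.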


Anyone recall how fast the G4 chips got phased out once Apple switched to Intel? It was pretty fast. Granted, they were a much smaller company at that time.


I've been using DeepSpeech to learn to build voice controls for all sorts of things in JavaScript. And I've got a way to connect it to the web so you'll be able to write speech-recognition-enabled web pages using client-side JavaScript.

https://github.com/jaxcore/deepspeech-plugin


Now that is an unfortunately named company.


It's arguably the perfect time for them to... go viral.



Even Corona beer is selling less than usual.

It reminds me of Avid renaming a module of theirs that was called ISIS Client Manager (now called NEXIS) when the Islamic State was a major news story. Maybe it was already planned, but the timing was suspicious.


They've been around for a while though.


My last order of assembled PCBs has been on hold since late December. I don't even know of an affordable alternative that isn't also in China.


While not a source for 'assembled' boards, https://oshpark.com/ claims to manufacture just the PCBs in the USA for $5/sq inch, in three-board units.


I have been working on getting Mozilla's DeepSpeech and some additional JS libraries up to a level where it can be used (among other things) as a voice keyboard.

https://github.com/jaxcore/deepspeech-plugin

It's not quite there yet, but I'm working on it.

It can type numbers and symbols reasonably well, but I still need to do some additional work, like building a custom language model to be able to type letters and plugging some other gaps in Mozilla's CommonVoice model.

Here's the number typing example: https://github.com/jaxcore/deepspeech-plugin/tree/master/exa...
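
For anyone wondering what number typing boils down to, here is a minimal sketch. It is not the code from the repository above: the word tables and `wordsToKeys` are hypothetical, and it assumes the recognizer returns a lowercase-able text transcript with one word per digit or symbol.

```javascript
// Hypothetical vocabulary for illustration; a real voice keyboard needs far more.
const KEYS = new Map([
  ["zero", "0"], ["one", "1"], ["two", "2"], ["three", "3"], ["four", "4"],
  ["five", "5"], ["six", "6"], ["seven", "7"], ["eight", "8"], ["nine", "9"],
  ["plus", "+"], ["minus", "-"], ["dot", "."], ["comma", ","],
]);

// Convert a transcript into the characters to type, skipping unknown words.
function wordsToKeys(transcript) {
  return transcript
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => KEYS.get(word))
    .filter((key) => key !== undefined)
    .join("");
}

console.log(wordsToKeys("one two three dot five")); // "123.5"
console.log(wordsToKeys("please type four plus four")); // "4+4"
```

Silently skipping unknown words keeps the sketch short, but it also hides misrecognitions, so a real implementation would probably want to report them instead.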


Any idea how it compares to Mozilla's DeepSpeech?


from my experience, when trained with the same data, kaldi is slightly better and, with custom recipes, adaptable to changing conditions. deepspeech has way better documentation and is more developer friendly. wav2letter seems to be the quickest. i guess there is no real winner here ... it depends on what criteria are applied.

just an example: kaldi is a weird mixture of c++, python2, python3, shell scripts, java, perl. hard to get an overview of. deepspeech is python. wav2letter is an exe file.


> deepspeech is python

I was under the impression DeepSpeech was native (C++), with bindings for Python and others. Personally I've used it with Node.js so far, and I couldn't see any dependencies on Python.

edit: I was talking client, you're talking training I guess.

