Maybe we should try to find a list of exactly what they are focussing on going forward, instead of the slow drip of things they’re cutting back on (Servo, MDN, DeepSpeech...).
It’s a sad, sad day when an organisation getting hundreds of millions in funding turns away from what it’s good at. The decline has begun in my eyes, even if it may not become apparent for a few years yet.
Cutting out DeepSpeech seems sensible to me, it’s out of place in the general portfolio of products.
It would be nice if Mozilla could tell us what their focus is going to be, but I doubt that Mozilla management know at this point.
At this point I’m somewhat concerned that Firefox will be irrelevant in five years, and I don’t currently feel that Mozilla is communicating clearly that they still care about Firefox. I assume they must, but it would be comforting to know that Firefox is still at the core of Mozilla’s strategy.
> Cutting out DeepSpeech seems sensible to me, it’s out of place in the general portfolio of products.
I disagree precisely because of the point you make later: "I’m somewhat concerned that Firefox will be irrelevant in five years".
Functionality provided by deep learning is going to be an important component of many types of software interactions going forward. The logistics of this will be quite different from what we are used to in open source: funding and coordinating compute, and collecting and handling data, will matter far more than they did in the past.
There are STT packages, some mentioned in this thread, that match or even beat DeepSpeech, but none of them are as ergonomic. Accounting for the value of time, this means it will be more cost-effective to outsource such capabilities to the cloud, which comes with trade-offs that are difficult to appreciate in the short term: https://news.ycombinator.com/item?id=24236489
I'd say DeepSpeech fits in the mold of Mozilla as a company providing solutions to complicated software problems that are better at respecting the user and their privacy.
In the old days, the most accurate TTS and STT models were built into the OS. These days, you need to call into the cloud to get the best stuff. In [1], the Internet Archive complains about the quality of their OCR software. It's not that OCR is so bad, it's that the best OCR is found on Google's and Microsoft's computers. It's possible to cobble something together using open source solutions like EasyOCR or Tesseract+OpenCV, but that will only get you part of the way there. What makes the cloud offerings so good is that they have enough resources to devote to pre-processing pipelines, architecture tweaks, and settings better able to handle edge cases. Most of the mass resides in edge cases.
From my vantage point, the future looks to be one of software as thin layers built atop APIs that call into programs running on the servers of a handful of companies. You might not think this is a big deal, but this software will be what scans the environment, writes the emails, completes the thoughts and plans the calendars for the majority of humans.
Based on the testing I just did with Vosk, Mozilla DeepSpeech, Google Speech-to-Text and Microsoft Azure, I disagree with your argument that SaaS has the best-quality results.
Mozilla DeepSpeech was definitely trailing the bleeding edge, but Vosk with the vosk-model-en-us-daanzu-20200328 model produces very accurate results even on uncommon words, similar in performance to Google and Microsoft (Microsoft's output generally has better formatting than Google's STT).
Had Mozilla provided 4x to 8x more GPU resources and more staff, then their STT would likely be competitive. Other small STT developers can iterate and test much faster due to having more hardware at their disposal.
Even Google is trying to offload as much of these computations to on-device chips as possible nowadays though.
Their new Pixel has voice control entirely backed by on-device models for example.
I think SaaS is a stopgap for good ML, and that eventually enough of this will be open source that basic tasks such as vision and speech will be cheap to solve for any company with high tech competency.
Is now a good time for someone to write the "Unbundling Mozilla" start-up post on substack? I'd love to see something cogent written up about it. Something like this[0]?
I'm not sure about Mozilla's efforts in STT, but they were lagging pretty far in TTS. [1]
Google/Baidu, universities, and an assortment of Chinese/Japanese/Korean social media companies (Line, etc.) are posting the most compelling TTS research, models, and code. Mozilla's TTS system [2] is an amalgam of some of these models, but it lags pretty far behind state of the art.
Mozilla should focus on getting additional revenue streams. We can help them out by trying to get Congress / DOJ to strip Google of its ability to have and maintain a browser with which they entrench their search and advertising moat. I think they're clearly in antitrust/anticompetitive territory.
[1] I'm pretty familiar with this field as I wrote https://vo.codes and https://trumped.com TTS systems. Neither of those is state of the art in terms of mean opinion score (MOS), but they're incredibly efficient.
That 8-GPU server was consistently in use for training Mozilla DeepSpeech (now renamed Mozilla STT) models. It's impressive how far Mozilla got considering how limited their resources were.
This is an area that I find unbelievably frustrating. A lack of computing resources in the current day is kind of insane. You can buy an 8GB GPU for <$1000. Even with the rest of the costs, the cost of hardware like this is a drop in the bucket when your main office is housed in Mountain View! Especially on a project that ends up being public-facing, these are missed opportunities where a little can go a long way.
I take your point, but according to the release details on the repo it was not 8GB on one card but a server with 8 cards, each a Quadro RTX 6000 with 24GB, and they're around £4k each currently, so the cost of the GPUs alone is £32k.
Ah, I see: not an 8GB, 1-GPU server, but an 8-GPU server. That does make a bit of a difference, changing the cost from a new workstation to functionally a piece of capital equipment. Still, I don't think my point about equipment costs falls short: even at (call it) $40K, you're probably talking less than 3 months of the company's all-in cost for the developer themself, amortized over multiple years.
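A quick back-of-the-envelope version of this argument, as a sketch only: the 8 cards at about £4k each and the "call it $40K" server price come from this exchange, while the $200K/year all-in developer cost and the 3-year amortization period are purely hypothetical assumptions chosen for illustration.

```python
GPU_COUNT = 8                      # from the thread: 8 x Quadro RTX 6000
GPU_PRICE_GBP = 4_000              # from the thread: "around £4k each"
SERVER_COST_USD = 40_000           # from the thread: "call it $40K"
DEV_ALL_IN_USD_PER_YEAR = 200_000  # HYPOTHETICAL all-in cost of one developer
AMORTIZATION_YEARS = 3             # HYPOTHETICAL depreciation period


def gpu_cost(count: int, unit_price: float) -> float:
    """Total price of the cards alone."""
    return count * unit_price


def months_of_developer_cost(server_cost: float, dev_cost_per_year: float) -> float:
    """Express the server price as months of one developer's all-in cost."""
    return server_cost / (dev_cost_per_year / 12)


def cost_per_year(server_cost: float, years: int) -> float:
    """Straight-line amortization of the server over its useful life."""
    return server_cost / years


if __name__ == "__main__":
    print(f"GPUs alone: GBP {gpu_cost(GPU_COUNT, GPU_PRICE_GBP):,.0f}")
    print(f"Server: {months_of_developer_cost(SERVER_COST_USD, DEV_ALL_IN_USD_PER_YEAR):.1f} developer-months")
    print(f"Amortized: USD {cost_per_year(SERVER_COST_USD, AMORTIZATION_YEARS):,.0f} per year")
```

With these assumed inputs the server comes to about 2.4 months of one developer's cost, or roughly $13.3K per year over three years; swapping in real figures redoes the comparison.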
Chromium is open source and you can apply policies to do the things you mention. Based on your logic Mozilla should also be forced to get rid of Firefox Sync.
Chrome is shoved down grandma's throat. She probably doesn't know much other than it's the "Google Internet thing". It's the default on Android and Google.com nags you to install it.
This is worrying given that Google cripples the browser and web standards to favor its own search engine and advertising platform.
Killed the semantic web and semantic markup? Check.
Disabled APIs for blocking ads? Check.
Use Google.com as the default search? Yep.
Embrace and extend the web with AMP and instant apps? Bingo.
Auto log into your Google session or nag until users permit it? Absolutely.
Trying to destroy the notion of a URL? I thought those were cool.
Google is destroying the web and is about as anti-competitive as they come.
I see a lot of what appears to be overreaction... it doesn’t sound like DeepSpeech is ending, going by the first part of the announcement:
“Most of the technical changes were already landed, and we see no reason not to ship it. We’ll be releasing 1.0 soon and encourage everyone to update their applications”
So looks like at least 1.0 is near and still gonna happen... I know these seem like dark times for Mozilla but I believe they will survive. As I recall the decline of Netscape was a pretty dark time and out of that came Phoenix - er Firefox and here we are today... I’m sure Mozilla and many of the great projects will survive
I don’t know what is going to save Mozilla, really I don’t. I just wish there was a way to “reach” them and discuss how we the internet community could come to an agreement about what they could do to derive value we would pay for.
It’s not for a lack of trying on their part for sure, but it feels like just using their browser isn’t all there is to it any more
> what they could do to derive value we would pay for
For someone that found Linux in the 90's and watched the birth of Mozilla from the ashes of Netscape, that's a very strange thing to read.
This site is not Slashdot, I know. It always had another kind of relation to business and money. But still...
I have no idea why Mozilla should need a business model. Even less do I understand why we should think of one and agree on it.
How much money does it take to maintain a web browser? If it's a lot, maybe, just maybe, we should agree on a reduced feature set and refuse to use something more complex. Some people here talk about text mode browsers. I'm not so radical. Just keep it simple enough to be maintainable by a dozen volunteers.
Almost all company-sponsored programming languages are run as loss leaders to enable selling some other profitable product of the company. What is the profitable product that Rust enables?
Well, an IDE would’ve been one option, as well as backend services for enterprises that are migrating to Rust. Otherwise, as I mentioned, the product is services like outsourced development, consultancy and training resources?
People used to build a lot of software around Gecko, and there are still some notable users like Komodo IDE, but Firefox is a lot harder to embed than it once was. Servo from the Rust team was supposed to solve this by providing a new embeddable browser core; not sure if that is still the long-term plan.
Firefox apparently is no longer a focus because it is hard to monetize outside of the search box; see the earlier letter. I would definitely not take Firefox's future for granted at this point.
Firefox is the only thing Mozilla has ever been able to make any money with; anything else has gotten them a pittance at best.
Giving up on that because it's 'too hard', without first proving they have an alternative? That would be insanely foolish. They may as well close up shop now if that's their plan.
It'd be really awesome if they could develop a search engine or phone (I know they tried) that had an open standards / web-compatible development kit.
I want an anti-Google / anti-Apple. Something we own and can extend. Something that doesn't sell our data.
I'd also like to see Mozilla doing lobbying. Partnering with the EFF. We've strayed so far from the bright and open Internet of the 90's and 00's. It's depressing to think about how locked up and proprietary it's all become.
I'll buy Mozilla / Firefox merch. I'll pay a subscription.
edit: Talk to Shuttleworth. Fold Ubuntu in. I'll buy a Mozilla phone and a Mozilla laptop.
I feel bad for doing a "me too" comment, but you've nailed exactly my thoughts on the subject. I feel like Mozilla hasn't really tried something like this. Every time it gets suggested, it quickly gets shot down (by other internet commenters) as "can't be done" and "wouldn't generate nearly enough money".
Mozilla can model itself after Microsoft somewhat.
Provide a development stack (they're experts at Web and Rust). Make themselves the go-to shop for developers in that realm.
Sell them on an OS and editor with support. Partner with Ubuntu. Hell, I would even reach out to Nadella and see if they'd be willing to work with Mozilla on hedging against Google. Mac is becoming locked down and kind of unpleasant to develop on/for. Mozilla could win this.
Block all the advertising and tracking. Build a Spotify-like news aggregation service you can access from your Mozilla subscription.
Build an email service like Hey and a file backup service like Dropbox. It's too bad Zoom bought Keybase, but perhaps Chris Coyne wants a new gig?
We should team up to beat FAAMG. Most of the FAAMG actors are actually quite damaging to open source despite benefiting from it greatly.
This all sounds to me like capital-intensive businesses going up against entrenched players, where even the not-so-average consumer would likely do no more than pay lip service unless there were some secret sauce that made it more compelling than the existing options.
They need a good out-of-the-park product in those markets to make any real headway. Too idealistic.
My only thought on this is that they should pivot to be like Algolia: focus on Firefox being a reference-implementation browser and sell their expertise to the other vendors, maybe. It’s one of the few verticals I can think of that would work strategically without them having to pivot into things they have no experience with.
Do they? I mean, most of these vendors are already competing, and unlike Firefox, they're not necessarily competing for the average Joe, but technical users who often have different priorities.
Those are also services that groups are used to paying for already, which means if they could eat the start-up costs, even at a reduced scale, they could make a profit at even a slight premium for things that they already do very well, and go from there.
I'm already personally paying Mozilla $8/mo for their VPN and private browser extension.
If they offered something like the services offered by mailbox.org, or Librem One? I'd switch my GMail account tomorrow, including the storage fees I'm paying on it, and would do it at triple the cost for not abusing my data. Hell, they already have the domain experience with their proximity to Thunderbird devs.
Does anyone know of other open-source projects in the speech-to-text space? DeepSpeech was one of the most promising projects, especially the latest versions...
Comparing DeepSpeech v0.7.4 to Vosk using plain spoken English samples from male and female speakers, they seem to perform about the same if I use vosk-model-small-en-us-0.3 and the full-size DeepSpeech model.
When I use vosk-model-en-us-daanzu-20200328 the result is perfect on many of these tests, though it does not do punctuation or capitalization outside apostrophes. IIRC there is another project on Github that can add basic formatting though.
I am quite surprised by Vosk's performance; it even handles odd words like Puget Sound well! Need to test out more accented audio on it, but this is quite exciting.
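Comparisons like this are commonly scored with word error rate (WER): the word-level edit distance between a reference transcript and an engine's output, divided by the number of reference words. The commenter doesn't say how their tests were scored, so this is only a minimal sketch of the standard metric; the normalization (lowercase, split on whitespace) is an assumption, and the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    # Minimal normalization (an assumption): lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words processed so far and hyp[:j];
    # only two rows of the dynamic-programming table are kept in memory.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the ferry crossed Puget Sound at dawn"
    hyp = "the ferry crossed pew jet sound at dawn"
    print(f"{word_error_rate(ref, hyp):.3f}")  # prints 0.286
```

Here "Puget" heard as "pew jet" costs one substitution plus one insertion, so 2 errors over 7 reference words. Note that WER can exceed 1.0 when the output has many extra words, and that punctuation or capitalization differences would count as errors unless both transcripts are normalized the same way.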
There are a lot of open source projects in this space. DeepSpeech is actually one of the outsiders (they are not represented well in the academic community), and also not quite competitive with other software (at least last time I checked).
E.g. some very active projects are:
* Kaldi (https://github.com/kaldi-asr/kaldi/) obviously, probably the most famous and most mature one. For standard hybrid NN-HMM models and also all their more recent lattice-free MMI (LF-MMI) models / training procedure. This is also heavily used in industry (not just research).
* ESPnet (https://github.com/espnet/espnet), for all kinds of end-to-end models, like CTC, attention-based encoder-decoder (including Transformer), and transducer models.
* Google Lingvo (https://github.com/tensorflow/lingvo). This is the open source release of Google's internal ASR system, and used by Google in production (their internal version of it, which is not too different).
* (RETURNN (https://github.com/rwth-i6/returnn) and RASR (https://github.com/rwth-i6/rasr), our own, although this is currently free for academic use only. It is used in production as well. Supports hybrid NN-HMM, CTC, end-to-end attention-based encoder-decoder, transducer, etc.)
And there are many more.
You will also find lots of ready-to-use trained models.
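Several of the toolkits listed above (ESPnet, RETURNN) support CTC models. As a rough illustration of the simplest way a CTC model's output becomes text, here is a minimal sketch of greedy (best-path) CTC decoding. It assumes the most likely label for each audio frame has already been picked from the acoustic model's output, and it uses `_` as a hypothetical blank symbol; production decoders usually add beam search and often a language model on top of this.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol; real toolkits reserve a label index for it


def ctc_greedy_decode(frame_labels: list[str], blank: str = BLANK) -> str:
    """Best-path CTC decoding: merge consecutive repeated labels, then drop blanks."""
    merged = [label for label, _run in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


if __name__ == "__main__":
    # One label per audio frame; the blank between the two "l" runs keeps the double letter.
    print(ctc_greedy_decode(list("hh_e_ll_loo")))  # prints hello
    # Without a separating blank, repeats collapse: "lll" -> "l".
    print(ctc_greedy_decode(list("hhelllo")))  # prints helo
```

The order matters: repeats are merged before blanks are removed, which is why a blank between two runs of the same label is the only way the model can emit a genuine double letter.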
You seem to know a lot about the topic; any idea about the current state of text-to-speech? I haven't seen any open-source projects that would make, for example, an ebook enjoyable.
A recent, more or less reasonable one is https://github.com/TensorSpeech/TensorFlowTTS; it implements all the latest algorithms. For simple business books it will be OK; for emotional fiction it's probably not there yet.
Extant TTS is already there for fiction, if you approach it with the right expectations (more an alternative to visual reading than dramatically read audiobooks). I've 'read' numerous fiction books using macOS's TTS ('Alex') and with my Kindle (the 3rd-gen 'keyboard' model from 2010).
These extant solutions require an effort-investment from the user to work up to fast speeds, but once the user becomes acclimatized they work great. The neuroplasticity of the human brain seems to do a great job of smoothing out the wrinkles.
I agree. I've been using Google's TTS API for audiobooks and it's great. I switch between professional audiobooks (OverDrive is amazing and free through public libraries) and TTS, and while professionals can add something, you get used to TTS pretty fast. Google's TTS gives 1 million free characters a month, which is pretty generous for a single person, and it sounds pretty good. I read books with pretty weird character names (like the Wandering Inn web serial) and it never explodes. Sometimes it spells out character names, but even for very non-standard names, it does fine.
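To put "pretty generous" in numbers, here is a small sketch of the arithmetic. The 1 million characters per month is the figure from the comment; the 90,000-word book length and 6 characters per word (including the trailing space) are purely hypothetical assumptions for illustration.

```python
FREE_CHARS_PER_MONTH = 1_000_000  # figure quoted in the comment above
WORDS_PER_BOOK = 90_000           # HYPOTHETICAL length of a typical novel
CHARS_PER_WORD = 6                # HYPOTHETICAL average, counting the trailing space


def books_per_month(free_chars: int, words_per_book: int, chars_per_word: float) -> float:
    """How many books of the given size fit in a monthly character quota."""
    return free_chars / (words_per_book * chars_per_word)


if __name__ == "__main__":
    n = books_per_month(FREE_CHARS_PER_MONTH, WORDS_PER_BOOK, CHARS_PER_WORD)
    print(f"about {n:.2f} books per month")  # prints about 1.85 books per month
```

Under those assumptions the free quota covers a little under two novel-length books a month; longer books or web serials would use it up faster.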
I've experimented with some of the Tacotron TTS/ESPnet models to do the TTS on my computer and they work alright. Sometimes you get weird edge cases and it makes some pretty weird sounds (and even if your laptop doesn't have a GPU, Google Colab works well for quick audiobook generation). I don't hit the million characters that often, so it hasn't been a big deal, but I'll probably move to home-made just because I like tweaking it.
The way I think about it is that the written word doesn't have much intonation anyway, so as long as the audiobook doesn't offend me, it's a pretty good solution (and helps prevent eye strain after working on a computer all day).
At the point of them taking in input to process, audio that comes from a microphone or comes from a file is basically just a series of numbers and is the same. So there's no barrier in terms of feasibility.
Whether they're all set up to do that "off the shelf" is a different matter, but it should be fairly straightforward to add this to any that lack it, and because they're open source anyone could do a bit of Googling and find suitable code to adapt. I know DeepSpeech can definitely take audio from files directly as input, as I've used it that way before, and I strongly expect many (or possibly all) of the others could too.
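To make the "just a series of numbers" point concrete, here is a minimal sketch using only Python's standard library (`wave`, `array`): it writes a short synthetic tone to an in-memory WAV file and reads it back as plain 16-bit integer samples. The 16 kHz mono 16-bit format is an assumption (a common input format for STT engines, but check what a given engine expects); a microphone capture would hand you the same kind of integer buffer.

```python
import array
import io
import math
import sys
import wave

RATE = 16_000  # assumed sample rate: 16 kHz mono 16-bit PCM


def make_test_wav(seconds: float = 0.01, rate: int = RATE, freq: float = 440.0) -> bytes:
    """Build a tiny mono 16-bit PCM WAV file in memory (a stand-in for a real recording)."""
    n = round(seconds * rate)
    samples = array.array(
        "h", (int(10_000 * math.sin(2 * math.pi * freq * t / rate)) for t in range(n))
    )
    if sys.byteorder == "big":
        samples.byteswap()  # WAV PCM data is stored little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(rate)
        w.writeframes(samples.tobytes())
    return buf.getvalue()


def read_wav_samples(data: bytes) -> tuple[int, list[int]]:
    """Decode a mono 16-bit PCM WAV file into (sample_rate, list of integer samples)."""
    with wave.open(io.BytesIO(data), "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()
    return rate, samples.tolist()


if __name__ == "__main__":
    rate, samples = read_wav_samples(make_test_wav())
    print(rate, len(samples), samples[:5])
```

Once decoded, the list of integers is all a recognizer sees, regardless of whether it originally came from a file or a live input device.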
deepspeech.pytorch is a good one. Since Mozilla's DeepSpeech project is still using TensorFlow 1.x, I think the PyTorch implementation is actually better.
https://github.com/SeanNaren/deepspeech.pytorch
That’s what got them into this mess in the first place: fifty pie-in-the-sky projects to stay relevant instead of focusing on Firefox or just saving revenue aggressively.
I work with Mozilla's DeepSpeech every day. Mozilla's STT is critical to the survival of important indigenous languages throughout the world.
I sincerely hope we can help make this project continue and that Mozilla can help us do that.
Ensuring indigenous languages have digital representation is essential to their survival. Speech recognition and synthesis are a vital part of that. Indigenous communities are often ignored by Big Tech because they bring little financial value to their bottom lines, but financial bottom lines are not everything. Culture is more important. Open source tools like DeepSpeech allow communities to build the tools they need for themselves.
Māori have been working to help build tools for te reo Māori, and our project is at the forefront of using open source tools like DeepSpeech to revitalize the Māori language. The core of a good speech recognition system helps us in many practical ways, such as improved transcription, support for pronunciation, correct announcements in public transport, correct information on maps and in many other ways. We may well continue to support and use DeepSpeech if the project can continue.
But there are also many other projects in other countries in the world who may follow on - such as the Kabyle people of Algeria who are using DeepSpeech, or the Mohawk nation in North America who have been looking into it.
By the way we are working on our web presence but for now this quick one pager gives some idea of the work we are doing - https://papareo.nz.
Yikes. All this because they refuse to trim the fat at the C-level. A company can't be profitable by only employing overhead. They'll all be forced to take the ultimate pay cut when Mozilla closes up shop.
In the repo/docs, it suggests that DeepSpeech is an option for some languages (English & German). Haven't tried it, but with recent(ish) performance improvements in DS it can run on somewhat less powerful computers than used to be the case.
Are there other foundations like Mozilla we can donate to, for initiatives that are in the public interest? The Apache Foundation is all I can think of, but they focus on projects for corporate use.
Submitted title was "Mozilla to put DeepSpeech project on hold". We've replaced that with the article title per this guideline: "Please use the original title, unless it is misleading or linkbait; don't editorialize." https://news.ycombinator.com/newsguidelines.html
> Until a proper decision is being made regarding the future of the project, we will “keep the lights on” and try to address existing issues and review your contributions to the best accommodation we can in the scope of our new roles.
You could say that "keep the lights on" is the same as on hold.
Mozilla let politics take over its corporation to the point where it's basically a far left extremist group now that's relying on semi-bribe funding from Google.