synesthesiam's comments

This has always been a struggle. Rhasspy can gather lists of songs, artists, etc. but it will have to guess many of their pronunciations. And it seems artist/band names often purposely thwart conventional pronunciation rules :P


Sorry about that. I'm going to be working more closely with native German speakers to get it right!


This was the reason I designed voice2json [1] :)

[1] https://voice2json.org/


Yup, I remember seeing this as well. I'm trying to determine the major differences between Rhasspy and this?


Rhasspy is a more powerful general-purpose application and GUI; voice2json is more like a library or micro-service that does exactly one thing: convert a speech waveform to JSON. They share some DNA though (same syntax for defining vocabulary).

I used voice2json to build a voice-controlled car audio player, and it works amazingly well: https://github.com/lukifer/voicetunes


They share a lot of the same pieces, but voice2json is meant to work in Unix-style pipelines. Rhasspy has MQTT/HTTP/Websocket APIs instead.
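
To illustrate the pipeline style, here is a rough sketch of a small consumer that reads one JSON object per line and pulls out the intent. The field names (`intent.name`, `slots`), the `PlayArtist` intent, and the helper names `handle_line` and `run` are assumptions made up for illustration, not a confirmed voice2json schema.

```python
import json
import sys


def handle_line(line):
    """Parse one JSON line; return (intent_name, slots), or None if blank."""
    line = line.strip()
    if not line:
        return None
    event = json.loads(line)
    # Assumed shape: {"intent": {"name": ...}, "slots": {...}}
    name = event.get("intent", {}).get("name", "")
    slots = event.get("slots", {})
    return name, slots


def run(stream=sys.stdin, out=sys.stdout):
    """Read JSON lines from `stream` and write one summary line per intent."""
    for line in stream:
        result = handle_line(line)
        if result is None:
            continue  # skip blank lines between events
        name, slots = result
        out.write(f"{name}: {json.dumps(slots, sort_keys=True)}\n")
```

Calling `run()` at the end of a script lets it sit at the tail of a shell pipeline, with the recognizer's JSON output piped into it.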


Rhasspy author here, thanks for posting! Just wanted to mention that I've joined Nabu Casa (creators of Home Assistant) this month, so Rhasspy will be receiving updates again and be a major part of Home Assistant's "Year of Voice" in 2023 :)


Thank you for your work! I was in a panic when Snips was bought up. After some research I landed on Rhasspy as my new local-first digital assistant, and it's been fantastic. Been using it for a few years now with satellites around the house and the 'brain' running on a VM. I even have a Siri shortcut that transcribes my speech input and then makes an HTTP request to the 'brain' instance, so that I can use Rhasspy even when I'm not near a satellite. This even works over my VPN!
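
The "transcribe on the phone, then send the text over HTTP" step described above could look roughly like this from any client. This is only a sketch: the host name `rhasspy-brain.local`, port 12101, the `/api/text-to-intent` path, and the plain-text request body are assumptions about the setup, so check them against your own Rhasspy instance.

```python
import json
import urllib.request

# Assumed address of the 'brain' instance; adjust host, port, and path.
RHASSPY_URL = "http://rhasspy-brain.local:12101/api/text-to-intent"


def build_request(text, url=RHASSPY_URL):
    """Build (but do not send) a POST request carrying the transcribed text."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def send(text, url=RHASSPY_URL, timeout=10):
    """Send the text and decode the JSON reply (assumes a JSON response)."""
    with urllib.request.urlopen(build_request(text, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping `build_request` separate from `send` makes it easy to inspect the request without needing the server (or the VPN) to be reachable.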


You're welcome! What sort of hardware did you settle on for the satellites?


Is there a good guide for writing Rust-like C++, e.g. using stuff in the recent standards to avoid the many footguns?


The voices are under a CC-BY-SA license, so you can generate all the audio you want (offline), even for commercial usage.


Hi all, author here. Besides the tech of Mimic 3 itself, I'm interested in training voices in as many (human) languages as possible. All it takes is one person willing to donate a dataset for everyone to benefit!

...well, that and a bunch of stuff with phonemes. But I'll do that part :)


Can't you use the Mozilla Common Voice dataset for that?


The Mozilla Common Voice dataset is awesome. However, it's useful for the opposite purpose: speech-to-text. This is because it consists of many different people, using a range of hardware, speaking similar phrases.

For good text-to-speech you need one person speaking different phrases, but very consistently. Here's an example dataset from Thorsten, a German open voice enthusiast: https://openslr.org/95/


Thanks for the explanation!


What does it take to add Chinese and Japanese to this? Surely it's a lot more than just training sets right? I have an android phone without access to google tts, so this might actually potentially be a nice alternative.


How can people contribute? I'd be happy to sit in front of a microphone for a while if I could use my own voice in a TTS engine!


They want you to make good quality audio recordings of you speaking about 20 000 phrases. It could take 40 to 80 hours of speaking and recording, maximum 4 hours per day.

https://github.com/MycroftAI/mimic-recording-studio

https://mycroft.ai/contribute/


The amount of data depends on whether there's already a voice for the language. If so, about 2 hours of data is usually good enough. Otherwise, 10-20 hours usually does it.


Where could I donate my voice?


What kind of workload are we looking at, do you care for the Australian accent?


Bloody oath we do!


Translation: "Yes"

... Hi from Darwin :D


I believe I fixed the bean bug ;)


Python is only really the glue here. The models are trained in PyTorch and exported to ONNX, then run with Microsoft's ONNX Runtime (C++). So the bulk of the inference CPU cycles are spent outside Python.


Author of Mimic 3 here. Thanks! Hoping to release the first version this month :)


Could these be used inside an iOS app like Voice Dream Reader? So far, they offer voices from Acapela and NeoSpeech as in-app purchases.

