synesthesiam · on Nov 23, 2022

This has always been a struggle. Rhasspy can gather lists of songs, artists, etc. but it will have to guess many of their pronunciations. And it seems artist/band names often purposely thwart conventional pronunciation rules :P

synesthesiam · on Nov 23, 2022

Sorry about that. I'm going to be working more closely with native German speakers to get it right!

synesthesiam · on Nov 22, 2022

This was the reason I designed voice2json [1] :)

[1] https://voice2json.org/

jrm4 · on Nov 23, 2022

Yup, I remember seeing this as well. I'm trying to determine the major differences between Rhasspy and this?

lukifer · on Nov 23, 2022

Rhasspy is a more powerful general-purpose application and GUI; voice2json is more like a library or micro-service that does exactly one thing: convert a speech waveform to JSON. They share some DNA though (same syntax for defining vocabulary).

I used voice2json to build a voice-controlled car audio player, it works amazingly well: https://github.com/lukifer/voicetunes

synesthesiam · on Nov 23, 2022

They share a lot of the same pieces, but voice2json is meant to work in Unix-style pipelines. Rhasspy has MQTT/HTTP/Websocket APIs instead.
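As a rough illustration of the pipeline style, a downstream script can read one JSON object per line from standard input and pull out the intent. This is a minimal sketch: the function name `parse_intents` is hypothetical, and the field names (`intent.name`, `slots`) are assumptions about the JSON shape that should be checked against the voice2json documentation.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_intents(lines: Iterable[str]) -> Iterator[Tuple[str, dict]]:
    """Yield (intent_name, slots) for each non-empty JSON line.

    Assumed input shape (not confirmed by the thread):
    {"text": "...", "intent": {"name": "..."}, "slots": {...}}
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        # Missing fields fall back to an empty name / empty slots.
        name = event.get("intent", {}).get("name", "")
        slots = event.get("slots", {})
        yield name, slots
```

In a pipeline, the script would iterate over `parse_intents(sys.stdin)` and act on each pair as it arrives, which is what lets it sit at the end of a chain of piped commands.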

synesthesiam · on Nov 22, 2022

Rhasspy author here, thanks for posting! Just wanted to mention that I've joined Nabu Casa (creators of Home Assistant) this month, so Rhasspy will be receiving updates again and be a major part of Home Assistant's "Year of Voice" in 2023 :)

cmsimike · on Nov 22, 2022

Thank you for your work! I was in a panic when Snips was bought up. After some research I landed on Rhasspy as my new local-first digital assistant, and it's been fantastic. Been using it for a few years now with satellites around the house with the 'brain' running on a VM. Even have a Siri shortcut which transcribes my speech input then makes an HTTP request to the 'brain' instance so that I can use Rhasspy even if not around a satellite instance. This even works over my VPN!
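The "transcribe elsewhere, then send text over HTTP" step could look roughly like the sketch below. The helper name `build_text_request` and the host are hypothetical, and the `/api/text-to-intent` path and port 12101 are assumptions to verify against Rhasspy's HTTP API reference; the thread does not say which endpoint the shortcut calls.

```python
import urllib.request


def build_text_request(base_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST carrying already-transcribed text.

    The endpoint path is an assumption; adjust it to match your
    Rhasspy version's HTTP API.
    """
    url = base_url.rstrip("/") + "/api/text-to-intent"
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # plain-text body with the transcription
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(...)` would send it; the base URL (for example `http://rhasspy.local:12101`, a made-up host) only needs to be reachable from the client, which is why the same call works over a VPN.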

synesthesiam · on Nov 22, 2022

You're welcome! What sort of hardware did you settle on for the satellites?

synesthesiam · on Oct 8, 2022

Is there a good guide for writing Rust-like C++, e.g. using stuff in the recent standards to avoid the many footguns?

synesthesiam · on June 30, 2022

The voices are under a CC-BY-SA license, so you can generate all the audio you want (offline), even for commercial usage.

synesthesiam · on June 30, 2022

Hi all, author here. Besides the tech of Mimic 3 itself, I'm interested in training voices in as many (human) languages as possible. All it takes is one person willing to donate a dataset for everyone to benefit!

...well, that and a bunch of stuff with phonemes. But I'll do that part :)

dEnigma · on June 30, 2022

Can't you use the Mozilla Common Voice dataset for that?

krisgesling · on June 30, 2022

The Mozilla Common Voice dataset is awesome - however it's useful for the opposite purpose - speech-to-text. This is because it is a lot of different people using a range of hardware, speaking similar phrases.

For good text-to-speech you need 1 person speaking different phrases but very consistently. Here's an example dataset from Thorsten, a German open voice enthusiast: https://openslr.org/95/

dEnigma · on July 4, 2022

Thanks for the explanation!

rjzzleep · on June 30, 2022

What does it take to add Chinese and Japanese to this? Surely it's a lot more than just training sets, right? I have an Android phone without access to Google TTS, so this might actually be a nice alternative.

josephg · on June 30, 2022

How can people contribute? I'd be happy to sit in front of a microphone for a while if I could use my own voice in a TTS engine!

sampo · on June 30, 2022

They want you to make good quality audio recordings of you speaking about 20 000 phrases. It could take 40 to 80 hours of speaking and recording, maximum 4 hours per day.

https://github.com/MycroftAI/mimic-recording-studio

https://mycroft.ai/contribute/
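To put those figures in perspective, here is a small back-of-the-envelope calculation using only the numbers quoted above (20 000 phrases, 40 to 80 hours, at most 4 hours per day). The function name `recording_plan` is hypothetical.

```python
import math


def recording_plan(total_hours: float, phrases: int, max_hours_per_day: float):
    """Return (minimum recording days, average seconds per phrase)."""
    days = math.ceil(total_hours / max_hours_per_day)
    seconds_per_phrase = total_hours * 3600 / phrases
    return days, seconds_per_phrase


low = recording_plan(40, 20_000, 4)   # optimistic end of the range
high = recording_plan(80, 20_000, 4)  # pessimistic end of the range
print(low, high)
```

So the quoted range works out to at least 10 to 20 recording days, at roughly 7 to 14 seconds per phrase including retakes and pauses.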

synesthesiam · on June 30, 2022

The amount of data depends on whether there's a voice for the language already. If so, about 2 hours of data is usually good enough. Otherwise, 10-20 hours usually does it.

wilsonjholmes · on June 30, 2022

Where could I donate my voice?

worthless-trash · on June 30, 2022

What kind of workload are we looking at, do you care for the Australian accent?

krisgesling · on June 30, 2022

Bloody oath we do!

krisgesling · on June 30, 2022

Translation: "Yes"

... Hi from Darwin :D

synesthesiam · on June 30, 2022

I believe I fixed the bean bug ;)

synesthesiam · on June 30, 2022

Python is only really the glue here. The models are trained in PyTorch and exported to Microsoft's ONNX Runtime (C++). So the bulk of the inference CPU cycles are outside Python.

synesthesiam · on June 5, 2022

Author of Mimic 3 here. Thanks! Hoping to release the first version this month :)

leobg · on June 5, 2022

Could these be used inside an iOS app like Voice Dream Reader? So far, they offer voices from Acapela and NeoSpeech as in-app purchases.