XTTS: Open-source Foundation TTS model

sneak · on Sept 15, 2023

> Working with Heather Meeker, world-leading expert on open source licenses, Coqui has created a new, innovative model license, the Coqui Public Model License (CPML), and XTTS will be the first ever model released under the CPML! You can read more about the Coqui Public Model License (CPML) here.

Followed the link. The CPML has restrictions that make it a proprietary license, decidedly not open source/free software.

Using the term "open source" in that paragraph is deceptive. Combined with the GitHub link, this leads me to believe this is just more open source cosplay.

nmfisher · on Sept 15, 2023

The code is open source (Mozilla Public Licence), it's only the licence for the model weights which prohibits commercial use.

I don't think the concept of open source translates very well to model weights. The "source" is effectively the model architecture (which is freely available) and the training scripts/data (I don't know if these are available or not). With access to those, you can reproduce the weights yourself.

IMO the situation is closer to "source is open, but if you want to use our published binaries commercially, pay us". It's not free/libre, but it's also not unreasonable. Coqui is a small company, and freely releasing the weights for commercial use would deprive them of one of the few revenue streams they have.

JoshTriplett · on Sept 15, 2023

> IMO the situation is closer to "source is open, but if you want to use our published binaries commercially, pay us".

In such a circumstance you can still compile the source yourself. In this case, you cannot.

Also:

> Coqui is also innovating in open source model licensing.

"innovating" by making something that isn't open and falsely calling it "open source". And even that isn't "innovating" because a few others are already engaging in the same false advertising.

nmfisher · on Sept 15, 2023

In this case, you can train it yourself. That’s why I think it’s a fair comparison.

JoshTriplett · on Sept 15, 2023

It's not at all obvious if they're providing sufficient training data to do so. "Supply your own training data" seems the equivalent of "supply your own source code".

3np · on Sept 15, 2023

Yeah, it feels akin to claiming your obfuscated binary libraries and apps are open source because you happened to release the in-house-built compiler used to make them under a free license.

zajio1am · on Sept 15, 2023

> I don't think the concept of open source translates very well to model weights.

I think it translates pretty well. Model weights are not really that different from other non-code assets, like images or 3D models. If I can bundle open source code with such assets and offer the whole application under an open source license, then such assets are open source. If the license of such assets prohibits that, they are clearly not open source.

There is a question about what counts as 'source code' for such assets and how to apply copyleft licenses like the GPL to them, but non-copyleft open source licenses like MIT/X11 do not care about that and could easily be applied to model weights or any other assets.

osanseviero · on Sept 14, 2023

XTTS, the production-quality TTS model from Coqui, is released

- Multilingual: Generates speech in 13 different languages

- Voice cloning with 3 seconds of audio

- Cross-language voice cloning as well

- 24 kHz quality

- Blog: https://coqui.ai/blog/tts/open_xtts

- Demo: https://huggingface.co/spaces/coqui/xtts

- Model: https://huggingface.co/coqui/XTTS-v1

totetsu · on Sept 15, 2023

This is great

https://coqui-prod-creator-app-synthesized-samples.s3.amazon...

antman · on Sept 15, 2023

Runtime error for the demo

mianos · on Sept 15, 2023

This whole thing is so easy to use. Pip install or Docker installer, start a little Flask app, and you are good to go.

The speech output of this tool is as good as I have ever heard. I am looking forward to sending this to my ESP32 remote audio player.

What a world we live in!

moeffju · on Sept 17, 2023

This is pretty great. Is anyone aware of a model that can apply this to timestamped text, such as in subtitles?

ilaksh · on Sept 16, 2023

The examples of that latest model sound very close to as good as Eleven Labs. Anyone compared them thoroughly?

olivierduval · on Sept 15, 2023

The French language is not as good as the English language seems to be