
Hacker News

nshm on Feb 14, 2024 | on: BASE TTS: The largest text-to-speech model to-date

Err, I deeply respect the Amazon TTS team, but this paper and its synthesis are... You publish the paper in 2024 and include YourTTS in your baselines to look better. Come on, XTTS2 is out there! The voice sounds robotic and flat. Most likely the training data contains a lot of audiobooks and comparatively little conversational speech. And dropping diffusion was not a great idea: the voice is no longer crystal clear, and it sounds more like a telephone recording.

thorum on Feb 14, 2024

XTTS2 is great, but this model looks like it is more consistent in its output and has a better grasp of meaning in long texts.
