
Hey - developers behind ElevenLabs here. Thank you so much for the constructive and positive feedback - we’re taking it on board!

We’re currently focused on researching and deploying a different approach to speech synthesis, one that can generate nuanced intonation and emotions by understanding text and taking context into account. Additionally, we provide creators with a way to clone their own voice based on very short samples. With the published blog post, we are now deploying a way to help them design entirely new voices!

Anyone will be able to generate that level of quality just by copying and pasting text. We are planning to open up the Beta later this month. Our goal is to let you convert any written content into high-quality, compelling audio.

To address a few questions that frequently came up:

- Latency for our streaming TTS is <1s, at the quality of the results available above; high latency is the usual problem with existing good TTS models (like tortoise-tts)

- We can clone voices instantly, based on just 5s of speech, with no training required

- We are working on adding SSML-like support for better control; speed controls will be coming as part of that too

- The API is directly available as part of the Beta; we are preparing the infrastructure to scale easily for the release!

We are hiring researchers, frontend and full-stack developers! If you are interested, send your GitHub account and a short message to founders[at]elevenlabs.io.



Hey Piotr - just wanted to say congrats on the awesome work so far, man. The quality is genuinely unbelievable. I don't know if you guys are ready to take clients at scale, but I don't see any reason why all newsletter creators wouldn't use your tech right now to address whole new markets. I'll be following the journey, excited for what's to come.


Maybe I'm late to the party -- but the graphic [1] in the linked article is great.

Could the designer share a little about how it was made? Does it represent one of the generated voices, or is it just 'artistic'? (both are cool, I think).

[1] https://blog.elevenlabs.io/content/images/2023/01/Sequence-0...


The voices are really amazing, I couldn't really tell that they are synthetic and I was looking for it.

The only issue is that the actual recordings sound like they have been overcompressed or poorly recorded - is there any way to improve this? Something like super-resolution, but for voice?


What is your business model? How are you deciding who gets Beta access? What does the voice generation interface look like?


We are offering both Speech Synthesis (TTS) and Voice Lab (Rapid Voice Cloning and Voice Design) as a standard SaaS model (with a fixed quota of characters you can voice per month). The API is directly available on the platform. Beyond the standard package, pricing flips to a usage-based model, and we do tailored deals for custom needs and discounts for high-volume usage.

We are currently testing the Beta with a range of storytelling and publishing use cases, tackling relevant feedback and making sure the infrastructure supports it. We are planning to open up the Beta to everyone by the end of this month.

The Voice Design interface is currently a set of sliders and toggles, but we are iterating on what is most accessible.


Hi! Are your models English-only, or do you plan on tackling other languages?


They will be multilingual; the tech scales to any language and we are working to add more (it is relatively easy). Here is a demo of Polish TTS: https://www.youtube.com/watch?v=ra8xFG3keSs



