Hacker News

Let's talk about this "Narrative" example.

When I listened to it, my first impression was that it must be a real actor, included for comparison purposes, that they had failed to label correctly. I did not think it was machine-generated. I couldn't detect the slightest artifact except what sounded like low-bitrate audio encoding (maybe from a codec geared toward speech). Can you tell anything "off" about it?

As for the encoding artifact, the tinny, low-bitrate sound is the kind you hear in an MP3 or a low-bitrate speech codec. For example, when I record a message on https://vocaroo.com/, the "premier" voice recording service, it sounds 10x worse. Here is a sample I just recorded of my own speech: https://voca.ro/18oSJ1sHU5w5

After my first impression that the Narrative example might be a real human mislabelled for comparison purposes, I listened to the next two, labelled News and Conversational. I found these very easy to identify as AI-generated.

Thinking back on why I found the Narrative example so compelling, I thought perhaps the issue is that the first example is in British English, which I'm less used to than American English. I grew up in the United States. Perhaps since the accent doesn't match my own, it is harder for me to perceive it as generated.

-> Can a native speaker of British English tell us whether, listening to the first example, you can tell in any way that it is a robot? Maybe it is as obvious to you as the next two are to me.

Still, I've listened to a fair amount of British English in my life, so perhaps there is an alternative explanation for why the first one was better. For example, it could have been trained on the voice of a reader who has narrated thousands of hours in very high studio quality in a fairly consistent way, making this type of text much easier to synthesize than the other two examples thanks to more training data or higher-quality audio.

For me, the first one is really indistinguishable from a narrator's true voice, though it does sound a bit tinny, which could also be an artifact of the recording process.

In terms of "how confident are you that this is a real person", I would put the second and third examples at 0: it's totally obvious that they are not a real person. The first one sounds like a 10 to me: obviously a real narrator (with a bit of artifacting that sounds like an MP3).

[1] The text is here https://www.nytimes.com/2001/11/19/books/chapters/the-lord-o...



Hey! ElevenLabs here, confirming that all 3 samples (including the Narrative one) were AI-generated! We'll be opening up our platform later this month and would love for you to test it yourself!


Congratulations!! I can't tell that the first isn't human no matter how hard I try. That is an amazing achievement.


I'm a native British English speaker and can confirm the first example is incredibly good. It would be very difficult/impossible for most people to tell that the voice is generated from that clip alone.



