When I listened to it, my first impression was that it must be the real actor they included for comparison purposes but that they failed to label it correctly. I thought it is not machine-generated. I couldn't tell the slightest artifact except what sounded like a low-bitrate sound encoding (maybe using a codec geared toward speech). Can you tell anything "off" about it?
As for the encoding artifact such as a tinny sound or low-bitrate sound, that is the type you hear on an MP3 or low bitrate codec for speech. For example, when I record a message on https://vocaroo.com/ the "premier" voice recording service it sounds 10x worse. Here is a sample I just recorded of my own speech: https://voca.ro/18oSJ1sHU5w5
After my first impression that the narrative example might be a real human mislabelled for comparison purposes, I listened to the next two, labelled News and Conversational. I found these very easy to tell as AI-generated.
Thinking back to why I found the narrative example so compelling, I thought perhaps the issue is that the first example is in British English which I'm less used to than American English. I grew up in the United States. Perhaps since the accent doesn't match my own, it is harder for me to perceive it as generated.
-> Can a native speaker of British English tell us whether listening to the first example you can tell in any way that it is a robot? Maybe it is as obvious to you as the next two are to me.
Still, I've listened to a fair amount of British English in my life so perhaps there is an alternative explanation for why the first one was better. For example, it could have been trained on a reader's voice who has narrated thousands of hours in very high studio quality in a fairly consistent way, leaving this type of text much easier to synthesize than the other two examples due to more training data or higher-quality audio.
For me, the first one is really indistinguishable from a narrator's true voice, though it does sound a bit tinny which could also happen as an artifact of the recording process.
In terms of "how confident are you that this is a real person" the second two examples I would put at 0 - it's totally obvious that it is not a real person, whereas the first one sounds like a 10 to me: obviously a real narrator. (With a bit of artifacting that sounds like an mp3.)
Hey! ElevenLabs here, confirming that all 3 samples (including the Narrative one) were AI-generated! We'll be opening up our platform later this month and would love for you test it yourself!
I'm a native British English speaker and can confirm the first example is incredibly good. It would be very difficult/impossible for most people to tell that the voice is generated from that clip alone.
When I listened to it, my first impression was that it must be the real actor they included for comparison purposes but that they failed to label it correctly. I thought it is not machine-generated. I couldn't tell the slightest artifact except what sounded like a low-bitrate sound encoding (maybe using a codec geared toward speech). Can you tell anything "off" about it?
As for the encoding artifact such as a tinny sound or low-bitrate sound, that is the type you hear on an MP3 or low bitrate codec for speech. For example, when I record a message on https://vocaroo.com/ the "premier" voice recording service it sounds 10x worse. Here is a sample I just recorded of my own speech: https://voca.ro/18oSJ1sHU5w5
After my first impression that the narrative example might be a real human mislabelled for comparison purposes, I listened to the next two, labelled News and Conversational. I found these very easy to tell as AI-generated.
Thinking back to why I found the narrative example so compelling, I thought perhaps the issue is that the first example is in British English which I'm less used to than American English. I grew up in the United States. Perhaps since the accent doesn't match my own, it is harder for me to perceive it as generated.
-> Can a native speaker of British English tell us whether listening to the first example you can tell in any way that it is a robot? Maybe it is as obvious to you as the next two are to me.
Still, I've listened to a fair amount of British English in my life so perhaps there is an alternative explanation for why the first one was better. For example, it could have been trained on a reader's voice who has narrated thousands of hours in very high studio quality in a fairly consistent way, leaving this type of text much easier to synthesize than the other two examples due to more training data or higher-quality audio.
For me, the first one is really indistinguishable from a narrator's true voice, though it does sound a bit tinny which could also happen as an artifact of the recording process.
In terms of "how confident are you that this is a real person" the second two examples I would put at 0 - it's totally obvious that it is not a real person, whereas the first one sounds like a 10 to me: obviously a real narrator. (With a bit of artifacting that sounds like an mp3.)
[1] The text is here https://www.nytimes.com/2001/11/19/books/chapters/the-lord-o...