I'd be curious what the false positive rate on that is. Can you clone anyone's voice by collecting a set of ten voices with unique timbres reading the required statement, plus some pitch control, and getting close enough? A hundred? Or can you trick the neural net by feeding it something that sounds like white noise to humans until the NN triggers in the right way and goes "ok yep, that's a match, you're authorised now"?
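Just to sketch what that second attack would look like: this assumes white-box access to the verification model, which an attacker almost certainly wouldn't have, and every name here (the model interface, the cosine-similarity threshold, the function itself) is a made-up placeholder, not anything the vendor has described.

```python
import torch

def craft_adversarial_audio(model, target_embedding, num_samples=16000,
                            steps=500, lr=1e-2, threshold=0.85):
    """Gradient ascent on raw noise until the (hypothetical) verifier calls it a match."""
    noise = torch.randn(1, num_samples, requires_grad=True)
    optimizer = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        embedding = model(noise)  # speaker embedding produced from the noise waveform
        similarity = torch.nn.functional.cosine_similarity(embedding, target_embedding)
        (-similarity.mean()).backward()  # push the noise towards the target speaker
        optimizer.step()
        if similarity.item() >= threshold:  # the "ok yep, that's a match" moment
            break
    return noise.detach()
```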
Probably not something we'll get to hear as part of the PR pitch.
Or is the consent statement itself the audio that gets cloned, with no separate training data? Then it might actually work, and you'd just have to get close enough that the human you're trying to fool can't tell the difference anymore (which defeats the need for this tech in the first place, at least in targeted rather than automated cases).
Yeah, good point - I don't know. When I tried, I actually got a (personal?) email saying it didn't match closely enough. After uploading another sample (based on a different text), it went through.
I like your idea of just training on the consent text! That wasn't the case when I tried it, as you needed around 3 hours (optimally) of training data.
With a couple of soundalike voices and some pitch shifting in Audacity? That's a far, far cry from cutting-edge neural networks that clone voices from samples of less than half a minute.
If you mean the white noise: I meant that as a brute-force attack, because to do it in a more targeted way (to know what the system will accept as sounding like your target voice) you'd likely need their exact model rather than training your own.
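For contrast with the white-box sketch above, the brute-force version is just blind querying; `verify_api` and `make_candidate` here are placeholders for whatever the real endpoint and perturbation strategy would be, not anything documented.

```python
import random

def brute_force_match(verify_api, make_candidate, max_attempts=100_000):
    """Throw candidate clips at an opaque accept/reject endpoint until one sticks."""
    for attempt in range(1, max_attempts + 1):
        candidate = make_candidate(random.random())  # e.g. pitch-shifted soundalike or shaped noise
        if verify_api(candidate):  # all we see is pass/fail, no similarity score to climb
            return candidate, attempt
    return None, max_attempts
```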