With a couple of soundalike voices and some pitch shifting in Audacity? That's a far, far cry from using cutting-edge neural networks that clone voices from samples of less than half a minute.
If you mean the white noise, I meant that as a brute-force attack. To make it more targeted (that is, to know what the system will accept as sounding like your target voice), you'd likely need their exact model rather than one you built yourself.