I'm quite impressed by the quality of the videos generated by tools like HeyGen and Synthesia, where a "real" spokesperson reads a custom script in an extremely natural way.
I haven't been able to find out what techniques these tools use. Can anyone share more details or point me to research papers where I can learn more?