pragmatic*, placing stress is less a problem of word meaning than it is of speaker adaptation for listener comprehension, emphasis, and prosodic tendencies.

Even then, I don't think the issue is stress. I believe the voices sound robotic because they are using very few samples ("less than a minute" of audio, they claim), which they openly advertise because it makes their results impressive in one sense. Speech systems are usually trained on triphones (3-phoneme sequences). The number of triphones needed to cover a language's phonemic inventory is huge: 50 phonemes give 50³ = 125,000 possible triphones, which could mean a few hours of audio, although many of them will never occur because of the language's phonotactics.
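To make the arithmetic concrete: with n phonemes there are n³ ordered triphones (repeats allowed), and phonotactic rules then prune that set. Here is a rough Python sketch. The five-phoneme inventory and the "no three consonants in a row" rule are made up purely for illustration and don't describe any real language or speech system.

```python
from itertools import product


def possible_triphones(phonemes):
    """Every ordered 3-phoneme sequence, repeats allowed: n**3 in total."""
    return list(product(phonemes, repeat=3))


# Hypothetical toy inventory: three vowels and two consonants.
inventory = ["a", "i", "u", "p", "t"]
vowels = {"a", "i", "u"}


def allowed(triphone):
    # Hypothetical phonotactic rule, for illustration only:
    # reject sequences of three consonants.
    return any(p in vowels for p in triphone)


all_triphones = possible_triphones(inventory)
legal_triphones = [t for t in all_triphones if allowed(t)]

print(len(all_triphones))    # 125 (5**3)
print(len(legal_triphones))  # 117 (125 minus the 2**3 all-consonant sequences)
print(50 ** 3)               # 125000 for a 50-phoneme inventory
```

Real phonotactics are much richer than one rule, so the surviving fraction will differ by language. The point is only that the candidate space grows as n³, not n!.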