Right after Ebert mentioned Alex, I stopped reading, selected the text of the article, went to OS X's Services menu, and listened to Alex read the rest of it.
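If you'd rather skip the Services menu, the built-in say command does the same thing from the Terminal. A one-liner, assuming you've saved the article to a file (article.txt here is just a placeholder name):

    say -v Alex -f article.txt

The -v flag picks the voice and -f reads from a file; with neither, say just speaks whatever string you pass it.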
It reminded me how far consumer voice synthesis still has to go, but it also gave me a better appreciation for some of the subtler things the Alex voice does with intonation. Despite still sounding unmistakably synthetic, it's clearly doing quite a bit of analysis of sentence structure to vary its pitch in a natural way.
But that makes me wonder: Why, when complex things like structural intonation are already in consumer TTS products, do (deceptively) simple things like consonant sounds and pacing still sound so stilted?