I wonder if someone will develop a "The quick brown fox jumped over the lazy dog" for English pronunciation: something you could read aloud that would cover all the sounds you'd need to build something like this.
It'd be a cool graduate project... kinda wish I was into linguistics right now.
It would probably be several paragraphs long at a minimum. Depending on accent and cultural upbringing, a person varies how they pronounce a phoneme based on nearby sounds, words, or even whole sentences.
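One could at least test a candidate passage mechanically. Here's a toy sketch of that idea: the mini-lexicon and the phoneme inventory below are hypothetical stand-ins (a real test would use a full pronouncing dictionary such as CMUdict and its ~39 ARPAbet phonemes), but the coverage check itself is the point.

```python
# Hypothetical mini-lexicon mapping words to ARPAbet-style phonemes.
# A real coverage test would pull these from a pronouncing dictionary.
MINI_LEXICON = {
    "the":    ["DH", "AH"],
    "quick":  ["K", "W", "IH", "K"],
    "brown":  ["B", "R", "AW", "N"],
    "fox":    ["F", "AA", "K", "S"],
    "jumped": ["JH", "AH", "M", "P", "T"],
    "over":   ["OW", "V", "ER"],
    "lazy":   ["L", "EY", "Z", "IY"],
    "dog":    ["D", "AO", "G"],
}

# A deliberately small target inventory; "SH" is included to show
# that even the classic pangram sentence misses sounds.
TARGET_PHONEMES = {"DH", "AH", "K", "W", "IH", "B", "R", "AW", "N",
                   "F", "AA", "S", "JH", "M", "P", "T", "OW", "V",
                   "ER", "L", "EY", "Z", "IY", "D", "AO", "G", "SH"}

def missing_phonemes(sentence):
    """Return the target phonemes the sentence never produces."""
    covered = set()
    for word in sentence.lower().split():
        covered.update(MINI_LEXICON.get(word, []))
    return TARGET_PHONEMES - covered

print(missing_phonemes("the quick brown fox jumped over the lazy dog"))
# → {'SH'}
```

And that's before you account for the context effects above: a checklist of phonemes says nothing about how each one shifts next to its neighbors, which is why the real passage would balloon to paragraphs.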
I am very annoyed by the current brute-force, heuristic approaches to synthesizing human speech. I wish the sounds were generated dynamically by mechanical simulation of the anatomy involved in human speech articulation.
> I wish the sounds were generated dynamically by mechanical simulation of the anatomy involved in human speech articulation.
Actually, I think that's what was attempted first, back in the early '80s. I remember TV shows and museum exhibits that demonstrated this approach. One I especially remember, which really dates the effort, used a vector display (think of the original Tempest and Asteroids arcade games) to project a silhouette of the tongue and vocal cavity, showing listeners how the current phoneme was being generated.
Of course, back then such simulations were limited by the lack of parallel processing power and an inadequate understanding of biophysics. That led to today's brute-force "sound sampling" approach as memory became cheaper and audio capture hardware was perfected. I do wonder if it's time to return to vocal anatomy modeling, now that we better understand how to do biomechanical and physical modeling with massive computational parallelism.
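For a sense of what even the crudest anatomy-inspired model looks like, here's a toy source-filter sketch: a pulse train standing in for the glottis, passed through two-pole resonators standing in for vocal-tract formants. The formant frequencies and bandwidths are rough textbook-style figures for an /a/-like vowel, chosen for illustration, not fitted to any real tract.

```python
import math

RATE = 16000  # samples per second

def resonator(signal, freq_hz, bandwidth_hz):
    """Two-pole IIR resonator: a crude model of one vocal-tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / RATE)   # pole radius < 1, so stable
    theta = 2 * math.pi * freq_hz / RATE           # pole angle = resonant frequency
    a1, a2 = -2 * r * math.cos(theta), r * r
    gain = 1 - r                                   # keep output roughly bounded
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x - a1 * y1 - a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out

def vowel(duration_s=0.3, pitch_hz=110,
          formants=((700, 80), (1220, 90), (2600, 120))):
    """Excite a cascade of formant filters with a glottal-like pulse train."""
    n = int(RATE * duration_s)
    period = int(RATE / pitch_hz)
    source = [1.0 if i % period == 0 else 0.0 for i in range(n)]  # the "glottis"
    for freq, bw in formants:                                     # the "tract"
        source = resonator(source, freq, bw)
    return source

samples = vowel()
print(len(samples))  # → 4800
```

A real articulatory model would go much further, simulating tongue, jaw, and lip geometry and solving the acoustics of the resulting tube shape, which is exactly where the parallel compute comes in.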
I imagine progress on this model has been slow because the task is so challenging. It would require solid knowledge of linguistics, physics, computer programming, and more. The sampling model, by contrast, is a piece of cake.
The anatomical model does indeed sound very interesting. Each phoneme would be treated as a single primitive, onto which intonation and dynamics could be applied algorithmically; and much of the progress in this area could be reused by speech recognition models, probably improving their accuracy considerably.
I really hope some serious contenders step up to the plate for this.