Not really. They're training directly on the waveform, so the model can learn intonation. They just need to train on longer samples, and perhaps augment their linguistic representation with some extra discourse analysis.
A big problem with generating prosody has always been that our theories of it don't predict people's actual behaviour very well. It's also very expensive to get people to annotate prosody accurately under whatever theory you pick.
Predicting the raw audio directly cuts out this problem. The "theory" of prosody can be left latent, rather than specified explicitly.
I think your use of the term "understanding" is very unhelpful here. It's better to think about what you need to condition on to predict correctly.
In fact most intonation decisions are pretty local, within a sentence or two. The most important things are given/new contrasts, i.e. the information structure. This is largely determined by the syntax, which we're doing pretty well at predicting, and which latent representations in a neural network can be expected to capture adequately.
The same sentence can require very non-local differences in intonation.
Say, "They went in the shed". You won't pronounce it in a neutral voice if the previous chapter explained that a serial killer is hiding in it.
On the other hand, if the shed contains a shovel that's urgently needed to dig up a treasure that has been the subject of the novel since page 1, you'll read it with implied urgency.
With enough labor, you could annotate enough sentences to cover a lot of dialogue cases. Passages like "'Stop!', he said angrily/dryly/mockingly" are probably fairly common. You'd be modeling the next most probable inflection given the previous words and the tones selected so far (see the sketch after this comment).
What would require understanding are novel arrangements and metaphor used to indicate emotional state. On-the-fly variation to avoid monotony might also be difficult, as would sarcasm or combinations/levels (e.g. she spoke matter-of-factly but with mirth lightly woven through).
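As a toy illustration of that "next most probable inflection" framing, here's a minimal sketch that classifies the inflection of a quoted line from the words around it. The tiny dataset and label set are made up purely for illustration; a real system would use a large annotated corpus and a neural sequence model rather than bag-of-words logistic regression.

```python
# Toy sketch: predict an inflection label for a quoted line from its context.
# The data below is invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical (context, inflection) pairs mined from dialogue tags.
contexts = [
    '"Stop!" he said angrily, slamming the door.',
    '"Stop," she said dryly, not looking up.',
    '"Stop!" he said mockingly, with a grin.',
    '"Get out!" he shouted angrily.',
    '"Sure," she replied dryly.',
]
labels = ["angry", "dry", "mocking", "angry", "dry"]

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(contexts, labels)

# Most probable inflection for a new line, given its surrounding words.
print(model.predict(['"Stop," he said, rolling his eyes.']))
```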
And who says it can't understand the material? Recurrent networks have been trained that can translate between languages, or predict the next word in a sentence, with remarkable accuracy. Combined with WaveNet this could be quite effective.
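To make that combination concrete, here is a rough sketch of the general idea: a dilated causal convolution block whose activations are shifted by a conditioning signal, e.g. text/discourse embeddings from a recurrent network upsampled to the audio frame rate. This is an illustrative toy, not DeepMind's actual WaveNet architecture or hyperparameters.

```python
import torch
import torch.nn as nn

class TinyConditionedBlock(nn.Module):
    """One dilated causal conv block with an additive conditioning signal.
    A sketch of local conditioning in a WaveNet-style model, not the real thing."""
    def __init__(self, channels=32, cond_dim=16, dilation=2):
        super().__init__()
        self.dilation = dilation
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size=2, dilation=dilation)
        self.cond = nn.Conv1d(cond_dim, 2 * channels, kernel_size=1)
        self.out = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x, c):
        # Left-pad so the convolution stays causal (no peeking at future samples).
        h = self.conv(nn.functional.pad(x, (self.dilation, 0)))
        h = h + self.cond(c)                       # inject linguistic/context features
        a, b = h.chunk(2, dim=1)
        return x + self.out(torch.tanh(a) * torch.sigmoid(b))  # gated activation + residual

block = TinyConditionedBlock()
x = torch.randn(1, 32, 100)   # audio-rate features
c = torch.randn(1, 16, 100)   # hypothetical text/discourse embedding, upsampled to audio rate
y = block(x, c)               # same shape as x
```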
There could be cases where the intonation depends on things entirely outside the book, say if a politician in the text does something far from what we would expect them to do in today's world.