Yes, they do. There's evidence to suggest that punctuation marks were devised as pronunciation guides, indicating how to inflect and when to pause, rather than as syntactic markers in their own right. Commas in particular indicate a distinctive inflection and a short pause in speech; cues like that should be detectable by Alexa, especially if it uses a neural net or something similar to analyze human speech.
That was not the point of what I said at all. It could interpret the vocal cues, yes. But as far as I know that is still not a solved problem for speech-to-text, and going the other way, guessing the punctuation from the transcribed text, is still more reliable.
Back to what I was really getting at: I'm pretty sure the person I replied to was suggesting Alexa could just split(',') and call it a day. With text, yes. With voice, this would be irritatingly unreliable: everyone talks differently, and sometimes people stumble weirdly. I'm certain humans use a mix of vocal cues and interpretation to place the commas in their heads.
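To make that concrete, here's a toy sketch (the items are made up): naive comma-splitting is fine for typed text, but a raw speech-to-text transcript typically has no commas to split on.

```python
# Naive comma-splitting works fine on punctuated, typed text.
typed = "milk, peanut butter, eggs"
print([item.strip() for item in typed.split(",")])
# -> ['milk', 'peanut butter', 'eggs']

# A raw speech-to-text transcript usually arrives unpunctuated,
# so the same call returns one undifferentiated blob.
spoken = "milk peanut butter eggs"
print([item.strip() for item in spoken.split(",")])
# -> ['milk peanut butter eggs']
```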
Two ways it could go:

- ignore the comma, use the period and space as delimiters, and compare the values/entities against a dictionary, checking neighbouring words for known compounds;
- don't ignore the comma, and compare the values against a dictionary.

Put a priority (or, in machine-learning terms, a classifier) on both outcomes, because the comma is not reliable in spoken language. That way it would interpret "peanuts butter" as [peanuts, butter] and "peanut butter" as [peanut butter].
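As a toy sketch of that idea (the item dictionary and its scores are invented here; a real system would learn them from data):

```python
# Toy sketch: try every reading of an unpunctuated transcript and
# keep the highest-scoring one, since the comma is unreliable in
# spoken language. Dictionary and scores are made up for illustration.
ITEM_SCORE = {
    "peanut butter": 50,  # common compound item
    "peanuts": 20,
    "butter": 30,
    "milk": 40,
    "eggs": 40,
}

def best_segmentation(tokens):
    """Split tokens into known items, maximizing the total score."""
    if not tokens:
        return 0, []
    best = None
    for n in (1, 2):  # take the next one or two tokens as one item
        head = " ".join(tokens[:n])
        if head in ITEM_SCORE:
            score, rest = best_segmentation(tokens[n:])
            cand = (score + ITEM_SCORE[head], [head] + rest)
            if best is None or cand[0] > best[0]:
                best = cand
    if best is None:  # unknown word: keep it on its own, score 0
        score, rest = best_segmentation(tokens[1:])
        best = (score, [tokens[0]] + rest)
    return best

print(best_segmentation("peanut butter".split())[1])   # ['peanut butter']
print(best_segmentation("peanuts butter".split())[1])  # ['peanuts', 'butter']
```

The scores stand in for the classifier's priorities: the compound "peanut butter" outranks the two-item reading, while "peanuts butter" only parses as two separate items.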
PS. Now I hope that speech-to-text transcribes a spoken "peanuts" correctly to [peanuts] and not [peanut], because that would fail.
PS2. The article itself doesn't mention the punctuation problem.
>PS2. The article itself doesn't mention the punctuation problem.
It doesn't go into detail, but it does seem to mention it:
>Off-the-shelf broad parsers are intended to detect coordination structures, but they are often trained on written text with correct punctuation. Automatic speech recognition (ASR) outputs, by contrast, often lack punctuation, and spoken language has different syntactic patterns than written language.
>Understanding pauses/inflection changes doesn't have to be a "solved problem" to work for cases such as discerning common shopping list style items.
Okay... but Alexa isn't just shopping lists. You only know you're dealing with a shopping list after you've already parsed the text.
Even if you did go back to the audio at that point, is the narrower use case any more solved than the general one? Guessing the punctuation from text alone turns out to be fairly accurate, so even if you could read the vocal cues decently, they would have to be notably better to be worth the trouble.
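For a sense of what "guessing from text alone" means, here's a toy sketch (the features and training pairs are invented; real punctuation-restoration models are trained on large corpora): treat each gap between two words as a binary comma/no-comma decision. A prosody-based model would add pause/pitch features to the same setup, and the point is it would have to beat this text-only baseline.

```python
# Toy sketch of text-only punctuation guessing: classify each gap
# between adjacent words as "comma here or not". Training data is
# invented for illustration.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def gap_features(left, right):
    # Which words sit on each side of the gap, and the pair itself.
    return {f"left={left}": 1, f"right={right}": 1, f"pair={left} {right}": 1}

# (left word, right word, 1 if a comma separates them in the source)
training_gaps = [
    ("milk", "eggs", 1), ("eggs", "bread", 1), ("butter", "milk", 1),
    ("peanut", "butter", 0), ("olive", "oil", 0), ("ice", "cream", 0),
]
vec = DictVectorizer()
X = vec.fit_transform(gap_features(l, r) for l, r, _ in training_gaps)
y = [label for _, _, label in training_gaps]
model = LogisticRegression().fit(X, y)

# Ask the model where the commas go in an unpunctuated transcript.
words = "milk peanut butter eggs".split()
for left, right in zip(words, words[1:]):
    prob = model.predict_proba(vec.transform([gap_features(left, right)]))[0, 1]
    print(f"{left} | {right}: P(comma) = {prob:.2f}")
```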
>That's an argument against discerning "milk" from "silk" or "coke" from "cork", but that's still managed satisfactorily enough.
That's irrelevant here, though, considering that problem has mostly been solved at this point.