As others have said, there's the pluralization of "peanut[s]" to distinguish between the two. This is a useful feature of English: the adjective-like role of a noun in a complex noun phrase is (almost?) always singular.
- Computer engineer
- NOT computers* engineer
- Toothbrush
- NOT teethbrush*
- Foot doctor
- NOT feet* doctor
- Alarm clock
- NOT alarms* clock, even when it supports multiple alarms!
Additionally, there's phrasal intonation. If the intonation and stress decrease throughout the phrase, it's a single item. If the intonation and stress reset for "butter," then it's a new item.
"Attorney generals" is a noun phrase (admittedly of questionable adjectivity).
"Attorneys general" is a blind idiot translation of a phrase in a language with different grammatical rules (Latin, IIRC).
"Attorney" is a noun. "General" as used here is an adjective.
It's unusual in that the adjective follows the noun without a hyphen, but it's common enough, and it's the same position that prepositional phrases and relative clauses occupy, as in "big man on campus" and "powers that be".
No one ever gets a single "peanut". So unless you mush-mouth the "s", the reasonable expectation for both your cohabitant and the robot is to bring peanuts and butter.
A better question is "coconut, milk" versus "coconut milk".
"Coconut" is still an anomalous grocery item. You'd want one of
- a coconut
- [number] coconuts
- shredded coconut
"Coconut" is best matched to that last option, but it's not a natural word choice. (Although it is a natural list entry... do people think of themselves as dictating to Alexa, or as writing the list themselves while happening to use their voice?)
Yes, I agree. If you're writing a list for yourself, a bare "coconut" is a typical entry. But if you're dictating a shopping list to someone else, you're quite unlikely to say "coconut" because that isn't grammatical.
So it turns into a question of how people think about dictating to Alexa.
This is a good point - in reality Alexa doesn't really have to do a great job transcribing at all if it's just constructing a list as a reminder for you later.
If this is a precursor to being able to quickly voice-order stuff off Amazon for delivery, though, it's a different story.
This is a very interesting observation. The point about speech-to-text models being biased towards the US in terms of training data and innovation holds not only for the larger things (gender/race/religion) but also for small things like this. And these are likely to cause daily problems.
If it correctly understands peanutS, it will classify it as "more likely 2 items", since it would check everything against some sort of dictionary that contains "peanuts", "butter", and "peanut butter".
PS. I implemented something similar without machine learning and that's how I did it. With text it's easier, though. I suppose in NLU it could have a parameter for "pause time between words", which could also contribute to a different conclusion.
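For anyone curious, here is a rough sketch of what a dictionary check like that might look like. It is not the actual implementation described above, just one way to do it: greedy longest-match against a vocabulary, with an optional pause cutoff. The vocabulary, the function name `split_items`, and the 0.4 second threshold are all made up for illustration.

```python
# Hypothetical vocabulary; a real system would use a much larger item list.
KNOWN_ITEMS = {
    "peanuts", "butter", "peanut butter",
    "coconut", "milk", "coconut milk",
}
MAX_WORDS = max(len(item.split()) for item in KNOWN_ITEMS)


def split_items(words, pauses=None, pause_threshold=0.4):
    """Greedy longest-match segmentation of transcribed words into list items.

    pauses[i] is the silence (in seconds) after words[i]. A pause longer
    than pause_threshold (an assumed value) forces an item boundary.
    """
    items = []
    i = 0
    while i < len(words):
        # Longest candidate span starting at word i.
        limit = min(MAX_WORDS, len(words) - i)
        if pauses is not None:
            # Do not let a candidate item span a long pause.
            for k in range(limit - 1):
                if pauses[i + k] > pause_threshold:
                    limit = k + 1
                    break
        # Try the longest span first, fall back to a single word.
        for n in range(limit, 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in KNOWN_ITEMS:
                items.append(candidate)
                i += n
                break
    return items


print(split_items(["peanut", "butter"]))
print(split_items(["peanuts", "butter"]))
print(split_items(["coconut", "milk"], pauses=[0.8]))
```

With this vocabulary, "peanut butter" stays one item because the two-word phrase is in the dictionary, while "peanuts butter" is not, so it falls back to two single-word items. The pause check only matters for the genuinely ambiguous cases like "coconut, milk", where the words alone cannot settle it.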