AI Assistants Have Poor Usability: Study of Alexa, Google Assistant, Siri (2018) (nngroup.com)
56 points by spideymans on Aug 30, 2020 | 61 comments


This is a bit inevitable. The UI of your usual screen-based app has buttons for the things you can do. This has the effect of limiting your expectations to those features which have been implemented. A general voice query is open ended, though: there's no guidance around what exists, so it's very easy to invent a query that's 'off book' in one way or another... things which haven't been implemented at all, or which have been, but are requested in a way the machine does not expect.


Indeed, an open-ended interface like a VUI invites unimplemented interactions.

I worked on a conversational agent and now realise that there are fundamental usability barriers that are very difficult to overcome:

- Limited information bandwidth. Adults can visually read 250–600 words per minute, aided by peripheral vision that helps them scan for shapes, colors, and images, and information in a GUI stays on screen for easy reference. With voice, adults can only comfortably listen at 150–160 WPM and have to hold information in memory for reference. This makes voice ecommerce impractical for anything beyond familiar essentials.

- Lack of an editing layer. It’s simple and straightforward to correct text-box input, but correcting a voice command is difficult: the correction itself is ambiguous and adds extra, chaotic information. A lot of people think aloud, so they frequently self-correct mid-sentence or tack on information in an ad-hoc manner: “I’d like a cappuci—er, make it a latte. (pause) And oh, with soy milk.”

- High context-switching cost. GUIs keep contexts within frames, tabs, and windows, and the user can switch between them easily (at low cost). This is an Amazon shopping cart context, this is a Reminders context, and so on. A button press is unambiguously contained within its context. In voice, contexts are not parallel but sequential, and have to be built up over time.


We've had better technology in NLP and problem-solving since the mid-1970s -- and many methods that were deemed intractable are trivial now given that compute is on the order of 1 million times more powerful. Systems that use partial-order hierarchical planning, abductive logic programming, bi-directional search, constraint satisfaction, multi-modal interaction, multi-agent conversational modeling based on planning applicable speech-acts, etc., have been replaced with simple state-machines, and in "advanced" cases, simple slot-filling -- and more recently relatively shallow black-box stochastic methods based on deep learning models.
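For a sense of what "simple slot-filling" amounts to in practice, here's a minimal sketch (the intent names and patterns are invented for illustration, not any vendor's actual API); anything phrased outside the templates simply fails, which is exactly the "off book" failure mode described above:

```kotlin
// Hypothetical slot-filling "NLU": each intent is a template with named gaps,
// matched against the transcribed utterance. Anything off-template fails.
data class Intent(val name: String, val pattern: Regex, val slots: List<String>)

val intents = listOf(
    Intent("SetAlarm", Regex("""set (?:an )?alarm (?:for|at) (.+)"""), listOf("time")),
    Intent("PlayMusic", Regex("""play (?:some )?(.+)"""), listOf("query"))
)

fun parse(utterance: String): Pair<String, Map<String, String>>? {
    val text = utterance.lowercase().trim()
    for (intent in intents) {
        val m = intent.pattern.matchEntire(text) ?: continue
        return intent.name to intent.slots.zip(m.groupValues.drop(1)).toMap()
    }
    return null // unimplemented or unexpected phrasing
}

fun main() {
    println(parse("Set an alarm for 7 am"))      // (SetAlarm, {time=7 am})
    println(parse("Remind me when I get home"))  // null: no matching intent
}
```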

Having worked at two companies that offered such systems, I can say that this is due both to not understanding/appreciating what has been done in the past, and to the fact that companies treat these systems like car companies treat features -- they know they can deliver rich features in the future, so the "logic" is to wait to do so, in the hope they can make more money in the long term and be able to dribble out new features.

To be fair, there is also the issue of reproducibility and predictability -- companies expect everybody's agent to respond the same way given the same input, rather than have variability in agents due to non-determinism or context and learning -- but we can't ask for flexible "human-like" AI on one hand and not expect the variation (and occasional misunderstandings) that humans would also suffer.


>Even though it goes against the basic premise of human-centered design, users have to train themselves to understand when an intelligent assistant will be useful and when it’s better to avoid using it.

I find this one of the truest things in this article. It's very easy when you first get a new voice assistant to get carried away with the features it has available. Most are either very basic, struggle to take much additional context, or are simply gimmicks/novelties. The majority of tasks can be performed faster through a GUI once you account for the time it takes to brute-force the right "natural language" expression to do the thing you need.


Well, the idea is that you get hands-free control of the device.

I'm lying in bed and I suddenly realize I need an alarm for tomorrow morning. I don't want to turn my phone on because that can mess with my sleep.

I just tell Siri or Google to set an alarm and go to sleep.

Or I'm in the middle of cooking and I want to listen to music but don't want to have to wash my hands first.

That's really where the benefit of the assistants lies.


These days, a useful command has been "Alexa, what is the air quality index?". It's a good way of knowing if I can open the window.


That's /r/aboringdystopia material right there.


Who genuinely uses the assistant on their phone, and for what tasks? I just wanted to know whether anyone is really making the most of it.


I use them almost on a daily basis to control the lights in my apartment; it's surprisingly convenient and helps with multitasking when I'm in a hurry. The same goes for weather and reminders. The best use case is when you're in the middle of something, e.g. cooking, and just want to make a simple note or set a timer.


Does it actually work for you, though? I quickly abandoned Google Assistant because it just misunderstands what I say about two-thirds of the time. A friend of mine who has all his lights controllable by voice complains about the same issue: a lot of the time, Google just doesn't understand the command, so by the time you get it "right" you could have walked up to the switch three times.


I had Alexa controlling the lights for a while, but I found talking requires much more cognitive effort than picking up a remote, finding the right button by touch, and pushing it.

I can literally do the latter three quarters asleep, but not so much the former.

It would be far more useful to have the process almost completely automated. Lights go on when someone enters a room and go off when everyone leaves, with optional manual override.

This turns out to be a hard(ish) problem that needs better sensors and/or some form of personal ID.
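The rule itself is trivial; everything hard lives in detecting occupancy reliably. A rough sketch, with occupants() and setLights() standing in for whatever hypothetical sensor/actuator API you have:

```kotlin
// Hypothetical occupancy-driven lighting with a manual override.
// The hard part is making occupants() reliable (people sitting still, pets,
// knowing *who* is in the room), not the rule below.
class RoomLights(
    private val occupants: () -> Int,          // placeholder for presence sensing
    private val setLights: (Boolean) -> Unit   // placeholder for the light actuator
) {
    var manualOverride: Boolean? = null        // null = automatic, true/false = forced on/off

    fun tick() {                               // call periodically or on sensor events
        setLights(manualOverride ?: (occupants() > 0))
    }
}
```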


Using Alexa you can set up routines, which are effectively macros triggered by a keyword, and tend to be recognised more consistently. I’ve got mine set so “bedtime” turns off the main lights and turns on dimmed side lights, and “goodnight” turns off all the lights in the house.


I switched from g-assist to Alexa because at least in my experience I found that while google was way better at random trivia, Alexa was way better at understanding the narrow set of commands she supports.

I think this is both a mic-quality issue and an NLP issue. All the online comparisons compare random trivia, but I almost never ask trivia because the failure rate on both is too high (that domain is way too open ended for assistants right now, I think).


It does; you just need to remember that they require a specific way of talking. By default I talk more loudly and articulately, with simple expressions, and then it gets it on the first try. (At least Google Assistant; Siri has way, way more issues understanding me.)


Weather, news, music, and random questions while I'm not at my computer, or I don't have my hands on the keyboard.


[Disclaimer: I work for Siri; discount my enthusiasm accordingly]

For me, Siri when holding my phone is the least compelling use case. I kind of like using it for Alarm Clocks and Timers, because recognition in that domain is quite reliable, and one instruction saves multiple clicks.

Since touch navigation on a watch is less convenient than on a phone, some further use cases become more convenient with voice than with touch, e.g. asking "Is it going to rain today" before stepping out the door.

On Homepod, music is an obvious use case, which is unfortunately a rather difficult domain, because of the wide variety of media names. I use commands like "play some John Coltrane", "Shuffle play list Aggro", but also "who's playing piano on this track" (availability is a bit variable) or "what song is this". Home control is also convenient.

With AirPods, music is again the obvious use case, but I also like using them for walking directions (because you can walk without having to constantly glance at your phone). They also serve as a "poor man's CarPlay" in cars not equipped with a suitable media system (with the transparency mode on AirPods Pro, I feel they are not an undue safety risk).

CarPlay is one of my favorite use cases, because I can navigate, listen to music, and listen and respond to messages without having to take my eyes off the road. When stuck in traffic, I also like asking what my ETA is.


Good to know. Since you work on Siri (Apple), I want to learn about and contribute to NLP as well.

1. Can you share your journey, if possible?

2. Is a Ph.D. or master's necessary, and how does the work at a company compare with the research one does as part of a Ph.D.? (I have heard that one doesn't get the autonomy to research one's own subjects under a supervisor, but has to do what the supervisor says.)

3. Can you share any resources you look to for learning, and anything about Siri's internal workings?

4. How do deadlines work in a research field? Currently working as a front-end developer, I can roughly judge deadlines in my work from an ETA, but how does it work in a research-oriented field where one is unsure whether things will be delivered as per the requirements?

5. Where do you see the future of NLP going?

Thanks


1. I have pretty much a pure programming background. No previous experience with speech or machine learning when I was hired (but that was long ago).

2. The software engineers in the team have a variety of backgrounds. I think I may be the only PhD, and my subject was not relevant to the job. Plenty of people with bachelors (for visa reasons, non-US employees tend to have higher degrees). Machine learning knowledge helps, but is not strictly required. For the data scientists, on the other hand, advanced degrees are a definite plus, and so is some specialization in a relevant subject.

3. The most important thing to know about Siri's internal working is that we don't talk about Siri's internal working…

If you randomly would like to learn something to improve your chances of working at Siri, a machine learning class (e.g. Ng's and/or Hinton's Coursera classes) could definitely help.

4. There is still a lot of engineering involved, and often by the time a formal schedule is worked out, the scientific discovery part has largely been solved. Sometimes features end up not working out and have to be pushed back. What helps for Siri is that a lot of the complex functionality is server side or in updatable assets, so iteration is a possibility.

5. That's above my pay grade, really. What I learned the past few years is never to bet against deep learning being able to tackle a particular problem, but I can't shake the feeling that we'll discover limits some day.


Thanks for the answer :)


I use Siri all the time for sending/reading text/WhatsApp messages, making calls, setting/stopping alarms and timers, creating notes, and taking pictures (Siri will open the app to where you are a click away from taking the picture). I also use Siri as a dictionary and weather forecaster.

I wish Siri were more powerful, able to do things like create contacts and turn off the phone, but Apple seems super conservative about what it will allow Siri to do.


Sending & reading messages in public? Because apart from home automation and minor tasks, telling and hearing private info through an assistant is not that comfortable, I feel.


I've found Siri is absolutely useless for anything except the following:

> Hey Siri, set the lights to red.

> Hey Siri, set the lights to fifteen percent.

> Hey Siri, in Houston, what's the weather right now?

The weather question has to be worded absolutely correctly. If I instead ask:

> Hey Siri, what's the weather right now in Houston?

Then Siri will stop listening at "now" and immediately answer saying that she needs location services turned on.


I use mine all the time to send texts, make phone calls, set alarms and timers, and add calendar events.


I set alarms and reminders. I ask how to say things in various languages, then ask those words in that language to see if Google says the right definition. I check facts.

I turn lights on and off.


Fact retrieval performance varies a lot and is a good test of an assistant. Alexa is very good, because it can't rely on a screen (and it seems a lot of people work on making it good at knowledge type stuff). Google is also decent most of the time. Apple often defaults to "here's what I found" which is useless.


Siri, for the following:

* Check weather

* Open a certain app on my watch without touching it

* Start a phone call or facetime

* Play some music —> this starts a personal radio station in Apple Music

* Add a reminder to my grocery list or another specific list. Or “remind me tomorrow”

* start a timer on my watch

* Set an alarm on my watch

* turn on/off smart home lights

* Some stuff with Siri Shortcuts. I use this mostly for starting and stopping time-tracking timers. But sky’s the limit here; you can start any multi-step automation from Siri if you have a use case that doesn’t require input

* Show me pictures from the web of X

* Search basic facts


I’m a big fan of “.. remind me when I get home to ______.”


Bixby responds to almost anything, so it is useful for finding my phone. It can answer a question about my calendar, like whether I'm available at 9 am tomorrow, but it's useless for setting an appointment. ‘Create an appointment with Bill tomorrow at 9 am’ will create an all-day appointment with a location of ‘9 am’.

Alexa doesn’t integrate with my calendar at all. (incompatible)


I use it for ad hoc hands free navigation while driving. It works 60% of the time and I don't have to intervene with my hands.

60% is sort of like a gamble with no downside. Do I have to stop the car or can I just set the destination while driving? I would have to stop the car anyway were it not for voice activated commands.


For me adding calendar events is faster through speech than using the GUI.

GUI: find Google Calendar icon, scroll the months to find the date, open date, scroll to find the time, open it, type the title.

Speech: "OK Google, add event on October 28 at 12pm, lunch with Grace".

Setting alarms is also very convenient.


I use my Google Home for timers while cooking, turning on/off lamps that have awkward-to-reach switches, and Christmas tree lights when the time arrives. My biggest usability issue is that "hey/OK Google" is a mouthful and does not roll off the tongue like Siri/Alexa.


I use Google Assistant multiple times a day: playing Pandora or YouTube playlists on various speakers throughout my house, setting reminders, adding things to my shopping list or todo list, checking the weather, setting alarms, navigation if I'm driving, random Google queries.


"Take me to..." as I'm buckling up in the car has been very helpful. Then I'll use it to send an "on my way" text.


Announce reminders of various routine tasks, and calendar meetings, throughout the day.


These solutions aren't meant to help/assist their users but to help their vendors collect information about the users.


I disagree. It's not about collecting information - at least, not directly. Assistants don't seem to be designed to gather any substantial amount of information, nor any unique information.

Due to their limited communication capabilities they're extremely poor at collecting meaningful data - there's simply no point in talking to them as their utility fails with anything but the simplest and most direct commands. There's not much data in "turn kitchen lights on", "set alarm at 10am", "play $album by $artist", "remind me to buy milk" and "convert 4.7 miles to kilometers" - at best, this exposes a few minor behavioral patterns that can be trivially determined even without a dedicated device to talk to.

And, hopefully, outright spying wasn't one of the design goals, even though they can be of dual use. Google, Apple and Amazon may be cooperative but they aren't NSA, after all.

In my opinion, it's about driving use of particular services. Alexa will push you towards Amazon purchases, use of Amazon Music, etc. Similarly, HomePod requires investment in the Apple ecosystem. And Google Assistant will push towards all things Google. In my opinion, voice assistants were made to be helpful within their particular ecosystems and thus add more ties to their vendors' services. (And, sure, those services collect data.)


Why does HN have so many conspiracy theorists?

They're clearly designed to make money by keeping you in the Amazon / Google / Apple ecosystem. I don't think any of them make much money from data collection because they don't collect that much data.

I have Google Homes and you can go and look at all the data it collects - for me it is 90% "hey Google play some music" or "hey Google what's the weather" or "hey Google set a timer for 10 minutes".

Useful for improving their speech recognition but not much else.


> Why does HN have so many conspiracy theorists?

Reflexive cynicism is easy, and makes you appear smart to a large number of commenters.

Of course, it doesn't help that often times, the automatic cynicism is entirely accurate, especially when contrasted against empty PR statements.


There is one more important piece of information: your voiceprint.

With it, Google can identify you by voice, put a mic in a public infopanel, and guess which commercial has the most impact.

If the NSA has access, they can classify public recordings by people at that place, so I assume they are interested.

HN has so many conspiracy theorists because we know what is possible with infotech on an industrial scale, the spooks have the money for it, and Snowden basically provided the proof that they did it.


> put a mic in a public infopanel, and gues which commercial has most impact

There are infopanels (from smaller, dodgy companies) with cameras which try to identify viewers' behaviour. But actually expecting Google to both identify you and measure response is a bit out there, for many reasons. 1. They actually do care about access to the data they store - creating a fingerprint database for identification would be beyond what they do. 2. Who actually reacts vocally to an ad in a way that can be captured without environment noise - the assistant can't even set the timer correctly sometimes when I talk to it in perfect conditions. 3. If they did want to do super-shady targeting, your location data crossed with ad location + search/browser history is both better and cheaper than a voice database.


> 1. They actually do care about access to data they store - creating a fingerprint database for identification would be beyond what they do.

Would it? What if it was structured the way they handle other personal data: customers send in "voice telemetry" from their "info"panels, and Google gives them bulk analytics ("#x different people talked in your venue, with the following age/demographic split"), as well as using that telemetry to handle ad attribution and ad targeting in the background. This way - just like with regular web telemetry - neither the advertiser nor the "info"panel owner gets to see the voiceprint data, but it still gets used to uniquely identify individuals and target ads at them.


Right, I'm sure the NSA has hacked Alexa etc., or if they haven't, they're working hard on it.

But that is very different from saying that Amazon and Google created them in order to gather data. Amazon was the first to make a voice assistant and their intent was pretty clearly to make money by getting people to buy it and to buy stuff from Amazon through it. To suggest that their true motivation was data collection is pure conspiracy theory.

Also voice identification is a relatively recent (and not terribly reliable) feature so it can't have motivated their creation.

Hell, there's no way its voice identification is good enough for the fantasy scenario you describe. It's only just about good enough to distinguish between 2-3 people.

Why does Google even need to identify my voice? They already know where I am from my phone.

This is what I mean about HN conspiracy theories. It's not just that people here know more about what is possible. None of the conspiracy theories make any sense.


Yeah but do you know the use case of such information? Nobody cares about what you tell alexa on a daily basis.

The NSA isn't interested in you period. They just want access to all data so they can hit a specific target.

There are technical capabilities and there's technical interest. Your and their analysis fails at the "interest" part.


I don't really understand why people don't seem to want tailored ads. All ads interrupt attention; better that they have a chance of being useful.


In today's world, ads are no longer about telling me about solutions to my problems, but they are about creating new needs. I'm fine with my current needs and I don't want smart people pulling my psychological levers to spend money on things I don't already want.

As ads get more and more effective, people will be incentivized to put up more and more ads. But there are already way too many ads for my liking.


I am quite happy buying things I don't need. I work to buy luxury (and I say this as a nomad who lives out of a single piece of luggage with 6 t-shirts). Most of what we buy today was not available a couple of centuries ago, and yet quite a large population would swear that they are "needs".

There are too many ads, but it is possible to avoid a great percentage of them if so inclined. My interaction with ads is pretty minimal and, when presented, often enjoyable. Using uBlock Origin I don't see ads on most sites. YouTube Premium means no ads there as well. Spotify Premium, so no ads for music. Video streaming services like Netflix don't have ads.

My only remaining ad interaction is Reddit (which offers premium) and in-content ads in podcasts or YouTube videos, where they are narrated by the creators whose content I am happy to support, and in some cases I find the way they switch to ads quite funny.


Spot on - and I don't think this is new. Go back to newspaper ads in the 19th century and many (perhaps most) can be categorized as trying to create new needs. We're just better at it than we used to be.


It's because it violates privacy. For example if someone has a huge secret hobby of collecting dildos and suddenly on some unrelated site they see an ad for a huge dildo.. that will cause a bit of an alarm.

That's the extreme case, more than likely people are a bit unnerved when an unrelated site shows them something they looked up on amazon an hour ago.

Even so I suspect a lot of these complainers just have hidden internet hobbies they just want completely private.


I don't really want to be manipulated into buying things I would not otherwise want/need.


I gave it a chance.

The tailor failed utterly and the "bespoke" version was worse than the old version.

Add to this that they want to keep records on me and it just isn't worth it.


Can't they do both? I personally find Google Assistant very helpful. I'm sure Google finds my personal data helpful. Seems like a fairly reasonable trade-off for me


I've had the opposite experience: completely useless due to poor integrations. Example: "Send a slack to Mary". Fail. "Read my most recent text message", fail. Even Hangouts is badly integrated, if at all.

I think the problem is, as always, a combination of business goals.

First, I suspect they don't want to help apps they haven't made a cross marketing deal with, like Slack. They'll install the app but won't work with it. Eg, "Play Tool on Spotify" works fine.

Second, Google's internal novelty-chasing merit system doesn't reward a coherent customer experience. You'd think Hangouts (or whatever chat is called this week) would be seamless after 5 years, but it's like they never met.


They're pretty obviously designed to do both.


Correct; incentives are not aligned. Even for Apple.


This would potentially destroy any hands-free benefit a virtual assistant would have. Back in the day of the secretary (the role these talking Pringles tubes are trying to emulate), the bossman CEO or whatever would press a button on the office intercom to talk with his secretary. I think virtual assistants would benefit from a simple push-to-talk-style input so they don't misinterpret a pause as a stop.


What's the point of posting this analysis that is over 2 years old in a young and rapidly iterating space?


Have human voices changed that much in two years?


No, but recognition and analysis have...


Half on-topic: Is there an API on Android that allows you to write your own assistant with your own keyword set without sending everything over the wire?

An app with microphone permissions would obviously work-ish, but is it possible to listen in the background like the assistants do?
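For reference, the rough shape of the workaround I had in mind: a foreground service that holds the mic and runs a local keyword spotter, so nothing leaves the device. This is only a sketch; detectKeyword() is a placeholder for an on-device model (e.g. something like Porcupine or your own classifier), and a real app needs the RECORD_AUDIO permission plus a foreground notification.

```kotlin
import android.app.Service
import android.content.Intent
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.os.IBinder
import kotlin.concurrent.thread

class HotwordService : Service() {
    private val sampleRate = 16_000
    @Volatile private var running = false

    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        // A real implementation must call startForeground() with a notification
        // and hold the RECORD_AUDIO / FOREGROUND_SERVICE permissions.
        running = true
        thread {
            val bufSize = AudioRecord.getMinBufferSize(
                sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT)
            val recorder = AudioRecord(MediaRecorder.AudioSource.MIC, sampleRate,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, bufSize)
            val buffer = ShortArray(bufSize)
            recorder.startRecording()
            while (running) {
                val n = recorder.read(buffer, 0, buffer.size)
                if (n > 0 && detectKeyword(buffer, n)) onKeyword()
            }
            recorder.stop(); recorder.release()
        }
        return START_STICKY
    }

    override fun onDestroy() { running = false }
    override fun onBind(intent: Intent?): IBinder? = null

    // Placeholders: plug in a local keyword-spotting model and the follow-up action.
    private fun detectKeyword(audio: ShortArray, len: Int): Boolean = false
    private fun onKeyword() { /* start full on-device recognition */ }
}
```

As far as I know, the system-level AlwaysOnHotwordDetector API is effectively reserved for whichever app holds the device's assistant role, so a third-party app is stuck with something like the above, plus the battery cost of keeping the mic open.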


My main criticism of Alexa, at least, boils down to bad usability. To me, it feels like learning a bunch of magic spells by heart. It feels so awkward. I am used to exact commands as a long-time CLI user. However, when it comes to speech, I somehow don't want to accept having to remember the other end of a pattern matcher. All the skills I have tried so far had a similar feel to them. I feel overwhelmed when I realize I have to remember how to launch the skill. I should probably give GA a try one day. But for now, I give the voice assistant thing 5 more years; hopefully it will be something I want to use then.


I took a class in college on voice interface design, and the professor named this as the biggest hurdle that's yet to be overcome. Screens and keyboards are the computer's domain, and we're comfortable adapting to them. But speech is our domain, and we expect the computer to adapt to us. The problem is that the tech isn't quite there for real understanding of language.



