Oh, well, it turns out that keyboard sounds leak enough entropy to make it easy to attack even very strong passwords.
Microphones on devices such as Ring doorbell cameras are explicitly exfiltrating audio data out of your control whenever they're activated. Features like Alexa and Siri require, in some sense, 24/7 microphone activation, although normally that data isn't transmitted off-device except on explicit (vocal) user request. But that control is imposed by non-user-auditable device firmware that can be remotely updated at any time.
Finally, for a variety of reasons, it's becoming increasingly common to have a microphone active and transmitting data intentionally, often to public contexts like livestreaming video.
With the proliferation of such potentially vulnerable microphones in our daily lives, we should not rely too heavily on the secrecy of short strings that can easily leak through the audio channel.
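To put rough numbers on the entropy point, here is a back-of-the-envelope sketch. It assumes a password drawn uniformly at random from the 95 printable ASCII characters, and a hypothetical acoustic classifier that narrows each keystroke independently to `k` equally likely candidates. The values of `k` are illustrative, not measurements from any real attack.

```python
import math

PRINTABLE_ASCII = 95  # assumed alphabet size for a random password


def remaining_entropy_bits(length: int, candidates_per_key: float) -> float:
    """Bits an attacker must still guess if each keystroke is narrowed
    to `candidates_per_key` equally likely characters (an assumption)."""
    return length * math.log2(candidates_per_key)


if __name__ == "__main__":
    length = 16
    # k = 95 means the audio leaked nothing; k = 1 means every key was recovered.
    for k in (PRINTABLE_ASCII, 10, 3, 1):
        bits = remaining_entropy_bits(length, k)
        print(f"k={k:>2}: about {bits:.1f} bits left to guess")
```

Under these assumptions, what matters is how well the microphone separates keys, not how strong the password was to begin with.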
Using a password manager is an easy and useful protection against audio leaks of passwords.
But this is an example of the kind of thing the OP is talking about. You're probably not at a very realistic risk of having your password hacked via audio exfiltrated from the Ring camera at your front door. Unless it's Mossad et al who want your password.
Like "you're probably not at a very realistic risk of having your phone wiretapped", this is overindexing on past experience—remember that until Room 641A commenced operations in 02003 (https://en.wikipedia.org/wiki/Room_641A), you weren't, and after it did, your phone was virtually guaranteed to be wiretapped. Similarly, you aren't at a very realistic risk of having your password hacked via audio, until someone is doing this to 80% of the people in the world. As far as we know, this hasn't happened yet, but it certainly will.
But again, that's the Mossad scenario (the NSA, in this case). You're essentially reinforcing the OP's point. There are three threat models given in Figure 1 of the OP doc, and what you're saying really only applies to the third.