Considering speech-to-text models are fairly sophisticated and reliable nowadays...

Considering speech-to-text models are fairly sophisticated and reliable nowadays, it's likely that the primary input of LLMs will be audio. We do ultimately want AI interfaces to be conversational. Text input will surely be smarter as well, but I doubt we will converge on using keywords and shortcuts.