The latency on this (or lack thereof) is the best I've seen; I'd love to know more about how it's achieved. I asked the bot, and it claimed you're using Google's speech recognition. I know that supports streaming, but this result has much lower lag than I remember Google's offering being capable of.
It's not unlikely that the LLM is told exactly what its source data is, in the hope that it can correct transcription errors.
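If that's what they're doing, it could be as simple as a system prompt wrapped around the raw transcript. A minimal sketch of the idea (the prompt wording and function name are my own guesses, not the product's actual code):

```python
# Hypothetical: tell the LLM its input comes from ASR so it can silently
# repair likely mis-transcriptions (homophones, dropped words) before
# responding. Names and wording here are illustrative only.

ASR_SYSTEM_PROMPT = (
    "The user's messages are transcripts from streaming automatic speech "
    "recognition and may contain errors such as homophones or dropped "
    "words. Infer the intended words and respond to the corrected "
    "meaning; do not comment on the errors themselves."
)

def build_messages(transcript: str) -> list[dict]:
    """Wrap a raw ASR transcript in a chat-style message list."""
    return [
        {"role": "system", "content": ASR_SYSTEM_PROMPT},
        {"role": "user", "content": transcript},
    ]

msgs = build_messages("eye would like two no the wether")
```

The nice property is that the model never has to emit an explicit "correction" step, so this adds no latency on top of the streaming recognition.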