I think generating useful embeddings from a lot of realtime data flows (e.g. user clickstream data) is in fact fairly difficult. Furthermore, even if you had such embeddings, it's unclear whether an LLM would add value to whatever inference you're trying to do. If the LLM is being used not only for inference but also to actually retrieve data ("find and summarize the clickstream history of this user"), then I would not expect this to be doable in realtime.