Fundamentally the UI is up to you. I have a "typing-pauses-inference-and-starts-gaslighting" feature in my homebrew frontend, but in OpenWebUI/SillyTavern you can just pause it, edit the chain of thought, and have it continue from the edit.
That's a great idea. In your frontend, do you write in the same text entry field as the bot? I use oobabooga/text-generation-webui and I find it's a little awkward to edit the bot responses.
Thanks! For what it's worth, unless you particularly need exl2, ollama works great for local inference, and these days you can prompt together a half-decent chat UI for yourself in a matter of minutes, which gives you full control over everything.
I also lean a lot on https://www.npmjs.com/package/amallo, an API wrapper I wrote for ollama that makes this sort of hacking very, very easy. (Not that the default lib is bad, I just didn't like the ergonomics.)
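If you'd rather skip any wrapper entirely, a minimal chat loop against ollama's stock HTTP API is only a few lines. Rough sketch below (Node 18+ with built-in fetch; the model name "llama3" is just an example, swap in whatever you've pulled):

```ts
// Minimal streaming chat loop against a local ollama server (default port 11434).
// /api/chat streams newline-delimited JSON chunks of the form { message: { content }, done }.

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

async function chat(messages: ChatMessage[], model = "llama3"): Promise<string> {
  const res = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages, stream: true }),
  });
  if (!res.ok || !res.body) throw new Error(`ollama request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";
  let reply = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });

    // Each complete line is one JSON chunk; keep any partial line for the next read.
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      const token = chunk.message?.content ?? "";
      process.stdout.write(token); // print tokens as they stream in
      reply += token;
    }
  }
  return reply;
}

// Keep appending to the history so the model sees the whole conversation.
const history: ChatMessage[] = [{ role: "user", content: "Hello there" }];
const answer = await chat(history);
history.push({ role: "assistant", content: answer });
```

From there it's mostly a matter of bolting on whatever UI you like; since you own the message array, editing a bot response (or a chain of thought) before the next turn is trivial.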