Thanks. For what it's worth, unless you specifically need exl2, ollama works great for local inference, and these days you can prompt together a half-decent chat UI for yourself in a matter of minutes, which gives you full control over everything.
I also lean a lot on https://www.npmjs.com/package/amallo, an API wrapper I wrote for ollama that makes this sort of hacking very easy. (Not that the default lib is bad; I just didn't like the ergonomics.)
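To give a sense of how little is needed for the chat loop, here is a rough sketch that talks to ollama's plain HTTP chat endpoint directly. It does not use amallo or the official client, so nothing here should be read as either library's API. Assumptions: ollama is running on its default port (11434), the model name "llama3" is only a placeholder for whatever you have pulled, and `buildChatRequest` and `chat` are hypothetical helper names made up for this example.

```typescript
// Minimal sketch of one chat turn against a local ollama server.
// Assumed: default base URL http://localhost:11434 and Node 18+ (global fetch).

type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: builds the POST request for the /api/chat endpoint.
// stream: false asks for a single JSON reply instead of a chunked stream.
function buildChatRequest(
  model: string,
  messages: ChatMessage[],
  baseUrl: string = "http://localhost:11434",
): ChatRequest {
  return {
    url: `${baseUrl}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, messages, stream: false }),
    },
  };
}

// Hypothetical helper: appends the user's text, sends the whole history,
// and returns the history extended with the assistant's reply.
async function chat(
  model: string,
  history: ChatMessage[],
  userText: string,
): Promise<ChatMessage[]> {
  const messages: ChatMessage[] = [...history, { role: "user", content: userText }];
  const { url, init } = buildChatRequest(model, messages);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`ollama returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as { message: ChatMessage };
  return [...messages, data.message];
}
```

A UI then only has to keep the returned array around and pass it back in on the next turn, e.g. `history = await chat("llama3", history, input)`. Keeping the history on the client side is what gives you the full control over the conversation.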