The 1.3B model is amazing for real-time code completion; it's fast enough to work as a better IntelliSense.
I run Tabby [0], which uses llama.cpp under the hood, and they ship a VS Code extension [1]. Going above 1.3B, I find the latency too distracting (but the highest-end GPU I have nearby is a couple-year-old 16GB RTX Quadro card, and I'm usually running a consumer 8GB card instead).
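If you want to poke at the server outside the editor, Tabby exposes an HTTP completion API. Here's a minimal Python sketch; the localhost:8080 port and the /v1/completions request shape are Tabby's defaults as I remember them, so double-check against your version's /swagger-ui:

  import requests

  # Ask the local Tabby server for a fill-in-the-middle completion.
  # Assumes `tabby serve` is running on its default port (8080).
  resp = requests.post(
      "http://localhost:8080/v1/completions",
      json={
          "language": "python",
          "segments": {
              "prefix": "def fibonacci(n):\n    ",  # code before the cursor
              "suffix": "\n",                        # code after the cursor
          },
      },
      timeout=10,
  )
  resp.raise_for_status()
  print(resp.json()["choices"][0]["text"])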
Another model you should try is Magicoder 6.7B DS (based on DeepSeek Coder). After playing with it for a couple of weeks, I think it gives slightly better results than the equivalent DeepSeek Coder model.
Repo: https://github.com/ise-uiuc/magicoder
Models: https://huggingface.co/models?search=Magicoder-s-ds
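If you'd rather try it standalone than through Tabby, a quantized GGUF of it runs fine through llama-cpp-python. A quick sketch; the model filename is a hypothetical local path, and the "@@ Instruction / @@ Response" prompt template is from memory of the Magicoder repo, so verify both against the model card:

  from llama_cpp import Llama

  # Load a quantized Magicoder GGUF (use whatever quant you downloaded).
  llm = Llama(
      model_path="magicoder-s-ds-6.7b.Q4_K_M.gguf",  # hypothetical path
      n_gpu_layers=-1,  # offload as many layers as fit onto the GPU
      n_ctx=4096,
  )

  # Magicoder was trained with this instruction template (check the repo).
  prompt = (
      "You are an exceptionally intelligent coding assistant that "
      "consistently delivers accurate and reliable responses to user "
      "instructions.\n\n"
      "@@ Instruction\n"
      "Write a Python function that checks if a string is a palindrome.\n\n"
      "@@ Response\n"
  )

  out = llm(prompt, max_tokens=256, stop=["@@ Instruction"])
  print(out["choices"][0]["text"])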