I run Tabby [0], which uses llama.cpp under the hood, and they ship a VS Code extension [1]. Above 1.3B parameters I find the latency too distracting (though the highest-end GPU I have nearby is a ~16 GB RTX Quadro card that's a couple of years old, and usually I'm running a consumer 8 GB card instead).
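For reference, a minimal sketch of serving a small model locally with Tabby's CLI (the exact model name is illustrative; check the Tabby model registry for what's currently available):

```shell
# Serve a small code-completion model on the local GPU.
# Smaller models (~1B params) keep completion latency low;
# larger ones get noticeably slower on modest cards.
tabby serve --device cuda --model StarCoder-1B
```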
[0] https://tabby.tabbyml.com/
[1] https://marketplace.visualstudio.com/items?itemName=TabbyML....