The Ollama stuff is the old llama.cpp grammar-constrained decoding, which restricts which output tokens can be sampled so the reply conforms to a schema.
It's great; I've used it to get structured outputs from models as small as 1B.
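For concreteness, here's a minimal sketch of that constrained-output path through Ollama's REST API, which (as I understand it) rides on llama.cpp's grammar sampling under the hood. The model tag and prompt are illustrative:

```python
# A minimal sketch of constrained output via Ollama's /api/chat endpoint.
# Assumes a local Ollama server; model tag and prompt are illustrative.
import json
import urllib.request

payload = {
    "model": "llama3.2:1b",  # any small model you've pulled
    "messages": [{
        "role": "user",
        "content": "Extract the city from: 'Flights to Paris are cheap.' "
                   "Reply as JSON with a single 'city' key.",
    }],
    "format": "json",  # constrains decoding so the output is valid JSON
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

print(reply["message"]["content"])  # e.g. {"city": "Paris"}
```

Even a 1B model will reliably produce parseable JSON this way, since the sampler simply can't emit anything else.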
But there's a stark quality gap compared to, say, Phi-4's native tool-calling.
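For contrast, here's what the tools-style call looks like against the same endpoint. I'm using llama3.1 as a stand-in since the model needs a tool-aware chat template; the tool schema is illustrative, and whether the calls are any good depends on the model actually being trained for tool use:

```python
# Sketch of the tools-style API for comparison; model tag and tool schema
# are illustrative, and results depend on the model's tool-use training.
import json
import urllib.request

payload = {
    "model": "llama3.1",  # illustrative; needs a tool-capable chat template
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

for call in reply["message"].get("tool_calls", []):
    print(call["function"]["name"], call["function"]["arguments"])
```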
If Gemma 3 is natively trained on tool-calling, e.g. if y'all are benchmarking on, say, the Berkeley Function Calling Leaderboard, that'd be great to know out here.
Tangentially, github.com/ochafik is a Googler who landed an excellent overhaul of llama.cpp's tool-calling; he might be worth reaching out to (if you're not working with him already!)