The Ollama stuff is the old llama.cpp grammar-constrained decoding, which restricts which output tokens can be sampled so the reply conforms to a schema.
It's great; I've used it to get structured outputs from models as small as 1B.
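For concreteness, here's a minimal sketch of that constrained-output path through Ollama's REST API, which (as I understand it) rides on llama.cpp's grammar sampling under the hood. The model tag and prompt are illustrative:

```python
# A minimal sketch of constrained output via Ollama's /api/chat endpoint.
# Assumes a local Ollama server; model tag and prompt are illustrative.
import json
import urllib.request

payload = {
    "model": "llama3.2:1b",  # any small model you've pulled
    "messages": [{
        "role": "user",
        "content": "Extract the city from: 'Flights to Paris are cheap.' "
                   "Reply as JSON with a single 'city' key.",
    }],
    "format": "json",  # constrains decoding so the output is valid JSON
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

print(reply["message"]["content"])  # e.g. {"city": "Paris"}
```

Even a 1B model will reliably produce parseable JSON this way, since the sampler simply can't emit anything else.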
But there's a stark quality gap compared to, say, Phi-4's native tool-calling.
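For contrast, here's what the tools-style call looks like against the same endpoint. I'm using llama3.1 as a stand-in since the model needs a tool-aware chat template; the tool schema is illustrative, and whether the calls are any good depends on the model actually being trained for tool use:

```python
# Sketch of the tools-style API for comparison; model tag and tool schema
# are illustrative, and results depend on the model's tool-use training.
import json
import urllib.request

payload = {
    "model": "llama3.1",  # illustrative; needs a tool-capable chat template
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

for call in reply["message"].get("tool_calls", []):
    print(call["function"]["name"], call["function"]["arguments"])
```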
If Gemma 3 is natively trained on tool-calling, e.g. if y'all are benchmarking on, say, the Berkeley Function Calling Leaderboard, that'd be great to know out here.
Tangentially, github.com/ochafik is a Googler who landed an excellent overhaul of llama.cpp's tool-calling; he might be worth reaching out to (if you're not working with him already!)