It's not really welcome news; he's just saying they're putting it on the long finger because they think other stuff is more important. He's the same guy who kept ignoring the KV cache quantization merge.
And the actual patch is tiny...
I think it's about time for a bleeding-edge fork of ollama. These guys are too static and that is not what AI development is all about.
He specifically says that they're reworking the Ollama server implementation to better support other kinds of models, that this work takes priority, and that it will be a roadblock for this patch. This isn't even news to anyone who has been following the project, and it seems reasonable in many ways: if Vulkan support is made available at all, users will want it to work across the board, not to be limited to the kinds of models that exist today.