Yeah, my conspiracy theory is Nvidia is somehow influencing the decision. If you can do Vulkan with Ollama, it opens up people to using Intel/AMD/other iGPUs and you might not be incentivized to buy an Nvidia GPU.
ROCm support is not wonderful. It's certainly worse for an end user to deal with than Vulkan, which usually 'just works'.
I agree. AMD should just go all in on Vulkan, I think. The ROCm compatibility list is terrible compared to... every modern device, and probably some ancient GPUs, that can be made to work with Vulkan as well.
Considering they created Mantle, you would think it would be the obvious move too.
Vulkan is Mantle. Vulkan was developed out of the original Mantle API that AMD brought to Khronos. What do you mean "AMD should just go all in on Vulkan"? They've been "all in" on Vulkan from the beginning because they were one of the lead authors of the API.
iGPUs (and NPUs) are not very useful for LLM inference; they only help somewhat in the prompt pre-processing phase. The CPU has worse bulk compute but far better access to system memory bandwidth, so it wins in token generation, where bandwidth is the main factor.
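To put rough numbers on the bandwidth point, here is a back-of-the-envelope sketch. It assumes each generated token has to stream the full set of model weights from memory once, so bandwidth divided by model size gives a ceiling on tokens per second. The function name and the figures in the usage note are hypothetical, not measurements from any real device.

```python
def decode_tokens_per_sec_upper_bound(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Rough ceiling on token generation speed when memory bandwidth is the bottleneck.

    Assumption: every generated token reads all model weights from memory once,
    and nothing else (compute, caches, KV cache traffic) is accounted for.
    """
    if bandwidth_gb_s <= 0 or model_size_gb <= 0:
        raise ValueError("bandwidth and model size must be positive")
    # GB/s divided by GB per token gives tokens per second.
    return bandwidth_gb_s / model_size_gb
```

For example, `decode_tokens_per_sec_upper_bound(100.0, 4.0)` returns `25.0`: with a hypothetical 100 GB/s of usable bandwidth and 4 GB of weights, no amount of extra compute gets you past about 25 tokens per second. Real throughput will be lower than this ceiling.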
My conspiracy theory is that it would help if contributors kept the proposed Vulkan Compute support up to date with new Ollama versions; no maintainer wants to deal with out-of-date pull requests.