
What do you want to use it on?

Ollama works on pretty much anything: Windows, Linux, and Mac, with Nvidia or AMD GPUs. I don't know if other cards like Intel Arc are supported by anything yet, but if the backend supports the open Vulkan API (like it does for AMD) then it should work.

Every inference server out there supports running on the CPU, but be aware that it's much slower than running on a GPU - that's why this revolution didn't take off until GPUs became powerful and affordable.

As far as ease of setup goes, Ollama is trivial: it's a single command where the only thing you specify is which model you want, and they provide a list on their website. They even have a Docker image if you don't want to worry about installing any dependencies. I don't know what could be easier than that.
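Once it's running, talking to it from Python is a few lines. A minimal sketch, assuming the Ollama server is on its default port (11434) and you've already pulled the model named below (the model name is just an example):

  # Query a locally running Ollama server over its REST API.
  import requests

  resp = requests.post(
      "http://localhost:11434/api/generate",
      json={
          "model": "llama3",   # any model you've pulled with `ollama`
          "prompt": "Explain RAG in one sentence.",
          "stream": False,     # return a single JSON object instead of a stream
      },
      timeout=120,
  )
  print(resp.json()["response"])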

Most other tools like LM Studio or Jan are just fancy UIs running llama.cpp as their server and pulling models from HuggingFace. They don't offer anything beyond simple inference - nothing like RAG or agents.
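Because it's just a server underneath, you can usually point any OpenAI-style client at it. A rough sketch, assuming LM Studio's local server (or llama.cpp's llama-server) is listening on the port shown - adjust the port and model name to whatever your setup uses:

  # Talk to a local llama.cpp / LM Studio server through its
  # OpenAI-compatible endpoint; the API key isn't actually checked.
  from openai import OpenAI

  client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")
  reply = client.chat.completions.create(
      model="local-model",  # whatever model the UI has loaded
      messages=[{"role": "user", "content": "Summarize what llama.cpp is."}],
  )
  print(reply.choices[0].message.content)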

I've yet to see anything more than simple RAG available out of the box for local use. The only full-service tools are online services like Microsoft Copilot or ChatGPT. Anyone who wants that more advanced kind of system ends up writing their own code. It's not hard if you know Python - there are lots of libraries available like HuggingFace, LangChain, and LlamaIndex, as well as countless tutorials (every blog has one).
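To give a sense of how small "write your own" can be, here's a bare-bones RAG loop against Ollama's embedding and generation endpoints - a sketch only, assuming a chat model and an embedding model (e.g. nomic-embed-text) have already been pulled; the documents and model names are placeholders:

  # Minimal RAG: embed a few documents, retrieve the closest one by
  # cosine similarity, and stuff it into the prompt as context.
  import requests
  import numpy as np

  OLLAMA = "http://localhost:11434"

  def embed(text):
      # /api/embeddings returns {"embedding": [...]} for a single prompt
      r = requests.post(f"{OLLAMA}/api/embeddings",
                        json={"model": "nomic-embed-text", "prompt": text})
      return np.array(r.json()["embedding"])

  docs = [
      "Ollama exposes a local REST API on port 11434.",
      "llama.cpp is a C/C++ inference engine for LLaMA-family models.",
      "LangChain and LlamaIndex are Python frameworks for building RAG pipelines.",
  ]
  doc_vecs = [embed(d) for d in docs]

  question = "What does llama.cpp do?"
  q = embed(question)
  scores = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q)) for v in doc_vecs]
  context = docs[int(np.argmax(scores))]   # best-matching document

  r = requests.post(f"{OLLAMA}/api/generate", json={
      "model": "llama3",
      "prompt": f"Answer using this context:\n{context}\n\nQuestion: {question}",
      "stream": False,
  })
  print(r.json()["response"])

A real system would chunk documents, use a proper vector store, and rerank results, but the core loop really is this short.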

Maybe that's a sign that there's room for an open-source platform for this kind of thing. But given how young the field is, and how everyone is rushing to become the next big online service or toolkit, there may not be much developer interest in building an open-source equivalent of a high-quality online service.


