
So many here are trashing on Ollama, saying it's "just" nice porcelain around llama.cpp and it's not doing anything complicated. Okay. Let's stipulate that.

So where's the non-sketchy, non-for-profit equivalent? Where's the nice frontend for llama.cpp that makes it trivial for anyone who wants to play around with local LLMs without having to know much about their internals? If Ollama isn't doing anything difficult, why isn't llama.cpp as easy to use?

Making local LLMs accessible to the masses is an essential job right now—it's important to normalize owning your data as much as it can be normalized. For all of its faults, Ollama does that, and it does it far better than any alternative. Maybe wait to trash it for being "just" a wrapper until someone actually creates a viable alternative.



I totally agree with this. I wanted to make it really easy for non-technical users with an app that hid all the complexities. I basically just wanted to embed the engine without making users open their terminal, let alone configure anything. I started with llama.cpp and almost gave up on the idea before I stumbled upon Ollama, which made the app happen.[1]

There are many flaws in Ollama, but it makes many things much easier, especially if you don’t want to bother with building and configuring. They do take a long time to merge PRs though. One of my PRs has been waiting for 8 months, and another one, about KV cache quantization, took them 6 months to merge.

[1]: https://msty.app


> They do take a long time to merge any PRs though.

I guess you have a point there, seeing as after many months of waiting we finally have a comment on this PR from someone with real involvement in Ollama - see https://github.com/ollama/ollama/pull/5059#issuecomment-2628... . Of course this is very welcome news.


It's not really welcome news, he is just saying they're putting it on the long finger because they think other stuff is more important. He's the same guy that kept ignoring the KV cache quant merge.

And the actual patch is tiny.

I think it's about time for a bleeding-edge fork of ollama. These guys are too static and that is not what AI development is all about.


He specifically says that they're reworking the Ollama server implementation in order to better support other kinds of models, and that such work has priority and is going to be a roadblock for this patch. This is not even news to those who were following the project, and it seems reasonable in many ways - users will want Vulkan to work across the board if it's made available at all, not for it to be limited to the kinds of models that exist today.


That qkv PR was mine! Small world.


>So where's the non-sketchy, non-for-profit equivalent

llama.cpp, kobold.cpp, oobabooga, llmstudio, etc. There are dozens at this point.

And while many chalk the attachment to ollama up to a "skill issue", that's just venting frustration that all something has to do to win the popularity contest is to repackage and market it as an "app".

I prefer first-party tools, I'm comfortable managing a build environment and calling models using PyTorch, and Ollama doesn't really cover my use cases, so I'm not its audience. I still recommend it to people who might want the training wheels while they figure out how not-scary local inference actually is.


> llmstudio

ICYMI, you might want to read their terms of use:

https://lmstudio.ai/terms


> llama.cpp, kobold.cpp, oobabooga

None of these three are remotely as easy to install or use. They could be, but none of them are even trying.

> lmstudio

This is a closed source app with a non-free license from a business not making money. Enshittification is just a matter of when.


I would argue that kobold.cpp is even easier to use than Ollama. You click the link in the README to download an .exe, double-click it, and select your model file. No command line involved.

Which part of the user experience did you have problems with when using it?


You’re coming at it from a point of knowledge. Read the first sentence of the Ollama website against the first paragraph of kobold’s GitHub. Newcomers don’t have a clue what “running a GGUF model..” means. It’s written by tech folk without an understanding of the audience.


Ollama is also written for technical/developer users, seemingly by accident, even though they don't want it to be strictly for technical users. I've opened an issue asking them to make it more clear that Ollama is for technical users, but they seem confident people with no terminal experience can and will also use Ollama: https://github.com/ollama/ollama/issues/7116


Why do you care? They're the ones who will deal with the support burden of people who don't understand how to use it—if that support burden is low enough that they're happy with where they're at, what motivation do you have to tell them to deliberately restrict their audience?


> Why do you care?

Like many in FOSS I care about making the experience better for everyone. Slightly weird question, why do you care that I care?

> what motivation do you have to tell them to deliberately restrict their audience?

I don't have any motivation to say any such thing, and I wouldn't either. Is that really your take away from reading that issue?

Stating something like "Ollama is a daemon/cli for running LLMs in your terminal" on your website isn't a restriction whatsoever, it's just being clear up front what the tool is. Currently, the website literally doesn't say what Ollama actually is.


> Is that really your take away from reading that issue?

Yes. You went to them with a definition of who they're trying to serve and they wrote back that they didn't agree with your relatively narrow scope. Now you're out in random threads about Ollama complaining that they didn't like your definition of their target audience.

Am I missing something?


Yes. I'm not asking them to adopt what I think the target audience is; I'm asking them to define any target audience, then add at least one sentence to their website describing what Ollama is. Not sure why that's controversial.

Basically the only information on the website right now is "Get up and running with large language models.", do you think that's helping people? Could mean anything.


It’s so hard to decipher the complaints about ollama in this comment section. I keep reading comments from people saying they don’t trust it, but then they don’t explain why they don’t trust it and don’t answer any follow up questions.

As someone who doesn’t follow this space, it’s hard to tell if there’s actually something sketchy going on with ollama or if it’s the usual reactionary negativity that happens when a tool comes along and makes someone’s niche hobby easier and more accessible to a broad audience.


> they don’t explain why they don’t trust it

We need to know a few things:

1) Show me the lines of code that log things and how it handles temp files and storage.

2) No remote calls at all.

3) No telemetry at all.

This is the feature list I would want to begin trusting. I use this stuff, but I also don’t trust it.


Both ollama and llama.cpp are open source. You can check the code for both and compile both yourself.

The question is: Why is ollama considered “sketchy” but llama.cpp is not, given that both are open source?

I’m not trying to debate it. I’m trying to understand why people are saying this.


The code is literally open source with MIT license to boot https://github.com/ollama/ollama


Oh.


>So where's the non-sketchy, non-for-profit equivalent?

Serving models is currently expensive. I'd argue that some big cloud providers have conspired to make egress bandwidth expensive.

That, coupled with the increasing scale of the internet, makes it harder and harder for smaller groups to do these kinds of things. At least until we get a good content-addressed distributed storage system.


> Serving models is currently expensive. I'd argue that some big cloud providers have conspired to make egress bandwidth expensive.

Cloudflare R2 has unlimited egress, and AFAIK, that's what ollama uses for hosting quantized model weights.


Supporting Vulkan will help Ollama reach the masses who don't have dedicated GPUs from Nvidia.

This is such low-hanging fruit that it's silly how they are acting.


As has been pointed out in this thread in a comment that you replied to (so I know you saw it) [0], Ollama goes to a lot of contortions to support multiple llama.cpp backends. Yes, their solution is a bit of a hack, but it means that the effort of adding a new backend is substantial.

And again, they're doing those contortions to make it easy for people. Making it easy involves trade-offs.

Yes, Ollama has flaws. They could communicate better about why they're ignoring PRs. All I'm saying is let's not pretend they're not doing anything complicated or difficult when no one has been able to recreate what they're doing.

[0] https://news.ycombinator.com/item?id=42886933


This is incorrect. The effort it took to enable Vulkan was relatively minor. The PR is short and to be honest it doesn't do much, because it doesn't need to.


that PR doesn't actually work though -- it finds the Vulkan libraries and has some memory accounting logic, but the bits to actually build a Vulkan llama.cpp runner are not there. I'm not sure why its author deems it ready for inclusion.

(I mean, the missing work should not be much, but it still has to be done)


The PR was working 6 months ago, and it has been rebased multiple times as the Ollama team kept ignoring it and mainline moved on. I'm using it right now.


This is a change from your response to the comment that I linked to, where you said it was a good point. Why the difference?

Maybe I should clarify that I'm not saying the effort to enable a new backend is substantial; I'm saying that my understanding of that comment (the one you acknowledged made a good argument) is that the maintenance burden of having a new backend is substantial.


I didn't say it was a good point. I said I disagree, but it's a respectable opinion I could imagine someone having.


Okay, now we're playing semantics. "Reasonable argument" were your words. What changed between then and now to where the same argument is now "incorrect"?


I literally start the sentence with 'I disagree with this'


But you never said why, and you never said it was incorrect; you said it was a reasonable argument and then appealed to the popularity of the PR as the reason you disagree.

But now suddenly what I said is not just an argument you disagree with but is also incorrect. I've been genuinely asking for several turns of conversation at this point why what I said is incorrect.

Why is it incorrect that the maintenance burden of maintaining a Vulkan backend would be a sufficient explanation for why they don't want to merge it without having to appeal to some sort of conspiracy with Nvidia?


llama.cpp already supports Vulkan. This is where all the hard work is at. Ollama hardly does anything on top of it to support Vulkan. You just check if the libraries are available, and get the available VRAM. That is all. It is very simple.
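
For a sense of how small the Ollama-side check is, here is a minimal sketch in Go (the language Ollama is written in). This is not the PR's actual code: the loader paths are hard-coded assumptions for illustration, and the real change also does memory accounting against the reported VRAM.

    package main

    import (
        "fmt"
        "os"
    )

    // Candidate locations for the Vulkan loader. Hard-coded here purely for
    // illustration; real detection would consult the dynamic linker and the
    // Vulkan ICD manifests rather than a fixed list.
    var vulkanLoaderPaths = []string{
        "/usr/lib/x86_64-linux-gnu/libvulkan.so.1",
        "/usr/lib64/libvulkan.so.1",
        "/usr/lib/libvulkan.so.1",
    }

    // vulkanAvailable reports whether any candidate loader path exists.
    func vulkanAvailable() bool {
        for _, p := range vulkanLoaderPaths {
            if _, err := os.Stat(p); err == nil {
                return true
            }
        }
        return false
    }

    func main() {
        if vulkanAvailable() {
            fmt.Println("Vulkan loader found: a Vulkan-built llama.cpp runner could be selected")
        } else {
            fmt.Println("no Vulkan loader: fall back to CPU or other backends")
        }
    }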



Llamafile is great but solves a slightly different problem very well: how do I easily download and run a single model without having any infrastructure in place first?

Ollama solves the problem of how I run many models without having to deal with many instances of infrastructure.


You don't need any infrastructure for llamafiles, you just download and run them (everywhere).


Yes, that's what I meant, sorry if it was confusing: The problem that Llamafiles solve is making it easy to set up one model without infrastructure.


It's actually more difficult to use on linux (compared to ollama) because of the weird binfmt contortions you have to go through.


What contortions? None of my machines needed more than `chmod +x` for llamafile to run.


I think you are missing the point. To get things straight: llama.cpp is not hard to set up and get running. It was a bit of a hassle in 2023, but even then it was not catastrophically complicated if you were willing to read the errors you were getting.

People are dissatisfied for two very valid reasons. First, Ollama gives little to no credit to llama.cpp. The second is the point of the post: a PR, and not a huge one at that, has been open for over 6 months and has been completely ignored. Perhaps the Ollama maintainers personally have no use for it and shrugged it off, but that is the equivalent of "it works on my computer". Imagine if all kernel devs used Intel CPUs and ignored every non-Intel CPU-related PR. I am not saying that the kernel mailing list isn't a large-scale version of a countryside pub on a Friday night - it is. But the maintainers do acknowledge the efforts of people making PRs and do a decent job of addressing them.

While small, the PR here is not trivial and should have been, at the very least, discussed. Yes, the workstation/server I use for running models has two Nvidia GPUs. But my desktop uses an Intel Arc, and in some scenarios, hypothetically, this PR might have been useful.


> To get things straight: llama.cpp is not hard to setup and get running. It was a bit of a hassle in 2023 but even then it was not catastrophically complicated if you were willing to read the errors you were getting.

It's made a lot of progress in that the README [0] now at least has instructions for how to download pre-built releases or docker images, but that requires actually reading the section entitled "Building the Project" to realize that it provides more than just building instructions. That is not accessible to the masses, and it's hard for me to not see that placement and prioritization as an intentional choice to be inaccessible (which is a perfectly valid choice for them!)

And that's aside from the fact that Ollama provides a ton of convenience features that are simply missing, starting with the fact that it looks like with llama.cpp I still have to pick a model at startup time, which means switching models requires SSHing into my server and restarting it.

None of this is meant to disparage llama.cpp: what they're doing is great and they have chosen to not prioritize user convenience as their primary goal. That's a perfectly valid choice. And I'm also not defending Ollama's lack of acknowledgment. I'm responding to a very specific set of ideas that have been prevalent in this thread: that not only does Ollama not give credit, they're not even really doing very much "real work". To me that is patently nonsense—the last mile to package something in a way that is user friendly is often at least as much work, it's just not the kind of work that hackers who hang out on forums like this appreciate.

[0] https://github.com/ggerganov/llama.cpp
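
To make the convenience point concrete: with Ollama, the model is just a field on each request to its local HTTP API, and models are loaded and swapped on demand, so changing models means changing a string rather than restarting a server. A minimal Go sketch against the /api/generate endpoint (the model names are placeholders for whatever you have pulled locally):

    package main

    import (
        "bytes"
        "encoding/json"
        "fmt"
        "net/http"
    )

    // generate sends a single non-streaming request to a local Ollama server.
    // The model is chosen per request, so no restart is needed to switch models.
    func generate(model, prompt string) (string, error) {
        body, _ := json.Marshal(map[string]any{
            "model":  model,
            "prompt": prompt,
            "stream": false,
        })
        resp, err := http.Post("http://localhost:11434/api/generate", "application/json", bytes.NewReader(body))
        if err != nil {
            return "", err
        }
        defer resp.Body.Close()

        var out struct {
            Response string `json:"response"`
        }
        if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
            return "", err
        }
        return out.Response, nil
    }

    func main() {
        // Placeholder model names; substitute models you have pulled locally.
        for _, model := range []string{"llama3.2", "qwen2.5-coder"} {
            text, err := generate(model, "Say hello in one sentence.")
            if err != nil {
                fmt.Println(model, "error:", err)
                continue
            }
            fmt.Println(model, "->", text)
        }
    }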


llama.cpp is hard to set up - I develop software for a living and it wasn’t trivial for me. Ollama I can give to my non-technical family members and they know how to use it.

As for not merging the PR - why are you entitled to have a PR merged? This attitude of entitlement around contributions is very disheartening as an OSS maintainer - it’s usually more work to review, merge, and maintain a feature than to open a PR. Also, no one is entitled to comments, discussion, or literally one second of my time as an OSS maintainer. This is, IMO, the cancer that is eating open source.


> As for not merging the PR - why are you entitled to have a PR merged?

I didn’t get entitlement vibes from the comment; I think the author believes the PR could have wide benefit, and believes that others support his position, thus the post to HN.

I don’t mean to be preach-y; I’m learning to interpret others by using a kinder mental model of society. Wish me luck!


ramalama seems to be trying; it's a Docker-based approach.



> No pre-built binaries yet! Use cargo to try it out

Not an equivalent yet, sorry.



