Mixtral > Codellama > DeepSeek Coder. DeepSeek Coder is a very weird model: it writes super long comments on a single line and is definitely not at the level of Codellama, benchmarks be damned.
tangent: I often have a hard time disambiguating the ">" in comparisons like yours:
(A) greater than (i.e., Mixtral superior to DeepSeek, with Codellama in between)
vs.
(B) arrow/sequence (i.e., start with Mixtral, progress to Codellama, and finally land on DeepSeek as the culmination).
I'd love to hear of a less ambiguous way to represent these.
Not OP, but these evaluations should be taken with a huge grain of salt. It's almost impossible to rule out leakage of open benchmarks such as HumanEval into the training data.
It took me just as long to set up llama.cpp as it did to get other tools working well (ollama or other frontends that abstract away the actual config).
Either way it's the same loop: read the HOWTO, attempt to recreate the described state. So I prefer sticking with the low-level tool, where I also learn a bit more about the internals.
C/C++ user-friendliness has come as far as that of all the other languages and their ecosystems. Really, the only reason to “fear” it is the memes telling you to. It's not a gun.
So I'd suggest just compiling llama.cpp and installing huggingface-cli to download models in GGUF format; that's all ollama is doing anyway, just with more dependencies and a much more opaque outcome.
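For reference, the whole flow is roughly this. The repo URL is upstream llama.cpp; the quantized model file is just an example, substitute whichever GGUF you want; and on recent builds the binary is named llama-cli rather than main:

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp && make
    pip install -U "huggingface_hub[cli]"
    huggingface-cli download TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF \
        mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --local-dir models
    ./main -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -p "Hello" -n 128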
That's what I've been playing with. I can load 9 layers of a Mixtral descendant into the 12 GB of VRAM for the GPU and the rest into ~28 GB of RAM for the CPU to work on. It chugs the system sometimes, but the models are interestingly capable.
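In llama.cpp that split is just the -ngl (--n-gpu-layers) flag; whatever isn't offloaded stays in system RAM for the CPU. A sketch with my numbers (the model path is an example):

    # offload 9 layers to the GPU, keep the remaining layers on the CPU
    ./main -m models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf \
        -ngl 9 -c 4096 -p "your prompt here"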
I'm using Mixtral, but rather than shell out for a gaming laptop with an expensive GPU, I simply run it via the Together.ai API, which works out a lot cheaper. There are a few similar services out there.
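Their API is OpenAI-compatible, so it's a plain curl call. Sketch below; the endpoint and model name are what I remember from their docs, so double-check the current ones:

    curl https://api.together.xyz/v1/chat/completions \
        -H "Authorization: Bearer $TOGETHER_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
            "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
            "messages": [{"role": "user", "content": "Hello"}]
        }'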
Had zero experience, too. Turns out ollama does everything, literally. You just tell it to run a model and wait a bit for it to download. One (1) shell command total.
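Literally (mixtral here is the tag from the ollama model library; swap in whatever model you want):

    ollama run mixtral

It downloads the model on first run and then drops you into an interactive prompt.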