
Mixtral > Codellama > DeepSeek Coder. DeepSeek Coder is a very weird model: it writes super long comments on one line and is definitely not at the level of Codellama, benchmarks be damned.


Which Mixtral are you using with ollama? I have a 32GB M1 MacBook Pro and can't seem to load it / get responses in any realistic time.


tangent: I often have a hard time disambiguating the ">" in comparisons like yours: (A) greater than (ie, Mixtral superior to DeepSeek, w/ Codellama in between) vs (B) arrow/sequence (ie, start w/ Mixtral, progress to Codellama, finally land on DeepSeek as the culmination).

I'd love to hear of a less ambiguous way to represent these.


I personally use a -> b -> c for sequence, a > b >> c for nested hierarchy.

In this context, I read it as a is better than b which is better than c.


How does

Mixtral >= Codellama >= DeepSeek

work for you?


Sadly that implies that all 3 could be equal or very close to equal.


Which they can be, when you ask them for a console.log. It does, however, get rid of the ambiguity of case (B) that GP mentioned.


Just curious, why do you think your experience is the opposite of what is shown on this leaderboard? https://evalplus.github.io/leaderboard.html


Not OP, but these evaluations should be taken with a huge grain of salt. It's almost impossible to rule out data leakage from open benchmarks such as HumanEval.


Mixtral looks interesting, but I haven't dabbled in locally hosted LLMs.

Would you mind linking to a concise text which could lead me through setting up Mixtral on my own machine?


It took me just as long to set up llama.cpp as it did to get other tools working well (ollama or other frontends that abstract away the actual config).

With those it's always "read the HOWTO, attempt to recreate its state", so I prefer sticking with the low-level route, where I also learn a bit more about the internals.

C/C++ user friendliness has come as far as every other language and its ecosystem. Really the only reason to "fear" it is the memes telling you to. It's not a gun.

So I'd suggest just compiling llama.cpp and installing huggingface-cli to download GGUF-format models, which is all ollama is doing anyway, just with more dependencies and a much more opaque outcome.
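Rough sketch of what that looks like (the HF repo and quant filename below are just one example, swap in whichever GGUF you want; build flags depend on your hardware):

    # build llama.cpp
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp && make

    # install the HF CLI and pull a GGUF quant
    pip install -U "huggingface_hub[cli]"
    huggingface-cli download TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF \
        mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf --local-dir ./models

    # run it
    ./main -m ./models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf -p "Hello" -n 128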


LM Studio is the easiest way to do it


That's what I've been playing with. I can load 9 layers of a Mixtral descendant into the 12GB of VRAM for the GPU and the rest into ~28GB of RAM for the CPU to work on. It bogs the system down sometimes, but the models are interestingly capable.
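For reference, the llama.cpp equivalent of that partial offload is the -ngl (--n-gpu-layers) flag; the model path below is just a placeholder:

    # offload 9 layers to the GPU, keep the rest in system RAM
    ./main -m ./models/some-mixtral-finetune.Q4_K_M.gguf -ngl 9 -p "Hello" -n 256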


I'm using Mixtral, but rather than shell out for a gaming laptop with an expensive GPU, I simply run it via the Together.ai API, which works out a lot cheaper. There are a few similar services out there.
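For anyone wondering what that looks like: Together exposes an OpenAI-compatible endpoint, so a request is roughly the following (endpoint and model slug from memory, double-check their docs):

    curl https://api.together.xyz/v1/chat/completions \
        -H "Authorization: Bearer $TOGETHER_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
              "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
              "messages": [{"role": "user", "content": "Write a binary search in Python"}]
            }'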


Had zero experience, too. Turns out ollama does everything, literally. You just tell it to run a model and wait a bit for it to download. One (1) shell command total.
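Literally this, assuming the stock mixtral tag in their library (it downloads the weights on the first run):

    ollama run mixtral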


Ollama GUI in WSL2


Agreed. Mixtral is freaky good. I have one 3090 and it flies.

DeepSeek Coder, on the other hand, crawls its way to poor answers and injects snippets of unrelated code into the output for me.


Do you find Mixtral to be better than the new 70B model that Meta released a couple of days back as well?


Did a comparison in LM Studio: the answers are eerily similar, but Mixtral is way, way faster. Codellama-70B is slow to the point of being unusable.

(M1 Max, 64 GB RAM)


Is Mixtral better overall or for coding specifically (or both)?


Reduce the repetition penalty to 1 and that should fix it.
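e.g. with llama.cpp that's the --repeat-penalty flag (model path below is a placeholder; other frontends expose the same knob in their sampling settings):

    ./main -m ./models/deepseek-coder.Q4_K_M.gguf --repeat-penalty 1.0 -p "Write a quicksort in Python" -n 256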



