But it also costs $36.83, compared to $13.29 for DeepSeek R1 + claude-3-5, and the latter combo's "Percent using correct edit format" is 100% vs 97.8% for 3.7.

edit: would be interesting to see how the DeepSeek R1 + claude-3-7 combo performs.




is there any public info on why such DeepSeek R1 + claude-3-5 combo worked better than using a single model?


Sonnet 3.5 is the best non-Chain-of-Thought code-authoring model. When paired with R1's CoT output, Sonnet 3.5 performs even better, outperforming vanilla R1 (and everything else), which suggests Sonnet is better than R1 at utilizing R1's CoT.

It's a scenario where the result is greater than the sum of its parts.
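For illustration, here's a minimal sketch of that two-stage handoff, assuming OpenAI-compatible endpoints for both providers. The base URLs, model names, prompt wording, and the reasoning_content field are assumptions for the sketch, not aider's actual implementation:

    # Stage 1: R1 reasons about the task. Stage 2: Sonnet gets the task plus
    # R1's chain of thought and writes the actual code edits.
    from openai import OpenAI

    deepseek = OpenAI(base_url="https://api.deepseek.com", api_key="...")
    anthropic = OpenAI(base_url="https://api.anthropic.com/v1", api_key="...")

    task = "Refactor parse_config() to return a dataclass instead of a dict."

    r1 = deepseek.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": task}],
    )
    # Assumption: deepseek-reasoner exposes the CoT separately from the answer.
    cot = r1.choices[0].message.reasoning_content

    edit = anthropic.chat.completions.create(
        model="claude-3-5-sonnet-20241022",
        messages=[{
            "role": "user",
            "content": f"{task}\n\nReasoning trace from another model:\n{cot}"
                       f"\n\nImplement the change as code edits.",
        }],
    )
    print(edit.choices[0].message.content)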


From my experiments with the DeepSeek Qwen-32B distill model, the model did not follow the edit instructions: the format was wrong. I know the distill models are not at all the same as the full model, but that could provide a clue. Combine that information with the scores and you have a reasonable hypothesis.
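For context, "correct edit format" in the benchmark means the reply contains well-formed SEARCH/REPLACE blocks the harness can apply. A rough sketch of what such a compliance check could look like; the marker strings follow aider's documented diff format, but the helper itself is hypothetical:

    import re

    # Matches one well-formed SEARCH/REPLACE block in a model reply.
    EDIT_BLOCK = re.compile(
        r"<<<<<<< SEARCH\n(.*?)\n=======\n(.*?)\n>>>>>>> REPLACE",
        re.DOTALL,
    )

    def has_wellformed_edit(reply: str) -> bool:
        """Return True if the reply contains at least one applyable edit block."""
        return bool(EDIT_BLOCK.search(reply))

    reply = """Here is the fix:
    <<<<<<< SEARCH
    def add(a, b):
        return a - b
    =======
    def add(a, b):
        return a + b
    >>>>>>> REPLACE
    """
    assert has_wellformed_edit(reply)

A model that drops a marker line or mangles the delimiters fails this check, which is what drags down the "Percent using correct edit format" score.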


> I know the distill models are not at all the same as the full model

It's far worse than that. It's not the DeepSeek model at all: it's Qwen fine-tuned on DeepSeek R1 outputs. So it's still Qwen.


My personal experience is that R1 is smarter than 3.5 Sonnet, but 3.5 Sonnet is a better coder. Thus it may be better to let R1 tackle the problem, but let 3.5 Sonnet implement the solution.


Specialization of AI models is cool. Just like some people are better planners while others have better raw coding ability.



