Sonnet 3.5 is the best non-Chain-of-Thought code-authoring model. When paired with R1's CoT output, Sonnet 3.5 performs even better - outperforming vanilla R1 (and everything else), which suggests Sonnet is better than R1 at utilizing R1's own CoT.
It's a scenario where the result is greater than the sum of its parts.
In my experiments with the DeepSeek R1 Qwen-32B distill, the model did not follow the edit instructions - the output format was wrong. I know the distills are not at all the same as the full model, but that could be a clue. Combine that observation with the scores and you have a reasonable hypothesis.
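For what it's worth, "did not follow the edit instructions" is easy to check mechanically. Here is a small hypothetical sketch that tests whether a reply contains a well-formed SEARCH/REPLACE edit block; the exact marker strings are an assumption (aider-style), not something taken from the benchmark itself.

```python
import re

# Hypothetical check: does a model reply contain at least one well-formed
# aider-style SEARCH/REPLACE edit block? The marker strings below are an
# assumption about the benchmark's edit format, not taken from this thread.
EDIT_BLOCK = re.compile(
    r"<<<<<<< SEARCH\n(?P<search>.*?)\n=======\n(?P<replace>.*?)\n>>>>>>> REPLACE",
    re.DOTALL,
)

def follows_edit_format(reply: str) -> bool:
    """Return True if the reply contains at least one parseable edit block."""
    return EDIT_BLOCK.search(reply) is not None

# A distill that answers in plain prose fails this check, which would show up
# as a formatting failure rather than a reasoning failure.
good = "<<<<<<< SEARCH\nold line\n=======\nnew line\n>>>>>>> REPLACE"
bad = "Sure! Just change the old line to the new line."
print(follows_edit_format(good))  # True
print(follows_edit_format(bad))   # False
```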
My personal experience is that R1 is smarter than 3.5 Sonnet, but 3.5 Sonnet is a better coder. So it may be better to let R1 tackle the problem and let 3.5 Sonnet implement the solution (rough sketch of that hand-off below).
edit: it would be interesting to see how the DeepSeek R1 + claude-3-7 combo performs.
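A minimal sketch of the "R1 reasons, Sonnet implements" hand-off. This assumes DeepSeek's OpenAI-compatible endpoint (model name `deepseek-reasoner`, with the CoT exposed as `reasoning_content`) and Anthropic's Python SDK for Sonnet; the model names, fields, and prompt wording are my assumptions, not anything from this thread.

```python
import os
from openai import OpenAI
import anthropic

# Assumed endpoints/SDKs: DeepSeek's OpenAI-compatible API and Anthropic's SDK.
deepseek = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],
    base_url="https://api.deepseek.com",
)
claude = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def solve(task: str) -> str:
    # Step 1: let R1 think through the problem; keep its reasoning trace.
    r1 = deepseek.chat.completions.create(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": task}],
    )
    plan = r1.choices[0].message.reasoning_content  # R1's CoT (assumed field)

    # Step 2: hand the reasoning to Sonnet and ask it to write the actual code.
    reply = claude.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=4096,
        messages=[{
            "role": "user",
            "content": f"Task:\n{task}\n\nReasoning trace from another model:\n"
                       f"{plan}\n\nImplement the solution as code edits.",
        }],
    )
    return reply.content[0].text

print(solve("Refactor the duplicated parsing logic in parser.py into one helper."))
```

Swapping the second call to a newer Sonnet model would be the easy way to test the claude-3-7 combo mentioned above.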