Context length, possibly. Prompt adherence drops off as context grows, and anything above 20k tokens is pushing it. I get the best results by presenting the smallest amount of context possible, including stripping comments, main methods, and functions the model doesn't need to see. It's a bit more work (not that much if you have a script that does it for you), but the results are worth it. You could test in the ChatGPT app (or LMArena direct chat) by asking the same question with a minimal, hand-curated context and seeing whether it makes the same mistake.
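To make the "script that does it for you" part concrete, here's a minimal sketch of what I mean, assuming Python source files and the stdlib ast module; KEEP and strip_for_prompt are just placeholder names for illustration:

```python
import ast
import sys

# Functions/classes the model actually needs to see; everything else gets dropped.
# (KEEP and strip_for_prompt are made-up names for this sketch.)
KEEP = {"parse_config", "load_rules"}

def strip_for_prompt(source: str) -> str:
    """Return source with comments, docstrings, and unneeded top-level defs removed."""
    tree = ast.parse(source)

    # Keep imports plus only the top-level defs whose names are in KEEP.
    tree.body = [
        node for node in tree.body
        if isinstance(node, (ast.Import, ast.ImportFrom))
        or (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
            and node.name in KEEP)
    ]

    # Drop docstrings from whatever survived.
    for node in ast.walk(tree):
        if isinstance(node, (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            if (node.body
                    and isinstance(node.body[0], ast.Expr)
                    and isinstance(node.body[0].value, ast.Constant)
                    and isinstance(node.body[0].value.value, str)):
                node.body = node.body[1:] or [ast.Pass()]

    # Comments never reach the AST, so ast.unparse emits the code without them.
    return ast.unparse(tree)

if __name__ == "__main__":
    print(strip_for_prompt(open(sys.argv[1]).read()))
```

Then you paste the pruned output into the chat instead of the whole file.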
Yes, that's what I'm suggesting. Cursor is spamming the models with too much context, which seems to hurt reasoning models more than non-reasoning models (a hypothesis, but one that aligns with my experience). That's why I recommended testing reasoning models outside of Cursor with a hand-curated context.
A longer advertised context length doesn't necessarily map 1:1 to a model's actual ability to perform difficult tasks across that full context. See, for example, the plots of ARC performance vs. context length for the o-series models.