Good points. I suspect that o3 is able to reason more deeply about different paths through a codebase than earlier models, though, which might make it better at this kind of work in particular.
I was blown away by some debugging results I got from o3 early on and have been using it heavily since. The early results that caught my attention came from a couple of cases where it tracked down a problematic cause through several indirect layers of effects, the kind of thing you'd typically be tediously tracing step-by-step through a debugger. I think whatever's behind this capability overlaps with the really solid work it'll do in abstract system design, particularly when you have it think through the distant implications of design choices.
The main trick is in how you build up its context for the problem. What I do is think of it like a colleague I'm trying to explain the bug to: the overall structure is conversational, but I interleave relevant source chunks with detailed, complete observations of the anomalous program behavior. I'll typically send a first message building up context about the program/source, then build up the narrative context for the particular bug in a second message. This sets it up with basically perfect context to infer the problem, and sets you up for easy reuse: you can back up, clear that second message, and ask something else, reusing the detailed program context given by the first message.
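To make that concrete, here's roughly what the two-message setup looks like through the API. This is a minimal sketch assuming the OpenAI Python SDK's chat-completions interface; the model string, source layout, and bug story are placeholders I made up for illustration.

    from openai import OpenAI

    client = OpenAI()

    # Message 1: durable context about the program and the relevant source,
    # reusable across many questions.
    program_context = (
        "Here's the system: a job scheduler that leases tasks to a worker pool.\n"
        "--- scheduler.py ---\n"
        "<paste the relevant chunk here>\n"
        "--- worker_pool.py ---\n"
        "<paste the relevant chunk here>\n"
    )

    # Message 2: the narrative for this particular bug, with concrete observations.
    bug_narrative = (
        "Under load, a task occasionally runs twice. It only happens after a\n"
        "worker restart, and the duplicate always carries the old worker's\n"
        "lease id. What do you think is going on?"
    )

    messages = [
        {"role": "user", "content": program_context},
        {"role": "user", "content": bug_narrative},
    ]

    reply = client.chat.completions.create(model="o3", messages=messages)
    print(reply.choices[0].message.content)

    # To reuse the setup: keep messages[0], swap messages[1] for a different
    # question, and send again.

The point is that the first message is the expensive, carefully assembled part; everything after it is cheap to vary.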
Using it on the architectural side, you can follow a similar procedure, but instead of describing a bug you're describing the architectural revisions you've gone through, what your experience with each was, what your objectives for a potential refactor are, where your thinking is on candidate reformulations, and so on. Then finish with a question that doesn't overly constrain the model; you might retry from that conversation/context point with a few variants, e.g.: "what are your thoughts on all this?" or "can you think of better primitives to express the system through?"
I think there are two key points to doing this effectively:
1) Give it full, detailed context with nothing superfluous, and express it within the narrative of your real world situation.
2) Be careful not to "over-prescribe" what it says back to you. These models are very "genie-like": they'll often give you exactly what you ask for in a rather literal sense, sometimes in incredibly dumb-seeming ways if you're not careful.
In the context of LLMs, what do you mean by "reason"? What does reasoning look like in LLMs, how do you recognize it, and, more importantly, how do you invoke it? I haven't had much success in getting LLMs to solve, well, basically any problem that involves logic.
Chain of thought at least introduces some skepticism, but that's not exactly reasoning. It makes me wonder what people refer to when they say "reason".
As best I understand it, an LLM's output is directly determined by the state of the network given the context. Thinking is the use of intermediate predictions to help steer the network toward what is expected to be a better result, via learned patterns. Reasoning is a set of strategies for shaping that process to produce even more accurate output, generally having a cumulative effect on the accuracy of predictions.
It doesn’t? Reasoning is not an analysis; it is the application of learned patterns for a given set of parameters that results in higher accuracy.
Permit my likely inaccurate illustration:
You’re pretty sure 2 + 2 is 4, but there are several questions you could ask: are any of the numbers negative, are they decimals, were any numbers left out? Most of those questions are things you’ve learned to ask automatically, without thinking about it, because you know they’re important. But because the answer matters, you check your work by writing out the equation. Then, maybe you verify it with more math; 4 ÷ 2 = 2. Now you’re more confident the answer is right.
An LLM doesn’t understand math per se. If you type “2 + 2 =”, the model isn’t doing math… it’s predicting that “4” is the next most likely token based on patterns in its training data.
“Thinking” in an LLM is like the model shifting mode and starting to generate a list of question-and-answer pairs. These are again the next most likely tokens given the whole context so far. “Reasoning” sits above that: a controlling pattern that steers those question-and-answer sequences, injecting logic to help guide the model toward a hopefully more correct next token.
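A toy illustration of that distinction (nothing here calls a real model; the intermediate lines are invented):

    # Illustrative only: shows how generated "thinking" tokens become part of
    # the context that conditions the next prediction. No model is called.

    direct_context = "2 + 2 ="
    # The next token would be predicted from "2 + 2 =" alone.

    thinking_context = (
        "2 + 2 =\n"
        "Are either of the numbers negative? No.\n"
        "Are they decimals? No, both are the integer 2.\n"
        "So 2 + 2 = 4. Check: 4 / 2 = 2, which matches.\n"
        "Therefore the answer is"
    )
    # Each question/answer line is itself generated token by token and appended,
    # so the final prediction is conditioned on all the intermediate steps.

    for name, ctx in [("direct", direct_context), ("with thinking", thinking_context)]:
        print("--- " + name + " ---")
        print(ctx)
        print()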
Very likely. Larger context is significantly beneficial to LLMs when they can maintain attention over it, which was part of my point. Imagine being able to hold the word-for-word text of your required reading book while you're taking a test, whereas older models, from just two years ago, could hold more like a couple of chapters' worth of text.