And thus we have the AI problem in a nutshell. You think it can reason because it can describe the process in well-written language. Anyone who can state the reasoning below clearly "understands" the problem:
> For example, in the top‐left 3×3 block (rows 1–3, columns 1–3) the givens are 7, 5, 9, 3, and 4 so the missing digits {1,2,6,8} must appear in the three blank cells. (Later, other intersections force, say, one cell to be 1 or 6, etc.)
It's good logic. Clearly it "knows" if it can break the problem down like this.
Of course, if we stretch ourselves slightly and actually check beyond a quick visual inspection, we quickly see that it put a second 4 in that first box despite "knowing" it shouldn't. In fact, several of the boxes contain duplicate digits, despite the clear reasoning above.
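The check the model skipped is trivial to automate. A minimal sketch (assuming the grid is encoded as a 9×9 list of ints, with 0 for blanks):

```python
def box_duplicates(grid):
    """Return (box_row, box_col, digit) for every digit repeated in a 3x3 box."""
    dups = []
    for br in range(3):
        for bc in range(3):
            seen = {}
            for r in range(br * 3, br * 3 + 3):
                for c in range(bc * 3, bc * 3 + 3):
                    d = grid[r][c]
                    if d:
                        seen[d] = seen.get(d, 0) + 1
            dups.extend((br, bc, d) for d, n in seen.items() if n > 1)
    return dups
```

Running this over the model's output would have flagged the second 4 in the top-left box immediately.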
Does the reasoning just not get used in the solving part? Or maybe a machine built to regurgitate plausible text, can also regurgitate plausible reasoning?
Thanks for spotting this. The solution is indeed wrong, and I agree that the machine can regurgitate plausible reasoning in principle. If it ran in a loop, I would bet it could eventually figure this particular problem out, though I'm not sure that matters much in the end. The only plausible way to crack some of these Sudoku puzzles is a SAT solver, and I'm sure that, given the right environment, an LLM could simply code one, execute it, and get the answer. Does that mean it can't "reason" because it couldn't solve this Sudoku puzzle, or didn't notice it made a mistake? I'm not sure I'd go that far, but I agree that my example didn't match my claim. The model didn't do a careful job and didn't quadruple-check its work as I would have expected from an advanced AI, but remember that this is o3-mini, not something that is supposed to be full-blown AI yet. If you had asked GPT-3.5 for something similar, the answer would have been amusingly simplistic; now it is at least starting to get close.
I now wonder if I made a typo when I copied this puzzle from an image into my phone app, rendering it unsolvable. The model should still have spotted such an error, but of course it is not tuned to perfection.
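For what it's worth, a plain backtracking search (rather than a full SAT encoding) is already enough both to solve a standard 9×9 puzzle and to report when a transcription typo has made it unsolvable. A sketch, again assuming a 9×9 list of ints with 0 for blanks:

```python
def valid(grid, r, c, d):
    """Check whether digit d can be placed at (r, c) without a conflict."""
    if d in grid[r]:
        return False
    if any(grid[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(grid[i][j] != d for i in range(br, br + 3) for j in range(bc, bc + 3))

def solve(grid):
    """Solve the Sudoku in place via backtracking; return False if unsolvable."""
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                for d in range(1, 10):
                    if valid(grid, r, c, d):
                        grid[r][c] = d
                        if solve(grid):
                            return True
                        grid[r][c] = 0
                return False  # no digit fits here: dead end (or unsolvable puzzle)
    return True
```

A `False` return on the original givens would have confirmed the typo theory without any manual checking.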