Humans can carve the world up into domains with a fixed set of rules and then do symbolic reasoning within them. LLMs don't seem to be able to do this in any formal way at all -- they just occasionally get it right when the domain happens to be well represented in the language they were trained on.
You can't feed an LLM a formal language grammar (e.g. SQL) and then have it generate only output with valid syntax.
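For concreteness, here's roughly what "feeding the model a grammar" would look like at decode time: an external checker masks the next-token logits so only syntactically legal tokens can be sampled. This is just a minimal sketch -- `model`, `tokenizer`, and `grammar_allowed_ids` are hypothetical stand-ins, not any particular library's API -- and note that the grammar lives in the decoding harness rather than in the model's own reasoning, which is exactly the point in dispute.

```python
# Minimal sketch of grammar-constrained decoding (hypothetical helpers):
# an external grammar checker masks the logits so only legal tokens survive.
import torch

def constrained_generate(model, tokenizer, grammar_allowed_ids, prompt, max_new_tokens=64):
    ids = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        logits = model(torch.tensor([ids]))[0, -1]            # next-token logits
        allowed = grammar_allowed_ids(tokenizer.decode(ids))  # token ids the grammar permits here
        mask = torch.full_like(logits, float("-inf"))
        mask[list(allowed)] = 0.0                             # keep only grammar-legal tokens
        next_id = int(torch.argmax(logits + mask))            # greedy choice among legal tokens
        ids.append(next_id)
        if next_id == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids)
```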
It's awfully confusing to me that people think current LLMs (or multi-modal models, etc.) are "close" to AGI (for whatever definitions of those words you prefer) when they can't do real symbolic reasoning.
Though I'm not an expert and happy to be corrected...
Adult humans can do symbolic reasoning, but other mammals largely cannot; even the ones that share most of our brain structure are much worse at it, if they can do it at all. Children have to learn it, along with a lot of the other things we consider a natural part of human intelligence.
That all points towards symbolic reasoning being a pretty small algorithmic discovery compared to the general ability to pattern match and do fuzzy lookups, transformations, and retrievals against a memory bank. It's not like our architecture is so special that we burned most of our evolutionary history selecting for symbolic reasoning; it's a very recent innovation, and thus must be relatively simple, given the core set of abilities that our close ancestors already have.
The thing about transformers is that obviously they're not the end of the line; there are some things they really can't do in their current form (though it's a smaller set than people tend to think, which is why the Gary Marcuses of the world always backpedal like crazy and retcon their previous statements as each new release does things they previously said were impossible).

But they are a proof of concept that just about the simplest architecture you could propose that might be able to generate language in a reasonable way (beyond N-gram sampling) can, in fact, do it really, really well even if all you do is scale it up, and that even the simplest next-token prediction goal leads to much higher-level abilities than you would expect. That was the hard core of the problem, building a flexible pattern mimic that can be easily trained, and it turns out to get us way further along the line to AGI than I suspect anyone working on it ever expected it would without major additions and changes to the design.

Now it's probably time to start adding bits and bobs and addressing some of the shortcomings (e.g. the static nature of the network, the lack of online learning, the fact that chains of thought shouldn't be constrained to token sequences, tokenization itself, etc.), but IMO the engine at the heart of the current systems is so impressively capable that the remaining work is going to be less of an Einstein moment and more of an elbow grease and engineering grind.
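To make "the simplest next-token prediction goal" concrete: the training objective is nothing more than cross-entropy between the model's prediction at each position and the token that actually follows. A minimal sketch, with `model` standing in for any causal LM that maps token ids to per-position logits (not any particular library's API):

```python
# Next-token prediction objective: predict token t+1 from everything up to t,
# scored with cross-entropy. `model`: [batch, seq] ids -> [batch, seq, vocab] logits.
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    logits = model(token_ids[:, :-1])         # predictions from every prefix
    targets = token_ids[:, 1:]                # the token that actually came next
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # [batch*(seq-1), vocab]
        targets.reshape(-1),                  # [batch*(seq-1)]
    )
```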
We may not be close in the "2 years of known work" sense, but we're certainly not far in the "we have no idea how to prove the Riemann Hypothesis" sense anymore, where major unknown breakthroughs are still required that might be 50+ years away, or where the problem might even be unsolvable.