
It should be possible to learn to reason from scratch. And the ability to reason in a long context seems to be very general.



How does one learn reasoning from scratch?

Human reasoning, as it exists today, is the result of tens of thousands of years of intuition slowly distilled down to efficient abstract concepts like "numbers", "zero", "angles", "cause", "effect", "energy", "true", "false", ...

I don't know what reasoning from scratch would look like without training on examples from other reasoning beings, the way human children do.


There are examples of learning reasoning from scratch with reinforcement learning.

Emergent tool use from multi-agent interaction is a good example - https://openai.com/index/emergent-tool-use/


Now you are asking for a perfect modeling of the system. Reinforcement learning works by discovering boundaries.


Now rediscover all the plants that are and aren't poisonous to most people.


I've suggested that the long context should be included in the prompt.

In your particular case the prompt would look something like: <pubmed dump> what are the plants that aren't poisonous to most people?

A general reasoner would recover the language and a relevant world model from the pubmed dump, and then proceed to reason over it to perform the task.

It doesn't look like a particularly efficient process.


Actually, I also think it's possible. Start with the natural-numbers axiom system. Form all valid sentences of increasing length. Run RL on a model to search for counterexamples or proofs. On a sufficiently powerful computer, this should produce superhuman math performance (efficiency), even at compute parity.
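A minimal toy sketch of the "form sentences, search for counterexamples or proofs" loop, in Python. The candidate claims, the search bound, and the brute-force check are placeholders made up for illustration; the RL part (learning where to search instead of enumerating blindly) is exactly what's left out here.

    def is_prime(k: int) -> bool:
        if k < 2:
            return False
        return all(k % d for d in range(2, int(k ** 0.5) + 1))

    # Hypothetical candidate sentences: claims of the form "for all n, P(n)".
    candidates = [
        ("n*(n+1) is always even",       lambda n: (n * (n + 1)) % 2 == 0),
        ("n^2 + n + 41 is always prime", lambda n: is_prime(n * n + n + 41)),
    ]

    def find_counterexample(pred, bound: int = 100):
        """Return the first n < bound where the claim fails, or None."""
        return next((n for n in range(bound) if not pred(n)), None)

    for desc, pred in candidates:
        cx = find_counterexample(pred)
        print(desc, "->",
              f"counterexample at n={cx}" if cx is not None
              else "none below 100 (which is not a proof!)")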


I wonder how much discovery in math happens as a result of lateral-thinking epiphanies. I.e.: a mathematician is trying to solve a problem, their mind is open to inspiration, and something in nature, or their childhood, or a book synthesizes with their mental model and gives them the next node in their mental graph that leads to a solution and advancement.

In an axiomatic system, those solutions are checkable, but how discoverable are they when your search space starts from infinity? How much do you lose by disregarding the gritty reality and foam of human experience? It provides inspirational texture that helps mathematicians in the search at least.

Reality is a massive corpus of cause and effect that can be modeled mathematically. I think you're throwing the baby out with the bathwater if you even want to be able to do math in a vacuum. Maybe there is a self-optimizing spider that can crawl up the axioms and solve all of math. I think you'll find that you can generate new math infinitely, and reality grounds it and provides the gravity to direct efforts towards things that are useful, meaningful, and interesting to us.


As I mentioned in a sister comment, Gödel's incompleteness theorems also throw a wrench into things, because you will be able to construct logically consistent "truths" that may not actually exist in reality. At which point, your model of reality becomes decreasingly useful.

At the end of the day, all theory must be empirically verified, and contextually useful reasoning simply cannot develop in a vacuum.


Those theorems are only relevant if "reasoning" is taken to its logical extreme (no pun intended). If reasoning is developed/trained/evolved purely in order to be useful and not pushed beyond practical applications, the question of "what might happen with arbitrarily long proofs" doesn't even come up.

On the contrary, when reasoning about the real world, one must reason starting from assumptions that are uncertain (at best) or even "clearly wrong but still probably useful for this particular question" (at worst). Any long and logic-heavy proof would make the results highly dubious.


A question is: what algorithms does the brain use to make these creative lateral leaps? Are they replicable?

Unless the brain is using physics that we don’t understand or can’t replicate, it seems that, at least theoretically, there should be a way to model what it’s doing with silicon and code.

States like inspiration and creativity seem to correlate in an interesting way with ‘temperature’, ‘top p’, and other LLM inputs. By turning up the randomness and accepting a wider range of output, you get more nonsense, but you also potentially get more novel insights and connections. Human creativity seems to work in a somewhat similar way.
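A rough illustration of what those knobs do mechanically (toy code, not any particular model's API): temperature rescales the logits before the softmax, and top-p keeps only the smallest set of highest-probability tokens whose cumulative mass reaches p, so turning either up literally widens the set of outputs you can sample.

    import math
    import random

    def sample(logits: dict, temperature: float = 1.0, top_p: float = 1.0) -> str:
        # Temperature scaling followed by softmax.
        scaled = {tok: l / temperature for tok, l in logits.items()}
        z = sum(math.exp(v) for v in scaled.values())
        probs = {tok: math.exp(v) / z for tok, v in scaled.items()}

        # Nucleus (top-p) filtering: keep the highest-probability tokens
        # until their cumulative mass reaches top_p, then sample from those.
        kept, total = {}, 0.0
        for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
            kept[tok] = p
            total += p
            if total >= top_p:
                break
        return random.choices(list(kept), weights=list(kept.values()))[0]

    # Made-up next-token logits, just to show the effect of the settings.
    logits = {"the": 2.0, "a": 1.5, "cat": 0.3, "purple": -1.0}
    print(sample(logits, temperature=0.7, top_p=0.9))  # conservative
    print(sample(logits, temperature=1.5, top_p=1.0))  # more "creative"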



I believe https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_... (Gödel's incompleteness theorems) applies here


Dogs are probably the best example I can think of. They learn through experience and clearly reason, but without a complex language to define abstract concepts. It's very basic reasoning, but they do learn and apply that learning.

To your point, experience is the training. Without language/data to represent human experience and knowledge to train a model, how would you give it 'experience'?


And yet dogs, to a very high degree, just learn the same things. At least the same kinds of things, over and over.

They were pre-designed to learn what they always learn. Their minds are structured so that, as puppies, they readily make the same connections that dogs have always needed to survive.

Not for real reasoning, which, by its nature, does not have a limit.


> just learn the same things. At least the same kinds of things, over and over.

It's easy to train the same things to a degree, but it's amazing to watch different dogs individually learn and reason through things completely differently, even within a breed, or within a single litter.

Reasoning ability is always limited by the capacity of the thinker to frame the concepts and interactions. It's always limited by definition; we only push that limit farther than other species do, and AGI may eventually push it past our abilities.


There was necessarily a "first reasoning being" who learned reasoning from scratch, and then it improved from there. Humans needed tens of thousands of years because:

- humans experience reality at a slower pace than AI could theoretically experience a simulated reality

- humans have to transfer knowledge to the next generation every 80 years (in a manner that's very lossy), and around half of each human lifespan is spent learning things that the previous generation already knew


The idea that there was “necessarily a first reasoning being” is neither obvious nor likely.

Reasoning could very well have originally been an emergent property of a group of beings.

The animal kingdom is full of examples of groups being more intelligent than individuals, including in human animals today.

It’s entirely possible that reasoning emerged as a property of a group before it emerged in any individual first.


I think you are focusing too much on the assumption that a being needs to be an individual organism, which is kind of an implementation detail.

What I wonder instead is whether reasoning is a property that is either there or not there, with a sharp boundary of existence.


A dead organism cannot reason. It's simply survivorship bias: reasoning evolved like any other survival mechanism.


Whether the first reasoning entity is an individual organism or a group of organisms is completely irrelevant to the original point. If one were to grant that there was in fact a "first reasoning group" rather than a "first reasoning being" the original argument would remain intact.


Did it kill them?
y - must be unsafe
n - must be safe

Do this continually through generations until you arrive at modern society.


Creating reasoning from scratch is the same task as creating an apple pie from scratch.

First you must invent the universe.


>First you must invent the universe.

That was the easy part, though; figuring out how to handle all the unintended side effects it generated is still an ongoing process. Please sit back and relax while we resolve the few incidental events occurring here and there; rest assured, we are putting our best effort into their resolution.


It is possible to learn to reason from scratch (that's what R1-Zero did), but the resulting chains of thought aren't legible to humans.

To quote DeepSeek directly:

> DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning. With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors. However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance, we introduce DeepSeek-R1, which incorporates cold-start data before RL.


If you look at the benchmarks of the DeepSeek-V3-Base, it is quite capable, even in 0-shot: https://huggingface.co/deepseek-ai/DeepSeek-V3-Base#base-mod... This is not from scratch. These benchmark numbers are an indication that the base model already had a large number of reasoning/LLM tokens in the pre-training set.

On the other hand, my take is that the ability to do reasoning in a long context is a general capability. And my guess is that it can be bootstrapped from scratch, without having to train on all of the internet or having to distill models trained on the internet.


> These benchmark numbers are an indication that the base model already had a large number of reasoning/LLM tokens in the pre-training set.

But we already know that is the case: the DeepSeek-V3 paper says it was post-trained partly with an internal version of R1:

> Reasoning Data. For reasoning-related datasets, including those focused on mathematics, code competition problems, and logic puzzles, we generate the data by leveraging an internal DeepSeek-R1 model. Specifically, while the R1-generated data demonstrates strong accuracy, it suffers from issues such as overthinking, poor formatting, and excessive length. Our objective is to balance the high accuracy of R1-generated reasoning data and the clarity and conciseness of regularly formatted reasoning data.

And DeepSeekMath did a repeated cycle of this kind of thing, mixing in 10% of old, previously seen data with newly generated data from the last generation, in a continuous bootstrap.
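A schematic sketch of that kind of bootstrap loop, just to make the mixing concrete. Only the "mix roughly 10% previously seen data back in each generation" structure comes from the comment; the generator, verifier, and trainer below are hypothetical toy stand-ins, not DeepSeek's actual pipeline.

    import random

    def generate_data(model, n: int = 100):
        # Hypothetical: sample n new reasoning traces from the current model.
        return [f"gen{model}-trace{i}" for i in range(n)]

    def verify(example) -> bool:
        # Hypothetical: keep only traces whose final answer checks out.
        return True

    def train(model, data):
        # Hypothetical: fine-tune on the mixed data, return the next generation.
        return model + 1

    def bootstrap(model=0, generations=3, old_fraction=0.10):
        seen = []
        for _ in range(generations):
            new = [ex for ex in generate_data(model) if verify(ex)]
            # Mix ~10% previously seen data in with the freshly generated data.
            k = min(len(seen), int(old_fraction * len(new) / (1 - old_fraction)))
            model = train(model, new + random.sample(seen, k))
            seen.extend(new)
        return model

    print(bootstrap())  # the toy "model" just counts generations: prints 3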


Possible? I guess evolution did it over the course of a few billion years. For engineering purposes, starting from the best advanced position seems far more efficient.


I've been giving this a lot of thought over the last few months. My personal insight is that "reasoning" is simply the application of a probabilistic reasoning manifold to an input in order to transform it into a constrained output that serves the stability or evolution of a system.

This manifold is constructed via learning a decontextualized pattern space on a given set of inputs. Given the inherent probabilistic nature of sampling, true reasoning is expressed in terms of probabilities, not axioms. It may be possible to discover axioms by locating fixed points or attractors on the manifold, but ultimately you're looking at a probabilistic manifold constructed from your input set.

But I don't think you can untie this "reasoning" from your input data. It's possible you will find "meta-reasoning", or similar structures found in any sufficiently advanced reasoning manifold, but these highly decontextualized structures might be entirely useless without proper recontextualization, necessitating that a reasoning manifold is trained on input whose patterns follow learnable underlying rules, if the manifold is to be useful for processing input of that kind.

Decontextualization is learning, decomposing aspects of an input into context-agnostic relationships. But recontextualization is the other half of that, knowing how to take highly abstract, sometimes inexpressible, context-agnostic relationships and transform them into useful analysis in novel domains.

This doesn't mean a well-trained model can't reason about input it hasn't encountered before, just that the input needs to be in some way causally connected to the same laws which governed the input the manifold was trained on.

I'm sure we could create a fully generalized reasoning manifold which could handle anything, but I don't see how we possibly get that without first considering and encountering all possible inputs. But these inputs still have to have some form of constraint governed by laws that must be learned through sampling, otherwise you'd just be training on effectively random data.

The other commenter who suggested simply generating all possible sentences and training on internal consistency should probably consider Gödel's incompleteness theorems, and that internal consistency isn't enough to accurately model and interpret the universe. One could construct a thought experiment about an isolated brain in a jar with effectively unlimited neuronal connections, but no sensory connection to the outside world. It's possible, with enough connections, that the likelihood of the brain conceiving of true events it hasn't actually encountered does increase meaningfully. But the brain still has nothing to validate against, and can't simply assume that because something is internally logically consistent, that it must exist or have existed.



