
I'm having trouble understanding what they want to "upskill" those people to do.

What skills won't be replaced? The only ones I can think of either have a large physical component, or are only doable by a tiny fraction of the current workforce.

As for the ones with a physical component (plumbers being the most cited), the cognitive parts of the job (the "skilled" part of skilled labor) can be replaced by having the person just follow directions demonstrated onscreen for them. And of course, the robots aren't far behind, since the main hard part of making a capable robot is the AI part.


'main hard part of making a capable robot is the AI part'

Robots are far behind.

Mechanical hands with human-equivalent performance are as hard as the AI part.

Strong, fast, durable, tough, touch and temp sensitive, dexterous, light, water-proof, energy efficient, non-overheating.

Muscles and tendons in human hands and forearms self-heal and grow stronger with more use.

Mechanical tendons stretch and break. Small motors have plenty of issues of their own.


And your claim is that those will never be solved?

As a professional robotics engineer I can tell you for a fact they are coming soon.


There's nothing in that post claiming those problems will never be solved. I understand the claim as "the hardware component of robotics needs more work and this will take some time, compared to AI capabilities/software." Or something like that.

Maybe you could clarify what your experience on the matter is, how the state of the art looks to you, and most of all what timelines you imagine?


Just look up “fine dexterous manipulation with pressure feedback” to see the SOTA for dexterous manipulation

There are at least a half dozen products; two were recently announced, from Unitree and Allegro.

Rodney Brooks wrote about the challenges - but frankly it was a submarine piece for his work

https://rodneybrooks.com/why-todays-humanoids-wont-learn-dex...


You are talking about high-cost environments, at least for the moment?

Come on... show me a robot that can run a farm that grows organic produce at an affordable price. It is the lowest-wage job out there. Automating it would make prices far out of range for the 99% - but the billionaires couldn't care less?


You need AI to do that; affordable robots are already here, but the intelligence is not.


For most things they don't need to be "human equivalent." I'd be willing to bet the current crop of robots we're seeing could do most tasks like vacuuming, cooking, picking up clutter, folding laundry and putting it away, making beds, touch-up painting, gardening, etc. It seems to be getting better very fast. And if mechanical tendons break, you replace them. Big deal. You don't even need a person to do the repair.


I don't think "replaced" is a good word here.. augmented and expanded. With AI we are expanding our activities, users expect more, competition forces companies to do more.

But AI can't be held liable for its actions; that is one role. It has no direct access to the context it is working in, so it needs humans as a bridge. In the end, AI produces outcomes in the same local context, which is the user's. So from intent to guidance to outcomes, they are all user-based, and so are the costs and risks.

I find it pessimistic to take that static view of work, as if "that's it, everything we needed has been invented," and now we are fighting for positions like musical chairs.


> I don't think "replaced" is a good word here.. augmented and expanded. With AI we are expanding our activities, users expect more, competition forces companies to do more.

Daily reminder that the vast majority of the value generated by the productivity boosts brought by technology in the last 50 years doesn't benefit the workers.

https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcSG4s-x...


Agree for almost all jobs, but some, like my father's, were about crawling inside huge metal pieces to do precision machining. For unique piecework, it might not be economical to train AI. Surely equivalents to this exist elsewhere.


I'd suggest enjoying that vindication while it lasts.

From my perspective, your perspective is like a horse and buggy driver feeling vindicated when a "horseless carriage" driver accidentally drives one into a tree. The cars will get easier to drive and safer in crashes, and the drivers will learn to pay attention in certain ways they previously didn't have to.

Will there still be occasional problems? Sure, but that doesn't mean that tying your career to horses would have been a wise move. Same here.

(Also, this article is about "poisoned ChatGPT-like tools," which says very little about the tools most developers are actually using.)

I'm always reminded of this: "Logged onto the World Wide Web, I hunt for the date of the Battle of Trafalgar. Hundreds of files show up, and it takes 15 minutes to unravel them—one's a biography written by an eighth grader, the second is a computer game that doesn't work and the third is an image of a London monument. None answers my question, and my search is periodically interrupted by messages like, "Too many connections, try again later."" -- Cliff Stoll, 1995


Counter-analogy: This is ultimately people copy-pasting the first answer they see from a web-forum. That's been bad advice for decades, and the same underlying problems remain because most of them involve human foibles and allocating attention.

What these tools change is making the process much faster and adding a (rather questionable) imprimatur of quality from a vendor that may not actually be a good curator of code-samples.


Re: your apparent derision of Cliff Stoll's writings, the OP results seem to speak to a trend he was among the first to point out in the book you cite from: people overwhelmingly bias towards the easiest to obtain information, even when they know that information is of worse quality than other information that's available but harder to get.


It was cited from a Newsweek article, and Cliff said this about it later: "Of my many mistakes, flubs, and howlers, few have been as public as my 1995 howler ... Now, whenever I think I know what's happening, I temper my thoughts: Might be wrong, Cliff ..."

You may be right about humans biasing toward easiest to obtain information, but that doesn't say "don't use AI assistance", it says "use care when using AI assistance".

Also, Cliff wasn't saying the information was easier to use, since in his case, it was actually harder to use than just looking it up in a printed encyclopedia or the like. But none of the problems he mentioned were inherent problems with the internet, they were because it was a brand new medium still working out its kinks. AI may well be harder to use for coding right now, at least for many use cases. However, a look at the bigger picture strongly suggests it is the future, just as a look at the bigger picture in 1995 would have suggested that the internet was the future, at least for answering questions like "when was the battle of Trafalgar?"

This is consistent with my horse/car analogy: the car wasn't the problem, the problem was people who assumed cars were going to keep themselves on the road like a horse would naturally do. You can get a huge gain, but you have to be smart about how you use it.


Imagine if a regular for profit startup did that. It gets 60 million in initial funding, and later their valuation goes up to 100 billion. Of course they can't just give the 60 million back.

This is different and has a lot of complications that are basically things we've never seen before, but still, just giving the 60 million back doesn't make any sense at all. They would've never achieved what they've achieved without his 60 million.


I don't see how opening it makes it safer. It's very different from security, where some "white hat" can find a vulnerability, and they can then fix it so instances don't get hacked. Sure, a bad actor could run the software without fixing the bug, but that isn't going to harm anyone but themselves.

That isn't the case here. If some well meaning person discovers a way that you can create a pandemic causing superbug, they can't just "fix" the AI to make that impossible. Not if it is open source. Very different thing.


There are differences, but there are also similarities. I think the similarities are more important. Both when you're driving and when you're interacting online, you have conflicting agendas, which could be as simple as trying to get there as soon as possible when driving, and, on an online message board, trying to get your point accepted or trying to make yourself look good and smart.

The point, though, is that if you're not gonna have to interact with these people in the future, and there are otherwise no repercussions to being nasty, you're more likely to be nasty.


Any digital piano functions as a midi controller, and many of them have weighted keys. And there are a few "pure" midi controllers that have 88 weighted keys, such as the M-Audio Hammer 88 or StudioLogic SL88.


The fact that weighted digital pianos are cheaper than equivalent plain MIDI keyboards is my pet peeve.


I agree with Hinton, although a lot hinges on your definition of "understand."

I think to best wrap your head around this stuff, you should look to the commonalities of LLMs, image generators, and even things like Alpha Zero and how it learned to play Go.

Alpha Zero is kind of the extreme in terms of not imitating anything that humans have done. It learns to play the game simply by playing itself -- and what they found is that there isn't really a limit to how good it can get. There may be some theoretical limit of a "perfect" Go player, or maybe not, but it will continue to converge towards perfection by continuing to train. And it can go far beyond what the best human Go player can ever do. Even though very smart humans have spent their lifetimes deeply studying the game, and Alpha Zero had to learn everything from scratch.

One other thing to take into consideration is that to play the game of Go you can't just think of the next move. You have to think far forward in the game -- even though technically all it's doing is picking the next move, it is doing so using a model that has obviously looked forward more than just one move. And that model is obviously very sophisticated, and if you are going to say that it doesn't understand the game of Go, I would argue that you have a very oddly restricted definition of the word "understand," and one that isn't particularly useful.

Likewise, with large language models, while on the surface, they may be just predicting the next word one after another, to do so effectively they have to be planning ahead. As Hinton says, there is no real limit to how sophisticated they can get. When training, it is never going to be 100% accurate in predicting text it hasn't trained on, but it can continue to get closer and closer to 100% the more it trains. And the closer it gets, the more sophisticated a model it needs. In the sense that Alpha Zero needs to "understand" the game of Go to play effectively, the large language model needs to understand "the world" to get better at predicting.


The difference is that "the world" is not exhaustible in the same way as Go is. While it's surely true that the number of possible overall Go game states is extremely large, the game itself is trivially representable as a set of legal moves and rules. The "world model" of the Go board is actually just already exhaustive and finite, and the computer's work in playing against itself is to generate more varied data within that model rather than to develop that model itself. We know that when Alpha Zero plays a game against itself it is valuable data because it is a legitimate game which most likely represents a new situation it hasn't seen before and thus expands its capacity.

For an LLM, this is not even close to being the case. The sum of all human artifacts ever made (or yet to be made) doesn't exhaust the description of a rock in your front yard, let alone the world in all its varied possibility. And we certainly haven't figured out a "model" which would let a computer generate new and valid data that expands its understanding of the world beyond its inputs, so self-training is a non-starter for LLMs. What the LLM is "understanding", and what it is reinforced to "understand" is not the world but the format of texts, and while it may get very good at understanding the format of texts, that isn't equivalent to an understanding of the world.


>The sum of all human artifacts ever made (or yet to be made) doesn't exhaust the description of a rock in your front yard, let alone the world in all its varied possibility.

No human or creature we know of has a "true" world model so this is irrelevant. You don't experience the "real world". You experience a tiny slice of it, a few senses that is further slimmed down and even fabricated at parts.

To the bird who can intuitively sense and use electromagnetic waves for motion and guidance, your model of the world is fundamentally incomplete.

There is a projection of the world in text. Moreover training on additional modalities is trivial for a transformer. That's all that matters.


That's the difference though. I know my world model is fundamentally incomplete. Even more foundationally, I know that there is a world, and when my world model and the world disagree, the world wins. To a neural network there is no distinction. The closest the entire dynamic comes is the very basic annotation of RLHF which itself is done by an external human who is providing the value judgment, but even that is absent once training is over.

Despite not having the bird's sense for electromagnetic waves, I have an understanding that they are there, because humans saw behavior they couldn't describe and investigated, in a back-and-forth with a world that has some capacity to disprove hypotheses.

Additional modalities are really just reducible to more kinds of text. That still doesn't exhaust the world, and unless a machine has some ability to integrate new data in real time alongside a meaningful commitment and accountability to the world as a world, it won't be able to cope with the real world in a way that would constitute genuine intelligence.


>I know my world model is fundamentally incomplete. Even more foundationally, I know that there is a world, and when my world model and the world disagree, the world wins.

Yeah, this isn't really true. That's not how humans work. For a variety of reasons, plenty of people stick with their incorrect model despite the world indicating otherwise. In fact, this seems to be normal enough human behaviour. Everyone does it, for something or other. You are no exception.

And yes LLMs can in fact tell truth from fiction.

GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r

Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975

Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334

Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221

The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets - https://arxiv.org/abs/2310.06824

Your argument seems to boil down to "they can't perform experiments" but that isn't true either.


It is a very basic fact that LLMs have no concept of true or false; they only have the ability to look up text data they have seen before. If you do not understand this you are in no position to discuss LLMs.


I really don't know what people mean when they say this. We routinely instruct computer chips to evaluate whether some condition is true and take action on that basis, even though the chip is "just" a selectively doped rock. Why would the details of an LLM's underlying architecture mean that it can't have a concept of true or false?


One of the most ridiculous comments I have read about LLMs here.

The ~100 layer deep neural networks infer many levels of features over the text, including the concept of true and false. That is trivial for an LLM.

Are you completely unaware these are based on deep neural networks?

Convolutional Neural Networks don't operate by "look up" of text data.


Okay, so then tell me: how does it decide whether it is true or false that Biden is the POTUS?

Its response is not based on facts about the world as it exists, but on the text data it has been trained on. As such, it is not able to determine true or false, even if the response in the above example would be correct.


Serious question, in pursuit of understanding where you're coming from: in what way do you think that your own reckoning is fundamentally different to or more "real" than what you're describing above?

I know I don't experience the world as it is, but rather through a whole bunch of different signals I get that give me some hints about what the real world might be. For example, text.


You understand the concept of true vs false.

LLM does not, that isn't how it works.

You can say the difference is academic but there is a difference.

What the difference is between a really good faker of intelligence and actual intelligence is an open question.

But I will say most AI experts agree that LLM are not artificial general intelligence. It isn't just a lack of training data, they just are not of the category that we mean by that.


> You understand the concept of true vs false.

> LLM does not, that isn't how it works.

GPT-4 can explain the concept when prompted and can evaluate logic problems better than most human beings can. I would say it has a deeper understanding of "true vs false" than most humans.

I think what you are trying to say is that LLMs are not conscious. Consciousness has no precise universally agreed formal definition, but we all know that LLMs are not conscious.


> GPT-4 can explain the concept when prompted and can evaluate logic problems better than most human beings can. I would say it has a deeper understanding of "true vs false" than most humans.

Sigh

GPT produces output which obeys the patterns it has been trained on for definitions of true and false. It does not understand anything. It is a token manipulation machine. It does it well enough that it convinces you, a walking ape, that it understands. It does not.


A human is an ape that is obeying patterns that it has been trained on. What is school but a bunch of apes being trained to obey patterns? Some of these apes do well enough to convince you that it understands things. Some apes fully "understand" that flat earth theory is true, or they "understand" that the Apollo moon landings were faked.

You have a subjective philosophical disagreement about what constitutes understanding. That is fine. I clearly understand it is not conscious and that programs do not understand things the way that humans do. We are fundamentally different to LLMs. That is obvious. But you are not making a technical argument here unless you can define "understand" in technical terms. This is a matter of semantics.

> It is a token manipulation machine

Deep learning and machine learning in general is more than token manipulation. They are designed for pattern recognition.


You acknowledged above that consciousness isn't what an LLM has, and you likely understand that the poster was referring to that...

The broad strokes you use here are exactly why discussing LLMs is hard. Sure, some people dismiss them because they aren't general AI, but having supporters dismiss any argument with "passes the Turing test" is equally useless.


No you have misunderstood. As I wrote above:

"But you are not making a technical argument here unless you can define "understand" in technical terms. This is a matter of semantics."

I said the nature of their argument is not technical, since they are not dealing with technical definitions, but I did not dismiss their argument altogether. I clarified and restated their own argument for them in clearer terms. LLMs are not conscious, but they can still "understand" very well depending on your definition of understand. Understanding is not a synonym for consciousness. Language is evolving and you need to be more precise when discussing AI / machine learning.

One definition of understand is:

"perceive the intended meaning of (words, a language, or a speaker)."

Deep learning models recognize patterns. Mechanical perception of patterns. They understand things mechanically, unconsciously.


I stand by my point that telling people who are using synonyms for consciousness that "an LLM knows true better than humans do" is bad for discussion.

The core issue is their "knowledge" is too context sensitive.

Certainly humans are very context sensitive in our memories but we all have something akin to a "mental model" we can use to find things without that context.

In contrast, an LLM's knowledge is quite literally defined by that context.

In either case my original point on using true and false is that LLM can hallucinate and on a fundamental design level there is little that can be done to stop it.


LLMs can outperform humans on a variety of NLP tasks that require understanding. Formally, they are designed to solve "natural language understanding" tasks as a subset of "natural language processing" tasks. The word "understanding" is used in the academic context here. It is a standard term in NLP research.

https://en.wikipedia.org/wiki/Natural-language_understanding

My point was to show that their thinking, reasoning and language was flawed, that it lacked nuance and rigor. I am trying to raise the standards of discussion. They need to think more deeply about what "understanding" really means. Consciousness does not even have a formal universally agreed definition.

Sloppy non-rigorous shallow arguments are bad for discussion.

> LLM can hallucinate and on a fundamental design level there is little that can be done to stop it.

That's a separate issue. They generally don't hallucinate when solving a problem within their context window. Recalling facts from their training set is another issue.

Humans sometimes have a similar problem of "hallucinating" when recalling facts from their long term memory.


Except that if you narrow to a tiny training set you are back to problems that can be solved almost as quickly with full text search...


Narrow to a tiny training set? What are you talking about now? That has nothing to do with deep learning.

GPT-3.5 was trained on at least 300 billion tokens. It has 96 layers in its neural network of 175 billion parameters. Each one of those 96 stacked layers has an attention mechanism that recomputes an attention score for every token in the context window, for each new token generated in sequence. GPT-4 is much bigger than that. The scale and complexity of these models is beyond comprehension. We're talking about LLMs, not SLMs.
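
For reference, the attention computation being described reduces to something like this single-head sketch (a simplification that ignores multiple heads, masking, and the rest of a real transformer layer):

    import numpy as np

    def attention(Q, K, V):
        # Q, K, V have shape (sequence_length, d). Every position attends to
        # every position in the context window, so work grows with its length.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V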


I misread context window as training set and thought you were switching to SLMs. My mistake.


In order to affirm something is true, you don't just need to know it, you need to know that you know it. LLMs fundamentally have no self-knowledge.


> LLMs fundamentally have no self-knowledge

ChatGPT can tell me about itself when prompted. It tells me that it is an LLM. It can tell me about capabilities and limitations. It can describe the algorithms that generate itself. It has deep self knowledge, but is not conscious.


An LLM only knows its text embeddings. It does not know the real world. Clear?


Humans and other creatures only know their sensory data input. Therefore they also don't know the real world.

Your eyes and ears perceive a tiny minuscule fraction of what is out there in the real world.

A blind and deaf person must know even less of the real world than an LLM, which can read more than a human can ever read in their lifetime.


It’s giving the most likely answer as opposed to the factual answer?


> Its response is not based on facts about the world as it exists, but on the text data it has been trained on

How did you find out that Biden was elected if not through language by reading or listening to news? Do you have extra sensory perception? Psychic powers? Do you magically perceive "facts" without any sensory input or communication? Ridiculous.

By the same argument your knowledge is also not based on "facts" about the world, since you only learned about it by reading or listening. Absurd nonsense.


You didn't answer my question ergo you concede that LLMs don't know true or false.


I did answer your question indirectly. By the reasoning in your argument, you yourself also don't know true or false. Your argument is logically flawed.

Do LLMs know true or false? It depends on how you define "know". By some definitions, they "know true or false" better than humans, as they can explain the concept and solve logic problems better than most humans can. However, by any definition that requires consciousness, they do not know because they are not conscious.

The average person spends a lot of time completely immersed in "false" entertainment. Actors are all liars, pretending to be someone they are not, doing things that didn't really happen, and yet many people are convinced it is all "true" for at least a few minutes.

People also believe crazy things like Flat Earth theory or that the Apollo moon landings were faked.

So LLMs have a conceptual understanding of true/false, strong logical problem solving to evaluate truth or falsity of logical statements, and factual understanding of what is true and false, better than many humans do. But they are not conscious therefore they are not conscious of what is true or false.


It certainly doesn't "look up" text data it has seen before. That shows a fundamental misunderstanding of how this stuff works. That's exactly why I use the example above of Alpha Zero and how it learns to play Go, since that demonstrates very clearly that it's not just looking things up.

And I have no idea what you mean by saying that it has no concept of true or false. Even the simplest computer programs have a concept of true or false, that's kind of the simplest data type, a boolean. Large language models have a much more sophisticated concept of true and false that has a lot more nuance. That's really a pretty ridiculous thing to say.


Yes, you don't understand what I said. The model has no concept of true or false. It only has embeddings. If 'asked' a question it can see if that is consistent with its embeddings and probabilities or not. This is not a representation of the real world, of facts, but simply a product of its training.


"This is not a representation of the real world, of facts, but simply a product of its training."

Tell me how that doesn't apply to the human brain as well.


They have no inherent concept of true or false, sure. But what are you comparing them to? It would be bold to propose that humans have some inherent concept of true or false in a way that LLMs do not; for both humans and LLMs it seems to be emergent.


In all these arguments it's implied that this "genuine intelligence" is something all humans have, and nothing could be farther from the truth; that is why we have flat earthers or religious people and many other people believing easily refutable lies for decades.


There is no such thing as a world model, and you don't have one of them. This is a leftover bad psychological concept from the 70s AI researchers who never got anywhere. People and other creatures do very little modeling things, they mostly just do stuff.


World model means inner representation of the external world. Any organism with a functioning brain has a world model. That's what brains do.

If you don't have a world model then you are a vegetable and could not be replying on HN.


If you close your eyes, how long can you navigate in the environment without hitting something? Not long, because you didn't model it.

If you're taking out the recycling, do you take the time to identify (model) each piece of it first? No, because that's not necessary.


Wait, you actually think we are talking about modelling as a conscious deliberate process in active working memory? Well there's your fundamental mistake. That is not what we are discussing, not even remotely.

The vast model in your brain is learned and generated unconsciously without your direct awareness.


No, I didn't say anything about doing it consciously. Motion is largely unconscious, like how you can throw things at a target without thinking about it.

But if you're just using it to mean "factual memory", calling it modeling seems like false precision.


Oh well in that case the answer is straightforward.

If you close your eyes and get lost after a few seconds, that's because that aspect of your model was not a 100% perfect exact replica of external reality that extended infinitely far in all spatial directions at all resolutions. For example, your internal spatial model is limited to some degree of accuracy and does not include the entire surface of Mars, but that doesn't mean that your model does not exist at all. Models are not perfect by definition. I thought this would be obvious.

Why would you think any model has to be a perfect exact 1:1 representation of the entire universe?

The model of reality in your head is a simplification that serves a purpose. Arbitrarily closing your fully functioning eyes is not something your model generating hardware was evolutionarily optimized for. Natural selection weeds out that kind of behaviour.

If you become blind then your model will change and optimize for other sensory inputs. Think of a blind man with a cane.


> For example, your internal spatial model is limited to some degree of accuracy and does not include the entire surface of Mars, but that doesn't mean that your model does not exist at all.

You're using "your model" as a metaphorical term here, but if you came up with any precise definition of the term here, it'd turn out to be wrong; people have tried this since the 50s and never gotten it correct. (For instance, is it actually a singular "a model" or is it different disconnected things you're using a single name for?)

See Phil Agre (1997) on exactly this idea: https://pages.gseis.ucla.edu/faculty/agre/critical.html

David Chapman (more general and current): https://metarationality.com/rationalism

and this guy was saying it in the 70s: https://en.wikipedia.org/wiki/Hubert_Dreyfus#Dreyfus'_critic...

> limited to some degree of accuracy

This isn't the only issue:

- You may not have observed something in the room in the right way for the action you need to do later.

- You might have observed it in a way you don't need later, which is a waste of time and energy.

- It might change while you're not looking.

- You might just forget it. (Since people do this, this must be an adaptive behavior - "natural selection" - but it's not a good thing in a model.)

> Why would you think any model has to be a perfect exact 1:1 representation of the entire universe?

What principle can you use to decide how precise it should be? (You can't do this; there isn't one.)

> The model of reality in your head is a simplification that serves a purpose.

Not only does it serve a purpose, your observations largely don't exist until you have a purpose for them.

RL agents tend to get stuck investigating irrelevant things when they try to maintain models; humans are built to actively avoid this with attention and boredom. Robot cameras take in their entire visual field and try to interpret it; humans both consciously and unconsciously actively investigate the environment as needed alongside deciding what to do. (Your vision is mostly fake; your eyes are rapidly moving around to update it only after you unconsciously pay attention to something.)

> Natural selection weeds out that kind of behaviour.

Not that well since something like half of Americans are myopic…


So basically you agree with what I was saying.

> What principle can you use to decide how precise it should be?

It is not up to me or anyone else to decide. Our subjective definitions and concepts of the model are irrelevant. How the brain works is a result of our genetic structure. We don't have a choice.


You can design a human if you want, that's what artificial intelligence is supposedly all about.

Anyway, read the paper I linked.


All of this was in response to your comment earlier:

"There is no such thing as a world model, and you don't have one of them."

There is such a thing as a world model in humans, and we all have them otherwise we could not think about or conceptualize or navigate the world. Then you have discussed how to define or construct a useful model or the limitations of a model but that is not relevant to the original point and I'm already aware of that.


I do agree, but more importantly love this part of the argument! It's when all the personality differences become too much to bear and suddenly people are accused of not even knowing themselves. Been there before, what a wild ride!


> suddenly people are accused of not even knowing themselves

It's not some desperate retort. People don't know themselves very well. Look at the research into confabulation, it seems to be standard operating procedure for human brains.


Kant would like a word with you about your point on whether people themselves understand the world and not just the format of their perceptions... :)

I think if you're going to be strict about this, you have to defend against the point of view that the same 'ding an sich' problem applies to both LLMs and people. And also whether if you had a limit sequence of KL divergences, one from a person's POV of the world, and one from an LLM's POV of texts, what it is about how a person approaches better grasp of reality - and likewise their KL divergence approaches 0, in some sense implying that their world model is becoming the same as the distribution of the world - that can only apply to people.

It seems possible to me that there is probably a great deal of lurking anthropocentrism that humanity is going to start noticing more and more in ourselves in the coming years, probably in both the direction of AI and the direction of other animals as we start to understand both better


The world on our plane of existence absolutely is exhaustible, just on a much, much larger scale. Doesn't mean that the process is fundamentally different, and for the human perspective there might be diminishing returns.


What if we are just the result of an ML network with a model of the world?


We're not.


LLMs are very good at uncovering the mathematical relationships between words, many layers deep. Calling that understanding is a claim about what understanding is. But because we know how the LLMs we're talking about at the moment are trained, that claim seems to have more problems:

LLMs do not directly model the world; they train on and model what people write about the world. It is an AI model of a computed gestalt human model of the world, rather than a model of the world directly. If you ask it a question, it tells you what it models someone else (a gestalt of human writing) is most likely say. That in turn is strengthened if user interaction accepts it and corrected only if someone tells it something different.

If we were to define that as what "understanding" is, we would equivalently be saying that a human bullshit artist would have expert understanding if only they produced more believable bullshit. (They also just "try to sound like an expert".)

Likewise, I'm not convinced that we can measure its understanding just by identifying inaccuracies or measuring the difference between its answers and expert answers - There would be no difference between bluffing your way through the interview (relying on your interviewer's limitations in how they interrogate you) and acing the interview.

There seems to be a fundamental difference in levels of indirection. Where we "map the territory", LLMs "map the maps of the territory".

It can be an arbitrarily good approximation, and practically very useful, but it's a strong ontological step to say one thing "is" another just because it can be used like it.


"LLMs do not directly model the world; they train on and model what people write about the world"

This is true. But human brains don't directly model the world either, they form an internal model based on what comes in through their senses. Humans have the advantage of being more "multi-modal," but that doesn't mean that they get more information or better information.

Much of my "modeling of the world" comes from the fact that I've read a lot of text. But of course I haven't read even a tiny fraction of what GPT4 has.

That said, LLMs can already train on images, as GPT4-V does. And the image generators as well do this, it's just a matter of time before the two are fully integrated. Later we'll see a lot more training on video and sound, and it all being integrated into a single model.


We could anthropomorphize any textbook too and claim it has human-level understanding of the subject. We could then claim the second edition of the textbook understands the subject better than the first. Anyone who claims the LLM "understands" is doing exactly this. What makes the LLM more absurd, though, is that the LLM will actually tell you it doesn't understand anything, while a book remains silent; but people want to pretend we are living in the Matrix and the LLM is alive.

Most arguments then descend into confusing the human knowledge embedded in a textbook with the human agency to apply the embedded knowledge. Software that extracts the knowledge from all textbooks has nothing to do with the human agency to use that knowledge.

I love chatGPT4 and had signed up in the first few hours it was released, but I actually canceled my subscription yesterday. Partly because of the bullshit with the company these past few days, but also because it had just become a waste of time for me the past few months. I learned so much this year, but I hit a wall: to make any progress I need to read the textbooks on the subjects I am interested in, just like I had to this time last year before chatGPT.

We also shouldn't forget that children anthropomorphize toys and dolls quite naturally. It is entirely natural to anthropomorphize a LLM and especially when it is designed to pretend it is typing back a response like a human would. It is not bullshitting you though when it pretends to type back a response about how it doesn't actually understand what it is writing.


> One other thing to take into consideration is that to play the game of Go you can't just think of the next move. You have to think far forward in the game -- even though technically all it's doing is picking the next move, it is doing so using a model that has obviously looked forward more than just one move.

It doesn't necessarily have to look ahead. Since Go is a deterministic game there is always a best move (or moves that are better than others) and hence a function that goes from the state of the game to the best move. We just don't have a way to compute this function, but it exists. And that function doesn't need the concept of lookahead; that's just an intuitive way one could find some of its values. Likewise, ML algorithms don't necessarily need lookahead; they can just try to approximate that function with enough precision by exploiting patterns in it. And that's why we can still craft puzzles that some AIs can't solve but humans can, by exploiting edge cases in that function that the ML algorithm didn't notice but that are solvable with understanding of the game.

The thing is though, does this really matter if eventually we won't be able to notice the difference?


> It doesn't necessarily have to look ahead. Since Go is a deterministic game there is always a best move

Is there really a difference between the two? If a certain move shapes the opponent's remaining possible moves into a smaller subset, hasn't AlphaGo "looked ahead"? In other words, when humans strategize and predict what happens in the real world, aren't they doing the same thing?

I suppose you could argue that humans also include additional world models in their planning, but it's not clear to me that these models are missing and impossible for machine learning models to generate during training.


> If a certain move shapes the opponent's remaining possible moves into a smaller subset, hasn't AlphaGo "looked ahead"?

You're confusing the reason why a move is good with how you can find that move. Yeah, a move is good because of how it shapes the opponent's remaining moves, and this is also the reasoning we make in order to find that move, but it doesn't mean you can only find that move by doing that reasoning. You could have found that move just by randomly picking one; it's not very probable, but it's possible. AIs just try to maximize the probability of picking a good move, while we try to find a reason a move is good. IMO it doesn't make sense to try to fit the way AIs do this into our mental model, since the intermediate goal is fundamentally different.


> Since Go is a deterministic game there is always a best move

The rules of the game are deterministic, but you may be going a step too far with that claim.

Is the game deterministic when your opponent is non-deterministic?

Is there an optimal move for any board state given that various opponents have varying strategies? What may be the best move against one opponent may not be the best move against another opponent.


Maybe "deterministic" is not the correct term here. What I meant is that there's no probability or unknown in the game, so you can always know what are the possible moves and the relative new state.

The opponent's moves may be considered non-deterministic, but you can just assume the worst case for you, that is the best case for the opponent, which is the opponent will always play the best move too.


At every point in time there are a range of moves with different levels of optimality. That range changes at the next point in time following the opponent's move.


The opponents strategy is an unknown variable not determined by the current board state.

Therefore the best move cannot be determined by the current board state, as it cannot be determined in isolation from the opponents strategy.


The optimal strategy can be determined from the current state. This is the principle behind minimax.

In a perfect information zero sum game, we can theoretically draw a complete game tree, each terminal node ending with a win, loss, or draw. With a full understanding of the game tree, we can choose moves that minimize the value of our opponent's best reply.
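
To make that concrete, here is a minimal minimax sketch over an abstract game tree; the children, is_terminal, and score callables are hypothetical placeholders, not anything a real Go engine would use (the full Go tree is far too large to search exhaustively):

    def minimax(state, maximizing, children, is_terminal, score):
        # Value of a state = best achievable outcome, assuming the
        # opponent also plays optimally at every reply.
        if is_terminal(state):
            return score(state)  # e.g. +1 win, -1 loss, 0 draw
        child_values = [minimax(c, not maximizing, children, is_terminal, score)
                        for c in children(state)]
        return max(child_values) if maximizing else min(child_values)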


I stand corrected. Thanks for that explanation.


> to play the game of Go you can't just think of the next move. You have to think far forward in the game -- even though technically all it's doing is picking the next move, it is doing so using a model that has obviously looked forward more than just one move.

While I imagine alpha go does some brute force and some tree exploration, I think the main "intelligent" component of alpha go is the ability to recognize a "good" game state from a "bad" game state based on that moment in time, not any future plans or possibilities. That pattern recognition is all it has once its planning algorithm has reached the leaves of the trees. Correct me if I'm wrong, but I doubt alpha go has a neural net evaluating an entire tree of moves all at once to discover meta strategies like "the opponent focusing on this area" or "the opponent feeling on the back foot."

You can therefore imagine a pattern recognition algorithm so good that it is able to pick a move by only looking 1 move into the future, based solely on local stone densities and structures. Just play wherever improves the board state the most. It does not even need to "understand" that a game is being played.
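
A rough sketch of that idea: one-ply move selection driven purely by a learned position evaluator. Here legal_moves, apply_move, and value_net are hypothetical stand-ins, not AlphaGo's actual interfaces:

    def greedy_move(board, legal_moves, apply_move, value_net):
        # No search: score the position that each legal move produces and
        # play whichever one the evaluator likes best right now.
        return max(legal_moves(board),
                   key=lambda move: value_net(apply_move(board, move)))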

> while on the surface, they may be just predicting the next word one after another, to do so effectively they have to be planning ahead.

So I don't think this statement is necessarily true. "Understanding" is a major achievement, but I don't think it requires planning. A computer can understand that 2+2=4 or where to play in tic-tac-toe without any "planning".

That said, there's probably not much special about the concept of planning either. If it's just simulating a tree of future possibilities and pruning it based on evaluation, then many algorithms have already achieved that.


The "meta" here is just the probability distribution of stone densities. The only way it can process those is by monte Carlo simulation. The DNN (trained by reinforcement learning) evaluates the simulations and outputs the top move(s).


> As Hinton says, there is no real limit to how sophisticated they can get.

There’s no limit to how sophisticated a model can get, but,

1. That’s a property shared with many architectures, and not really that interesting,

2. There are limits to the specific ways that we train models,

3. We care about the relative improvement that these models deliver, for a given investment of time and money.

From a mathematical perspective, you can just kind of keep multiplying the size of your model, and you can prove that it can represent arbitrary complicated structures (like, internal mental models of the world). That doesn’t mean that your training methods will produce those complicated structures.

With Go, I can see how the model itself can be used to generate new, useful training data. How such a technique could be applied to LLMs is less clear, and its benefits are more dubious.


A big difference between a game like Go and writing text is that text is single player. I can write out the entire text, look at it and see where I made mistakes on the whole and edit those. I can't go back in a game of Go and change one of my moves that turned out to be a mistake.

So trying to make an AI that solves the entire problem before writing the first letter will likely not result in a good solution while also making it compute way too much since it solves the entire problem for every token generated. That is the kind of AI we know how to train so for now that is what we have to live with, but it isn't the kind of AI that would be efficient or smart.


This doesn't seem like a major difference, since LLMs are also choosing the most likely token from a probability distribution, which is why they respond a token at a time. They can't "write out" the entire text at once, which is why fascinating methods like "think step by step" work at all.
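
A minimal sketch of that token-at-a-time loop, assuming a hypothetical model(tokens) that returns a probability distribution over the vocabulary for the next token:

    import random

    def generate(model, prompt_tokens, max_new_tokens):
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            probs = model(tokens)  # distribution over the vocabulary
            # Commit to one token at a time; the model never revises
            # earlier tokens once they are emitted.
            next_token = random.choices(range(len(probs)), weights=probs)[0]
            tokens.append(next_token)
        return tokens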


But it can't improve its answer after it has written it, that is a major limitation. When a human writes an article or response or solution, it is likely not the first thing the human thought of; instead they write something down and work on it until it is tight and neat and communicates just what they want to communicate.

Such answers will be very hard for an LLM to find; instead you mostly get very verbose messages, since that is how our current LLMs think.


Completely agree. The System 1/System 2 distinction seems relevant here. As powerful as transformers are with just next-token generation and context, which can be hacked to form a sort of short-term memory, some kind of real-time learning + long-term memory storage seems like an important research direction.


> But it can't improve its answer after it has written it, that is a major limitation.

It can be instructed to study its previous answer and find ways to improve it, or to make it more concise, etc, and that is working today. That can easily be automated by LLMs talking to each other.


That is true and isn't. GPT-4 has shown itself to, halfway through an answer, say "wait, that's not correct, I'm sorry, let me fix that" and then correct itself. For example, it stated a number was prime and why, and when showing the steps found it was divisible by 3 and said "oh, I made a mistake, it actually isn't prime."


> There may be some theoretical limit of a "perfect" Go player, or maybe not, but it will continue to converge towards perfection by continuing to train

I don’t think that’s a given. AlphaZero may have found an extremely high local optimum that isn’t the global optimum.

When playing only against itself, it won't be able to get out of that local optimum, and when getting closer and closer to it, it may even 'forget' how to play against players that make moves that AlphaGo never would make, and that may be sufficient for a human to beat it (something like that happened with computer chess in the early years, where players would figure out which board positions computers were bad at, and try to get such positions on the board).

I think you have to keep letting it play against other good players (human or computer) that play differently to have it keep improving, and even then, there’s no guarantee it will find a global optimum.


AlphaZero runs Monte Carlo tree search, so it has a next-move "planning" simulator. This computes the probability that specific moves, up to some depth ahead, lead to a win.
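
Stripped of the learned policy/value networks, the spirit of that simulator is something like flat Monte Carlo evaluation, a much simplified cousin of the full tree search; legal_moves, apply_move, and random_playout are hypothetical helpers:

    def estimate_win_rates(state, legal_moves, apply_move, random_playout,
                           n_simulations=1000):
        # "Plan" by simulation: for each candidate move, play many random
        # games to completion and record how often they end in a win.
        win_rates = {}
        for move in legal_moves(state):
            wins = sum(random_playout(apply_move(state, move))  # 1 win, 0 loss
                       for _ in range(n_simulations))
            win_rates[move] = wins / n_simulations
        return win_rates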

LLMs do not have a "planning" module or simulator. There is no way the LLM can plan.

Could you build a planning system into an LLM? Possibly and probably, but that is still open research. LeCun is trying to figure out how to train them effectively. But even an LLM with a planning system does not make it AGI.

Some will argue that iteratively feeding the output embedding back into the input will retain the context, but even in those cases it rapidly diverges, or as we say, "hallucinates"... this still happens even with large input context windows. So there is still no planning here and no world model or understanding.


The issue with taking the Alpha Zero analogy to extremes is that those are extremely constrained conditions, so they can't be generalized to something infinitely more complicated like speech.

And

> When training, it is never going to be 100% accurate in predicting text it hasn't trained on, but it can continue to get closer and closer to 100% the more it trains.

For example, it could reach 25% accuracy and have a mathematical limit of 26%, so "forever getting closer to 100% with time" would still result in a waste of even infinite resources.


> there isn't really a limit to how good it can get.

> it will continue to converge towards perfection

Then someone discovered a flaw that made it repeatably beatable by relative amateurs in a way that no human player would be

https://www.vice.com/en/article/v7v5xb/a-human-amateur-beat-...


It's not planning ahead, it is looking at the probabilities of the tokens altogether rather than one by one.


> You have to think far forward in the game -

I disagree. You can think in terms of a system that doesn't involve predictions at all, but has the same or similar enough outcome.

So an action network just learns patterns. Just like a chess player can learn what positions look good without thinking ahead.


Next word generation is one way to put it. The key point here is we have no idea what’s happening in the black box that is the neural network. It could be forming very strong connections between concepts in there with multi tiered abstractions.


It is certainly not abstracting things.


If LLMs are just glorified autocompletion, then humans are too!


> I would argue that you have a very oddly restricted definition of the word "understand," and one that isn't particularly useful.

Is it just me or does this read like “here is my assumption about what you said, and now here is my passive aggressive judgement about that assumption”? If you’re not certain about what they mean by the word “understand”, I bet you could ask and they might explain it. Just a suggestion.


I've asked that question in the past and I've never gotten an answer. Some people sidestep the question by describing something or other that they're confident isn't understanding; others just decline to engage entirely, asserting that the idea is too ridiculous to take seriously. In my experience, people with a clear idea of what they mean by the word "understand" are comfortable saying that ML models understand things.


This is absolute nonsense. The game of Go is a grid and two colors of pieces. "The world" here is literally everything.


Well fully sentient doesn't mean it is superintelligent.


GP said "AGI", which means AI that's at least capable of most human cognitive tasks.

If you've got a computer that is as competent as a human, it can easily beat the human because it has a huge speed advantage. In this imaginary scenario, if the model only escaped to your MacBook Pro and was severely limited by compute power, it would still have a chance.

If I were locked inside your MacBook Pro, I can think of a couple of devious tricks I could try. And I'm just a dumb regular human - way above median in my fields of expertise, and at or way below median in most other fields. An "AGI" would therefore be smarter and more capable.


And vice versa


I can't agree with the dismissiveness of this comment, and frankly I find its tone out of line and not in the spirit of Hacker News.

There are insights that can come from studying the brain that do indeed apply. Some researchers may not glean anything from such studies, and some may. I have no doubt that as neural networks get more and more powerful, we will continue to find more ways they are similar to the brain, and apply things we've learned about the brain to them.

I certainly prefer to see people making comparisons of neural networks to the brain than the old "it's just a glorified autocomplete" and the like.

Relax.


No one disagrees that we might be able to discern insights if we understood how our brain is wired. The problem is that the current state of neuroscience is so flawed in its approach that it's not looking like it will be of any use. They don't even understand how a 900-neuron worm's nervous system works, but are more than happy to tap half a billion dollars from unsuspecting politicians saying they'll map the human connectome. Go read the BRAIN Initiative proposal [1] to see how out of touch with reality the scientists in this field are. I agree with OP that sharp criticism of the entire field is fully warranted.

1. https://braininitiative.nih.gov/sites/default/files/document...


What are you talking about, is this Konrad Kording's shitposting alt??? This reeks of naivety.

I certainly have many critiques of methods used in neuroscience rn (as a working neuroscientist) but to reduce those to the conclusion that the entire project of neuroscience is hopeless is absurd. We understand certain things quite well actually, and it's not at all obvious what "understanding" at a larger scale would look like. It is very possible that the brain is irreducibly complex, and that the model you would need to construct to describe it would itself be so complex as to be useless in providing insight. Considering that the brain is by far the most complex object in the universe I think we're doing pretty well.

Furthermore, there are quite a lot of disagreements about the utility of connectomics. Outside of the extremists (Sebastian Seung and his ilk) no one thinks that connectomics is going to be the key that brings earth-shattering insight. It's just another tool. There is a complete connectome for part of the drosophila brain already (privately funded btw), which is in daily use in many fly labs. It tells you which other neurons a given neuron is connected to. Incredibly useful. Not earth-shattering.

also you might want to measure the neuroscience funding you deem wasteful up against the tens of billions NASA is spending to send humans (and not robots) back to the moon for "the spirit of adventure". cold war's over. robots will do just fine for the moon.


Can you please elaborate what great strides the field of neuroscience has made in the past 30 years?

From where I stand I can't see anyone giving a clear explanation of anything our brain does or does not do in a disease. The only novel treatment that has come out seems to have been to stick a rod into the brain and zap it, and it just magically cures a lot of diseases we still don't understand even a bit.

This is not even starting to discuss what little we have learned about how the brain's algorithms work. I'm still waiting to understand why pyramidal neurons were somehow groundbreaking. We found some neuron that fires when you walk to a place; why wouldn't we find one?

And what are you saying about the fly connectome again? Do we have exact names for every neuron in the fly brain and its verified connectome for every neuron?

Last I checked, the worm connectome has been available in intricate detail for decades, and scientists still haven't properly decoded the algorithms in that system. In fact I know every lab trying to figure that out now; I wrote proposals on the topic myself. Everyone else has apparently decided it's not sexy enough to work with worms, so they have just leaped to more complex systems with no basic understanding. I'm not the only one saying this. Sydney Brenner said as much in an editorial. But the field was too busy doing I-don't-know-what to listen.

Brenner, S. & Sejnowski, T. J. Understanding the human brain. Science 334, 567 (2011).

I remember sauntering into the occasional neuroscience talk during my UT Southwestern PhD and occasionally hearing some professor brag about how the majority of one of their PhD students' jobs was segmenting a single neuron across thousands of EM images, or something. Surely that's a sign this field needs revision?


> And what are you saying about the fly connectome again? Do we have exact names for every neuron in the fly brain and its verified connectome for every neuron?

the onus isn't on me to justify the existence of an entire field to you. the claim that neuroscience has not made great strides in the last 30 years is an extraordinary one, and that's all on you. but it especially doesn't help your case that if you had googled "fly connectome" you would have seen that the first result is a complete connectome of a larva and the third result is the tour de force from Janelia that produced an adult connectome. With names and verified connections. there is even a wikipedia article for the drosophila connectome!

> I remember sauntering to the occasional neuroscience talk during my ut southwestern PhD and occasionally hearing some professor brag about how the majority of one of their PhD’s jobs was to segmenting a single neuron in the thousand EM images or something. Surely that’s a sign this field needs revision?

and if you had gone on to actually read the hemibrain connectome paper you would have gained some appreciation for the gargantuan achievement that it was. it took hundreds of person-years to generate ground truth by segmenting neurons by hand, to develop the ML techniques required to automatically segment the rest (an extremely difficult problem), and to then validate the automatic segmentations. not to mention the insane effort it took to acquire a half-petabyte EM image of a single fly at sub-synaptic resolution in the first place.

I gotta hand it to you though, the position of naivety you've delivered your middlebrow dismissal from is truly impressive in magnitude.


Agreed. Reading the GP's comment, it feels like it's from bizarro world. It's the computer scientists who have been claiming that neural networks resemble the human brain - they even fucking named them neural networks, for christ's sake! That could be excused as naive hubris in the 1980s; it's utter delusion now.

A surface review of the neuroplasticity literature alone should free anyone of the illusion that "neural networks" have even a passing resemblance to biological neurons, something covered in Neuroscience 101 and widely internalized by its practitioners. The BS grant writing and PR that scientists have to participate in is hardly reflective of state-of-the-art science itself.

The irony is that machine learning methods are a perfect fit for neuroscience, and biology in general, which generates reams of data so multidimensional that manual analysis is intractable. What we're seeing now is the crest of the academic hype cycle, which - if the history of bioinformatics is anything to go by - means it will take years if not decades for the field to understand and fully utilize ML.


Actually, it was neuroscientists who developed the models nowadays used for machine learning. The McCulloch-Pitts neuron model, introduced in 1943, led to Frank Rosenblatt's perceptron, introduced in 1958. Machine learning algorithms mostly still use those models, but computational neuroscience has progressed towards much more complicated neuronal models.
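
For anyone curious, those early units are simple enough to write down in a few lines. A minimal sketch (the threshold unit and learning rule follow the textbook McCulloch-Pitts/Rosenblatt formulations; the AND example is just an illustration, not from any particular paper):

    import numpy as np

    # McCulloch-Pitts style unit: weighted sum passed through a hard threshold.
    def threshold_unit(x, w, b):
        return 1 if np.dot(w, x) + b > 0 else 0

    # Rosenblatt's perceptron learning rule: nudge weights toward misclassified examples.
    def train_perceptron(X, y, epochs=20, lr=0.1):
        w = np.zeros(X.shape[1])
        b = 0.0
        for _ in range(epochs):
            for xi, target in zip(X, y):
                error = target - threshold_unit(xi, w, b)
                w += lr * error * xi
                b += lr * error
        return w, b

    # Toy example: learn logical AND.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
    y = np.array([0, 0, 0, 1])
    w, b = train_perceptron(X, y)
    print([threshold_unit(xi, w, b) for xi in X])  # expected: [0, 0, 0, 1]

Modern neuronal models (Hodgkin-Huxley, multi-compartment models, etc.) are far richer than this, which is exactly the point being made above.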


It's typical of the arrogant, borderline anti-scientific attitude of a non-negligible fraction of the HN hive mind, i.e. if it came out of academia it must be a waste of time.


As another working neuroscientist, thank you. And cheers.


No I think these comments are quite necessary. People need to stop making these comparisons because they have absolutely no grounding in how brains actually work. There are bad ideas that should be dismissed.


Neural networks are absolutely based on a very simplified model of how brains work. Specific NN architectures are in turn based on specific parts of the brain (e.g. Convolutional Neural Networks are based on the visual cortices of cats/frogs).


nah, they're arbitrary function approximators that caught a lucky break. CNNs rose to prominence because natural scene statistics are translation invariant and convolutions can be computed efficiently on GPUs. and now that we have whole warehouses of GPUs, the current mood in DL is to stop building the symmetries of your dataset into the model (which is insane btw) and just use brute force.

the tenuous connection DL once had to neuroscience (perceptrons) is a distant memory
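
to make the translation-invariance point concrete, here's a toy NumPy sketch (the signal and filter are arbitrary): shift the input and the convolution's output shifts by the same amount, which is why one set of filter weights can be reused at every position.

    import numpy as np

    signal = np.array([0, 0, 1, 2, 1, 0, 0, 0, 0, 0], dtype=float)
    kernel = np.array([1, -1], dtype=float)  # a simple difference (edge-like) filter

    shifted = np.roll(signal, 3)  # translate the input by 3 samples

    out_a = np.convolve(signal, kernel, mode="same")
    out_b = np.convolve(shifted, kernel, mode="same")

    # convolution commutes with translation (up to edge effects):
    print(np.allclose(np.roll(out_a, 3), out_b))  # True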


A fabricated retelling of the past, given that we didn't start using GPUs for this type of compute until the turn of the millennium.


If you want to talk about history, these things were invented using a 1950s understanding of neuroscience, then promptly discarded until the ML people figured out how to make them useful.


AlexNet was the turning point for DL.


Why do you say that? Deep Learning was accelerating well before that (I would argue it has been accelerating for its entire existence).

AlexNet was a state-of-the-art image recognition net for a (relatively) brief amount of time. It wasn't the first CNN to use GPU acceleration, and it was quickly eclipsed in terms of ImageNet performance.

Regardless, I think bringing up AlexNet kinda invalidates your initial point. Although yes, it turns out that the two were a great match, CNNs and modern GPUs were clearly developed independently of each other, as evidenced by the many, many iterations of both before they were combined.


is this schmidhuber's alt? sure they existed before, but AlexNet was where it really took off. just look at the number of citations. right paper, right time. CNNs were uniquely suited to the hardware at the time - because of their efficiency due to symmetry and their suitability to GPGPU computing, not because of their history.


You're saying the study has no grounding in how brains work? I'd think a more reasonable conclusion would be that the neuroscientists involved have no grounding in how artificial neural networks work.

It seems the whole point is to bring in additional details of how brains work that they think may be relevant to artificial NNs.


Artificial neural networks are the closest working model of a brain we have today.

Lots of graph nodes, with weighted connections, performing distributed computation (mainly hierarchical pattern matching), learning from data by gradually updating weights, using selective attention (and/or recurrence, and/or convolutional filters).

Which of the above is not happening in our brains? Which of the above is not biologically inspired?

In fact this description equally applies to both a brain and GPT4.
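
To make the "weighted connections, updated gradually from data" part concrete, here is a deliberately tiny sketch of that loop (plain NumPy, one hidden layer; the target function and sizes are arbitrary illustrations, nothing brain-specific about them):

    import numpy as np

    rng = np.random.default_rng(0)

    # A tiny "graph" of weighted connections: 1 input -> 16 hidden units -> 1 output.
    W1 = rng.normal(0, 1, (1, 16)); b1 = np.zeros(16)
    W2 = rng.normal(0, 1, (16, 1)); b2 = np.zeros(1)

    X = rng.uniform(-3, 3, (256, 1))
    Y = np.sin(X)  # the function we want the network to learn

    lr = 0.05
    for step in range(2000):
        # Forward pass: weighted sums plus a nonlinearity.
        H = np.tanh(X @ W1 + b1)
        pred = H @ W2 + b2
        err = pred - Y

        # Backward pass: gradients of 0.5 * mean squared error w.r.t. each weight.
        dW2 = H.T @ err / len(X); db2 = err.mean(0)
        dH = err @ W2.T * (1 - H**2)
        dW1 = X.T @ dH / len(X); db1 = dH.mean(0)

        # "Learning" = gradually nudging the connection weights.
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    print(float(np.mean(err**2)))  # loss should have dropped well below its initial value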


Many organisms have just a handful of neurons yet exhibit complex behavior that would be impossible given the weighted-connections model. Not to mention single-celled organisms that exhibit the ability to navigate.

The model can be the closest working model, but that doesn't mean it is complete. It's very likely that cells can store memories/information independently of weights.


We can’t do that not because our mathematical neurons are too simple. We can’t do that because we don’t know the algorithms those biological neurons are running.

Do you see the difference?


There is of course a difference between the two things you say. They're both the reason we can't recreate the brain in software though.


There are two separate goals: to simulate the brain in software, and to understand brain algorithms. They overlap, but they are still distinct, and appeal to different groups of people. Neuroscientists want to understand detailed brain operations. They are primarily interested in the brain itself. AI researchers want to understand intelligence, they are primarily interested in higher brain functions (e.g. reasoning, attention, short/long memory, emotions, motivations, goal setting, etc).

We can't (fully) recreate the brain in software partly because we don't know enough, and partly because it's too computationally complex. For example, we can't simulate an entire modern CPU at the transistor level - even though we know how each transistor works and what each transistor does in the CPU - because each transistor requires a detailed physical model with hundreds of parameters. It's simply not computationally feasible using current supercomputers. The brain is even less feasible to simulate if we want to accurately model each individual neuron in it, even if we knew exactly how each one works.
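
For a rough sense of scale, here's a back-of-envelope sketch. The neuron and synapse counts are commonly cited ballpark figures, and the per-event cost is a made-up placeholder, so treat the result as an order-of-magnitude illustration only:

    # Very rough order-of-magnitude sketch; every constant here is an assumption.
    neurons = 8.6e10               # ~86 billion neurons (commonly cited estimate)
    synapses_per_neuron = 1e4      # ~10,000 synapses per neuron (ballpark)
    firing_rate_hz = 1             # assume ~1 spike/s on average (varies wildly)
    flops_per_synapse_event = 100  # placeholder cost for a detailed synapse model

    flops_needed = neurons * synapses_per_neuron * firing_rate_hz * flops_per_synapse_event
    print(f"~{flops_needed / 1e18:.2f} exaFLOP/s just for synaptic events in real time")
    # Even under these crude assumptions this lands within an order of magnitude of
    # the largest current supercomputers, and a detailed biophysical model of each
    # neuron would add several more orders of magnitude (plus glia, neuromodulation, etc.).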

But the second goal is much more feasible, and we have made great progress simply by scaling up simple known algorithms which approximate some information-processing functions of the brain (mainly pattern matching/prediction and attention). I can talk to GPT-4 today just like I talk to other humans, and by the way, this is only possible because out of all the AI/ML algorithms people have tried over the last 70 years, the most brain-like ones have won (ANNs). If we want to make further progress in AI, or if we want to make GPT-5 more human-like (not sure we do), we don't necessarily need to simulate the brain at a neuronal level; we simply need to understand a little bit more about higher-level brain functions. Today, we (ML researchers) might actually benefit more from studying psychology than neuroscience.


> Many organisms have just a handful of neurons yet exhibit complex behavior that would be impossible given the weighted connections model.

That's rather a bold claim given that artificial neural networks are universal function approximators.


Impossible given that number of neurons.

It's perhaps not terribly surprising that it becomes possible with unlimited width or depth (or an arbitrarily complex activation function).

https://en.wikipedia.org/wiki/Universal_approximation_theore...
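
For what it's worth, the width effect is easy to see empirically. A small sketch using scikit-learn's MLPRegressor (the target function and the widths are arbitrary choices, just for illustration):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Target: an arbitrary continuous function on [-3, 3].
    X = np.linspace(-3, 3, 400).reshape(-1, 1)
    y = np.sin(2 * X).ravel() + 0.3 * X.ravel() ** 2

    for width in (2, 8, 64, 256):
        net = MLPRegressor(hidden_layer_sizes=(width,), activation="tanh",
                           solver="lbfgs", max_iter=5000, random_state=0)
        net.fit(X, y)
        mse = np.mean((net.predict(X) - y) ** 2)
        print(f"hidden units: {width:4d}  training MSE: {mse:.5f}")
    # The error should shrink as the single hidden layer gets wider,
    # which is the practical face of the universal approximation theorem.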


It's incredible to me how widely this is misunderstood.

The universal function approximator theorem only applies for continuous functions. Non-continuous functions can only be approximated to the extent that they are of the same "class" as the activation function.

Additionally, the theorem only proves that for any given continuous function, there exists a particular NN with particular weight that can approximate that function to a given precision. Training is not necessarily possible, and the same NN isn't guaranteed to approximate any other function to some desired precision.

It seems pretty obvious to me that most interesting behaviors in the real world can't be modelled by a mathematical function at all (that is, one with a single output for each input), let alone if we further restrict ourselves to continuous functions, or step functions, or whatever restriction we get from our chosen activation function.


> The universal function approximator theorem only applies for continuous functions. Non-continuous functions can only be approximated to the extent that they are of the same "class" as the activation function.

Yes, and?

> Training is not necessarily possible

That would be surprising, do you have any examples?

> and the same NN isn't guaranteed to approximate any other function to some desired precision.

Well duh. Me speaking English doesn't mean I can tell 你好[0] from 泥壕[1] when spoken.

> It seems pretty obvious to me that most interesting behaviours in the real world can't be modelled by a mathematical function at all (that is, for each input having a single output)

I think all of physics would disagree with you there, what with it being built up from functions where each input has a single output. Even Heisenberg uncertainty and the quantised results from the Stern-Gerlach setup can be modelled that way in silico to high correspondence with reality, despite Bell inequality tests meaning there can't be a (local) hidden variable.

[0] Nǐ hǎo, meaning "hello"

[1] Ní háo, which google says is "mud trench", but I wouldn't know


> Yes, and?

It means that there is no guarantee that, given a non-continuous function f(x), there exists an NN that approximates it over its entire domain within some precision p.

> That would be surprising, do you have any examples?

Do you know of a universal algorithm that can take a continuous function and a target precision, and return an NN architecture (number of layers, number of neurons per layer) and a starting set of weights for an NN, and a training set, such that training the NN will reach the final state?

All I'm claiming is that there is no known algorithm of this kind, and also that the existence of such an algorithm is not guaranteed by any known theorem.

> Well duh. Me speaking English doesn't mean I can tell 你好[0] from 泥壕[1] when spoken.

My point was relevant because we are discussing whether an NN might be equivalent to the human brain, and using the Universal Approximation Theorem to try to decide this. So what I'm saying is that even if "knowing English" were a continuous function and "knowing French" were a continuous function, so by the theorem we know there are NNs that can approximate either one, there is no guarantee that there exists a single NN which can approximate both. There might or might not be one, but the theorem doesn't promise one must exist.

> I think all of physics would disagree with you there, what with it being built up from functions where each input has a single output.

It is built up of them, but there doesn't exist a single function that represents all of physics. You have different functions for different parts of physics. I'm not saying it's not possible a single function could be defined, but I also don't think it's proven that all of physics could be represented by a single function.


> It means that there is no guarantee that, given a non-continuous function function f(x), there exists an NN that approximates it over its entire domain withing some precision p.

And why is this important?

> Do you know of a universal algorithm that can take a continuous function and a target precision, and return an NN architecture (number of layers, number of neurons per layer) and a starting set of weights for an NN, and a training set, such that training the NN will reach the final state?

> All I'm claiming is that there is no known algorithm of this kind, and also that the existence of such an algorithm is not guaranteed by any known theorem.

I think so: the construction proof of the claim that they are universal function approximators seems to meet those requirements.

Even better: it just goes direct to giving you the weights and biases.
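
Roughly, a direct construction of that kind looks like the sketch below: the weights and biases are written down from samples of the target, with no training step. This is only the staircase-of-sigmoids idea, not the exact construction from any particular proof, and the grid size and steepness are arbitrary choices:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

    def build_staircase_net(f, lo, hi, n_steps=200):
        """Write down one-hidden-layer weights directly (no training) so the
        network's output is a staircase approximation of f on [lo, hi]."""
        grid = np.linspace(lo, hi, n_steps + 1)
        steepness = 20.0 * n_steps / (hi - lo)   # sharp relative to the step width
        w1 = np.full(n_steps, steepness)         # each hidden unit switches on...
        b1 = -steepness * grid[:-1]              # ...at one grid point
        heights = f(grid)
        w2 = np.diff(heights)                    # output weight = jump of f over that step
        bias_out = heights[0]
        def net(x):
            h = sigmoid(np.outer(x, w1) + b1)    # hidden-layer activations
            return h @ w2 + bias_out             # weighted sum of "step" units
        return net

    target = lambda x: np.sin(3 * x) + 0.5 * x
    net = build_staircase_net(target, -2.0, 2.0)
    xs = np.linspace(-1.9, 1.9, 500)
    print(np.max(np.abs(net(xs) - target(xs))))  # small, and shrinks as n_steps grows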

> My point was relevant because we are discussing whether an NN might be equivalent to the human brain, and using the Universal Approximation Theorem to try to decide this. So what I'm saying is that even if "knowning English" were a continuous function and "knowing French" were a continuous function, so by the theorem we know there are NNs that can approximate either one, there is no guarantee that there exists a single NN which can approximate both. There might or might not be one, but the theorem doesn't promise one must exist.

I still don't understand your point. It still doesn't seem to matter?

If any organic brain can't do $thing, surely it makes no difference either way whether or not that $thing can or can't be done by whatever function is used by an ANN?

> It is built up of them, but there doesn't exist a single function that represents all of physics. You have different functions for different parts of physics. I'm not saying it's not possible a single function could be defined, but I also don't think it's proven that all of physics could be represented by a single function.

I could point you to this: https://www.youtube.com/watch?v=PHiyQID7SBs

But that would be unfair, given the QM/GR incompatibility.

That said, ultimately I think the onus is on you to demonstrate that it can't be done when all the (known) parts not only already exist separately in such a form, but also, AFAICT, we don't even have a way to describe any possible alternative that wouldn't be made of functions.


> And why is this important?

Since we know non-continuous functions are used in describing various physical phenomena, it opens the door to the possibility that there are physical phenomena that NNs might not be able to learn.

And while piece-wise continuous functions may still be ok, fully discontinuous functions are much harder.

> I think so: the construction proof of the claim that they are universal function approximators seems to meet those requirements.

Oops, you're right, I was too generous. If we know the function, we can easily create the NN, no learning step needed.

The actual challenge I had in mind was to construct an NN for a function which we do not know, but can only sample, such as the "understand English" function. Since we don't know the exact function, we can't use the method from the proof to even construct the network architecture (since we don't know ahead of time how many bumps there are, we don't know how many hidden neurons to add).

And note that this is an extremely important limitation. After all, if the universal approximation theorem were good enough on its own, we wouldn't need DL or different network architectures for different domains at all: a single hidden layer is all you need to approximate any continuous function, right?

> If any organic brain can't do $thing, surely it makes no difference either way whether or not that $thing can or can't be done by whatever function is used by an ANN?

Organic brains can obviously learn both English and French. Arguably GPT-4 can too, so maybe this is not the best example.

But the general doubt remains: we know humans express knowledge in a way that doesn't seem contingent upon that knowledge being a single continuous mathematical function. Since the universal function approximator theorem only proves that for each continuous function there exists an NN which approximates it, this theorem doesn't prove that NNs are equivalent to human brains, even in principle.

> That said, ultimately I think the onus is on you to demonstrate that it can't be done when all the (known) parts not only already exist separately in such a form, but also, AFAICT, we don't even have a way to describe any possible alternative that wouldn't be made of functions.

The way physical theories are normally defined is as a set of equations that model a particular process. QM has the Schrodinger equation or its more advanced forms. Classical mechanics has Newton's laws of motion. GR has the Einstein equations. Fluid dynamics has the Navier-Stokes equations. Each of these is defined in terms of mathematical functions: but they are different functions. And yet many humans know all of them.

As we established earlier, the universal approximation theorem proves that some NN can approximate one given function. For 5 functions you can use 5 NNs. But you can't necessarily always combine these into a single NN that can approximate all 5 functions at once. It's trivial if they are simply 5 easily distinguishable inputs which you can combine into a single 5-input function, but not as easy if they are harder to distinguish, or if you don't know that you should model them as different inputs ahead of time.

By the way, there is also an example of a pretty well-known mathematical object used in physics that is not actually a proper function - the so-called Dirac delta function. It's not hard to approximate this with an NN at all, but it does show that physics is not strictly speaking limited to functions.
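
As a side note, what an NN would actually approximate is one of the usual nascent-delta stand-ins, e.g. a narrow unit-area spike built from two steep sigmoid units (the width and steepness below are arbitrary):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-np.clip(z, -60, 60)))

    # Two steep sigmoid units produce a narrow spike of unit area:
    # sigma(k(x + w)) - sigma(k(x - w)) is ~1 on (-w, w) and ~0 elsewhere,
    # so dividing by 2w gives a "nascent delta" of height ~1/(2w).
    def spike(x, width=1e-3, steepness=1e5):
        return (sigmoid(steepness * (x + width)) - sigmoid(steepness * (x - width))) / (2 * width)

    xs = np.linspace(-1.0, 1.0, 200_001)
    dx = xs[1] - xs[0]
    print(spike(xs).sum() * dx)         # ~1: integrates like a delta
    print(spike(np.array([0.0, 0.1])))  # tall at 0, ~0 away from 0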

Edit to add: I agree with you that the GP is wrong to claim that the behavior exhibited by some organisms is impossible to explain if we assumed that the brain was equivalent to an (artificial) neural network.

I'm only trying to argue that the reverse is also not proven: that we don't have any proof that an ANN must be equivalent to a human/animal brain in computational power.

Overall, my position is that we just don't know to what extent brains and ANNs correspond to each other.


> Lots of graph nodes

Neurons are not connected by a simple graph; there are plenty of neurons which affect all the neurons physically close to them. There are also many components in the body which demonstrably affect brain activity but are not neurons (hormone glands being among the most obvious).

> with weighted connections

Probably, though we don't fully understand how synapses work

> performing distributed computation (mainly hierarchical pattern matching)

This is a description of purpose, not form, so it's irrelevant.

> learning from data by gradually updating weights

We have exactly zero idea how biological neural nets learn at the moment. What we do know for sure is that a single neuron, on its own, can adjust its behavior based on previous inputs, so the only thing that is really clear is that individual neurons learn as well; it's not just the synapses with their weights that modify behavior. Even more, non-neuron cells also learn, as is obvious from the complex behaviors of many single-celled organisms, but also of some non-neuron cells in multicellular organisms. So potentially, learning in a human is not completely limited to the brain's neural net, but could include certain other parts of the body (again, glands come to mind).

> using selective attention (and/or recurrence, and/or convolutional filters).

This is completely unknown.

So no, overall, there is almost no similarity between (artificial) neural nets and brains, at least none profound enough that they wouldn't share with a GPU.


What does this comment add to the discussion?


I dunno. My comment complained about the parent comment not adding positively to the discussion. And gave at least a bit of support for that complaint.

Would you have preferred I emulate your style, and complain while providing no support for my complaint?

Ok.


Being positive is not a requirement of commenting on HN, but you should comment with something that is substantive, so yes I do think you shouldn't have commented at all. Tone policing is cringe.


I don't like tone-policing in general. But when I opened this post, the negative comment we're talking about was the top comment. That makes me much more sympathetic to someone calling out the cynicism.


Exactly what are you doing here then?

But hey I guess I can do this too. How's this? Using cringe as an adjective is cringe.


> But hey I guess I can do this too.

It sucks, doesn't it?


Personally I think DALL-E does better quality, especially at photorealistic stuff.

Here are a few of mine, many photorealistic.

https://www.karmatics.com/stuff/dalle.html

