No, I think these comments are quite necessary. People need to stop making these comparisons because they have absolutely no grounding in how brains actually work. There are bad ideas that should be dismissed.
Neural networks are absolutely based on a very simplified model of how brains work. Specific NN architectures are in turn based on specific parts of the brain (e.g. Convolutional Neural Networks are based on the visual cortices of cats/frogs).
nah, they're arbitrary function approximators that caught a lucky break. CNNs rose to prominence because natural scene statistics are translation invariant and convolutions can be efficiently computed on GPUs. and now that we have whole warehouses of GPUs, the current mood in DL is to stop building the symmetries of your dataset into the model (which is insane btw) and use brute force.
the tenuous connection DL once had to neuroscience (perceptrons) is a distant memory
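to make the symmetry point concrete, here's a minimal sketch (my own toy example, numpy only): a convolution commutes with shifts, which is exactly why one small filter works at every position of a translation-invariant scene.

```python
# toy sketch: convolution commutes with shifts (translation equivariance),
# so a CNN reuses one small filter at every position of the input.
import numpy as np

def conv1d(x, w):
    # "valid" cross-correlation, the core op in a CNN layer
    return np.array([x[i:i + len(w)] @ w for i in range(len(x) - len(w) + 1)])

rng = np.random.default_rng(0)
x = rng.normal(size=32)   # a toy 1-D "image"
w = rng.normal(size=3)    # a toy learned filter

shift_then_conv = conv1d(np.roll(x, 5), w)
conv_then_shift = np.roll(conv1d(x, w), 5)

# identical away from the wrap-around edge introduced by np.roll
print(np.allclose(shift_then_conv[5:], conv_then_shift[5:]))  # True
```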
If you want to talk about history, these things were invented using a 1950s understanding of neuroscience, then promptly discarded until the ML people figured out how to make them useful.
Why do you say that? Deep Learning was accelerating well before that (I would argue it has been accelerating for its entire existence).
AlexNet was a state-of-the-art image recognition net for a (relatively) brief amount of time. It wasn't the first CNN to use GPU acceleration, and it was quickly eclipsed in terms of ImageNet performance.
Regardless, I think bringing up AlexNet kinda invalidates your initial point. Although yes, it turns out that the two were a great match, CNNs and modern GPUs were clearly developed independently of each other, as evidenced by the many, many iterations of both before they were combined.
is this schmidhuber's alt? sure, they existed before. AlexNet was where it really took off, though - just look at the number of citations. right paper, right time. CNNs were uniquely suited to the hardware at the time, because of their efficiency due to symmetry and their suitability to GPGPU computing. not because of their history.
You're saying the study has no grounding in how brains work? I'd think a more reasonable conclusion would be that the neuroscientists involved have no grounding in how artificial neural networks work.
It seems the whole point is to bring in additional details of how brains work, that they think may be relevant to artificial NNs.
Artificial neural networks are the closest working model of a brain we have today.
Lots of graph nodes, with weighted connections, performing distributed computation (mainly hierarchical pattern matching), learning from data by gradually updating weights, using selective attention (and/or recurrence, and/or convolutional filters).
Which of the above is not happening in our brains? Which of the above is not biologically inspired?
In fact, this description applies equally to both a brain and GPT-4.
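To make that description concrete, here's a minimal sketch (a toy example of my own; the sizes and the XOR task are arbitrary choices): nodes in a graph, weighted connections, and weights gradually updated from data.

```python
# A minimal sketch of the description above (all sizes/values are
# arbitrary toy choices): nodes in a graph, weighted connections,
# weights gradually updated from data by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR, a toy task

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)     # input -> hidden
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)     # hidden -> output
lr = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(5000):
    # forward: weighted sums flowing through the graph
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # backward: gradients of squared error w.r.t. every weight
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # the "learning": gradual weight updates
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(0)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(0)

print(out.ravel().round(2))   # should approach [0, 1, 1, 0]
```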
Many organisms have just a handful of neurons yet exhibit complex behavior that would be impossible given the weighted connections model. Not to mention single-celled organisms that exhibit ability to navigate.
The model can be the closest working model we have, but that doesn't mean it is complete. It's very likely that cells can store memories/information independent of weights.
We can’t do that not because our mathematical neurons are too simple. We can’t do that because we don’t know the algorithms those biological neurons are running.
There are two separate goals: to simulate the brain in software, and to understand brain algorithms. They overlap, but they are still distinct, and appeal to different groups of people. Neuroscientists want to understand detailed brain operations; they are primarily interested in the brain itself. AI researchers want to understand intelligence; they are primarily interested in higher brain functions (e.g. reasoning, attention, short- and long-term memory, emotions, motivations, goal setting, etc.).
We can't (fully) recreate the brain in software partly because we don't know enough, and partly because it's too computationally complex. For example, we can't simulate an entire modern CPU at the transistor level - even though we know how each transistor works, and what each transistor does in the CPU - because each transistor requires a detailed physical model with hundreds of parameters. It's simply not computationally feasible using current supercomputers. The brain is even less feasible to simulate, if we want to accurately simulate each individual neuron in it, even if we knew exactly how each one works.
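A quick back-of-the-envelope sketch of the scale involved (every number below is a rough assumption, purely for illustration):

```python
# Back-of-the-envelope only; every number below is a rough assumption.
transistors = 1e10    # ~10 billion transistors in a modern CPU
flops_per_step = 1e3  # assumed cost of stepping one detailed transistor model
steps = 5e9           # one simulated second at a 5 GHz clock

total = transistors * flops_per_step * steps   # ~5e22 FLOPs
exaflop_machine = 1e18                         # ~a top supercomputer
print(total / exaflop_machine / 3600, "hours per simulated second")  # ~14
```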
But the second goal is much more feasible, and we have made great progress simply by scaling up simple known algorithms which approximate some information processing functions in the brain (mainly pattern matching/prediction and attention). I can talk to GPT-4 today just like I talk to other humans, and by the way, this is only possible because out of all the AI/ML algorithms people have tried over the last 70 years, the most brain-like ones have won (ANNs). If we want to make further progress in AI, or if we want GPT-5 to be more human-like (not sure we do), we don't necessarily need to simulate the brain at a neuronal level; we simply need to understand a little bit more about higher-level brain functions. Today, we (ML researchers) might actually benefit more from studying psychology than neuroscience.
It's incredible to me how widely this is misunderstood.
The universal function approximator theorem only applies to continuous functions. Non-continuous functions can only be approximated to the extent that they are of the same "class" as the activation function.
Additionally, the theorem only proves that for any given continuous function, there exists a particular NN with particular weights that can approximate that function to a given precision. Training is not necessarily possible, and the same NN isn't guaranteed to approximate any other function to some desired precision.
It seems pretty obvious to me that most interesting behaviors in the real world can't be modelled by a mathematical function at all (that is, for each input having a single output), let alone if we further restrict ourselves to continuous functions, or step functions, or whatever class we get from our chosen activation function.
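To illustrate that continuity caveat numerically, here's a toy sketch (my own example): one sigmoid unit against a step function.

```python
# Toy illustration of the continuity caveat: a sigmoid unit matches a
# step function away from the jump, but the worst-case error near the
# jump stays ~0.5 no matter how steep the unit gets.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-1, 1, 20001)
step = (x >= 0).astype(float)            # discontinuous target

for k in (10, 100, 1000):                # ever-steeper "neurons"
    err = np.abs(sigmoid(k * x) - step)
    away = np.abs(x) > 0.1               # region away from the jump
    print(k, err.max(), err[away].max()) # max err stuck ~0.5; away err -> 0
```

The steeper the unit, the better the fit away from the jump, but the worst-case error at the jump never improves - which is why the classic theorem is stated for continuous targets.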
> The universal function approximator theorem only applies to continuous functions. Non-continuous functions can only be approximated to the extent that they are of the same "class" as the activation function.
Yes, and?
> Training is not necessarily possible
That would be surprising, do you have any examples?
> and the same NN isn't guaranteed to approximate any other function to some desired precision.
Well duh. Me speaking English doesn't mean I can tell 你好[0] from 泥壕[1] when spoken.
> It seems pretty obvious to me that most interesting behaviours in the real world can't be modelled by a mathematical function at all (that is, for each input having a single output)
I think all of physics would disagree with you there, what with it being built up from functions where each input has a single output. Even Heisenberg uncertainty and quantised results from the Stern-Gerlach setup can be modelled that way in silico with high correspondence to reality, despite the results of Bell inequality tests meaning there can't be a hidden variable.
[0] Nǐ hǎo, meaning "hello"
[1] Ní háo, which Google says is "mud trench", but I wouldn't know
It means that there is no guarantee that, given a non-continuous function f(x), there exists an NN that approximates it over its entire domain within some precision p.
> That would be surprising, do you have any examples?
Do you know of a universal algorithm that can take a continuous function and a target precision, and return an NN architecture (number of layers, number of neurons per layer), a starting set of weights, and a training set, such that training the NN will reach a state that approximates the function to the target precision?
All I'm claiming is that there is no known algorithm of this kind, and also that the existence of such an algorithm is not guaranteed by any known theorem.
> Well duh. Me speaking English doesn't mean I can tell 你好[0] from 泥壕[1] when spoken.
My point was relevant because we are discussing whether an NN might be equivalent to the human brain, and using the Universal Approximation Theorem to try to decide this. So what I'm saying is that even if "knowing English" were a continuous function and "knowing French" were a continuous function, so by the theorem we know there are NNs that can approximate either one, there is no guarantee that there exists a single NN which can approximate both. There might or might not be one, but the theorem doesn't promise one must exist.
> I think all of physics would disagree with you there, what with it being built up from functions where each input has a single output.
It is built up of them, but there doesn't exist a single function that represents all of physics. You have different functions for different parts of physics. I'm not saying it's not possible a single function could be defined, but I also don't think it's proven that all of physics could be represented by a single function.
> It means that there is no guarantee that, given a non-continuous function f(x), there exists an NN that approximates it over its entire domain within some precision p.
And why is this important?
> Do you know of a universal algorithm that can take a continuous function and a target precision, and return an NN architecture (number of layers, number of neurons per layer), a starting set of weights, and a training set, such that training the NN will reach a state that approximates the function to the target precision?
> All I'm claiming is that there is no known algorithm of this kind, and also that the existence of such an algorithm is not guaranteed by any known theorem.
I think so: the construction proof of the claim that they are universal function approximators seems to meet those requirements.
Even better: it just goes straight to giving you the weights and biases.
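For the curious, here's roughly what that construction looks like as code (a sketch under my own choices of target and sizes: the function is known, and the weights and biases are written down directly from it, with no training step):

```python
# Sketch of the constructive proof: approximate a *known* continuous f
# on [0, 1] with one hidden layer of steep sigmoids; the weights and
# biases come straight from f -- no training anywhere.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

f = lambda x: np.sin(2 * np.pi * x)   # the known continuous target
n = 500                               # hidden units (more -> better fit)
a = np.linspace(0, 1, n)              # breakpoints across the domain
k = 5e4                               # steepness of each sigmoid "step"
c = np.diff(f(a))                     # output weights: increments of f

def nn(x):
    # unit i switches on as x passes breakpoint a[i+1], contributing
    # the increment f(a[i+1]) - f(a[i]); the sum is a staircase under f
    h = sigmoid(k * (x[:, None] - a[None, 1:]))
    return f(a[0]) + h @ c

x = np.linspace(0, 1, 10000)
print(np.abs(nn(x) - f(x)).max())     # small, and shrinks as n grows
```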
> My point was relevant because we are discussing whether an NN might be equivalent to the human brain, and using the Universal Approximation Theorem to try to decide this. So what I'm saying is that even if "knowing English" were a continuous function and "knowing French" were a continuous function, so by the theorem we know there are NNs that can approximate either one, there is no guarantee that there exists a single NN which can approximate both. There might or might not be one, but the theorem doesn't promise one must exist.
I still don't understand your point. It still doesn't seem to matter?
If any organic brain can't do $thing, surely it makes no difference either way whether or not that $thing can or can't be done by whatever function is used by an ANN?
> It is built up of them, but there doesn't exist a single function that represents all of physics. You have different functions for different parts of physics. I'm not saying it's not possible a single function could be defined, but I also don't think it's proven that all of physics could be represented by a single function.
But that would be unfair, given the QM/GR incompatibility.
That said, ultimately I think the onus is on you to demonstrate that it can't be done when all the (known) parts not only already exist separately in such a form, but also, AFAICT, we don't even have a way to describe any possible alternative that wouldn't be made of functions.
Since we know non-continuous functions are used to describe various physical phenomena, this opens the door to the possibility that there are physical phenomena that NNs might not be able to learn.
And while piecewise-continuous functions may still be OK, fully discontinuous functions are much harder.
> I think so: the construction proof of the claim that they are universal function approximators seems to meet those requirements.
Oops, you're right, I was too generous. If we know the function, we can easily create the NN, no learning step needed.
The actual challenge I had in mind was to construct an NN for a function which we do not know, but can only sample, such as the "understand English" function. Since we don't know the exact function, we can't use the method from the proof to even construct the network architecture (since we don't know ahead of time how many bumps there are, we don't know how many hidden neurons to add).
And note that this is an extremely important limitation. After all, if the UAT were good enough on its own, we wouldn't need DL or different network architectures for different domains at all: a single hidden layer is all you need to approximate any continuous function, right?
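For contrast, a sketch of the sample-only setting (scikit-learn's MLPRegressor used purely for brevity; the hidden width of 50 is a blind guess, which is exactly the point):

```python
# Contrast: f is *unknown*, we only have samples, so the constructive
# recipe is unusable -- width 50 is a blind guess and we must train.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 1))        # samples of some function...
y = np.sin(2 * np.pi * X).ravel()           # ...we pretend not to know

net = MLPRegressor(hidden_layer_sizes=(50,), max_iter=20000, tol=1e-7)
net.fit(X, y)                               # training may or may not succeed

X_test = np.linspace(0, 1, 200).reshape(-1, 1)
print(np.abs(net.predict(X_test) - np.sin(2 * np.pi * X_test).ravel()).max())
```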
> If any organic brain can't do $thing, surely it makes no difference either way whether or not that $thing can or can't be done by whatever function is used by an ANN?
Organic brains can obviously learn both English and French. Arguably GPT-4 can too, so maybe this is not the best example.
But the general doubt remains: we know humans express knowledge in a way that doesn't seem contingent upon that knowledge being a single continuous mathematical function. Since the universal function approximator theorem only proves that for each continuous function there exists an NN which approximates it, this theorem doesn't prove that NNs are equivalent to human brains, even in principle.
> That said, ultimately I think the onus is on you to demonstrate that it can't be done when all the (known) parts not only already exist separately in such a form, but also, AFAICT, we don't even have a way to describe any possible alternative that wouldn't be made of functions.
The way physical theories are normally defined is as a set of equations that model a particular process. QM has the Schrödinger equation or its more advanced forms. Classical mechanics has Newton's laws of motion. GR has the Einstein equations. Fluid dynamics has the Navier-Stokes equations. Each of these is defined in terms of mathematical functions: but they are different functions. And yet many humans know all of them.
As we established earlier, the UAT proves that some NN can approximate any one given continuous function. For 5 functions you can use 5 NNs. But you can't necessarily combine these into a single NN that can approximate all 5 functions at once. It's trivial if the 5 cases are easily distinguishable, so you can combine them into a single 5-input function, but not as easy if they are harder to distinguish, or if you don't know ahead of time that you should model them as different inputs.
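A toy sketch of that "easily distinguishable" case (the three laws and the one-hot selector encoding are my own illustrative choices):

```python
# Toy sketch of the "easily distinguishable" case: a one-hot selector
# input t turns 3 separate laws into a single function of (t, p).
# The laws and the encoding are my own illustrative choices.
import numpy as np

def newton(p):  return p[0] * p[1]            # F = m * a
def hooke(p):   return -p[0] * p[1]           # F = -k * x
def kinetic(p): return 0.5 * p[0] * p[1] ** 2 # E = m * v^2 / 2

laws = [newton, hooke, kinetic]

def combined(t, p):
    # one function; continuous in p for each fixed selector t, so a
    # UAT-style NN could approximate it on a compact domain
    return sum(ti * law(p) for ti, law in zip(t, laws))

print(combined([0, 1, 0], np.array([3.0, 0.5])))  # Hooke: -1.5
```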
By the way, there is also an example of a pretty well known mathematical object used in physics that is not actually a proper function - the so-called Dirac delta function. It's not hard to approximate with an NN, but it does show that physics is not, strictly speaking, limited to functions.
Edit to add: I agree with you that the GP is wrong to claim that the behavior exhibited by some organisms would be impossible to explain if we assume the brain is equivalent to an (artificial) neural network.
I'm only trying to argue that the reverse is also not proven: that we don't have any proof that an ANN must be equivalent to a human/animal brain in computational power.
Overall, my position is that we just don't know to what extent brains and ANNs correspond to each other.
Neurons are not connected by a simple graph: there are plenty of neurons which affect all the neurons physically close to them. There are also many components in the body which demonstrably affect brain activity but are not neurons (hormone glands being among the most obvious).
> with weighted connections
Probably, though we don't fully understand how synapses work
> performing distributed computation (mainly hierarchical pattern matching)
This is a description of purpose, not form, so it's irrelevant.
> learning from data by gradually updating weights
We have exactly zero idea how biological neural nets learn at the moment. What we do know for sure is that a single neuron, on its own, can adjust its behavior based on previous inputs, so the only thing that is really clear is that individual neurons learn as well; it's not just the synapses with their weights that modify behavior. Even more, non-neuron cells also learn, as is obvious from the complex behaviors of many single-celled organisms, but also of some non-neuron cells in multicellular organisms. So potentially, learning in a human is not completely limited to the brain's neural net; it could include certain other parts of the body (again, glands come to mind).
> using selective attention (and/or recurrence, and/or convolutional filters).
This is completely unknown.
So no, overall, there is almost no similarity between (artificial) neural nets and brains - at least none so profound that they wouldn't also share it with a GPU.