This is a very misleading title. The result here is not that TMs are recurrent neural networks, it is that it is possible to construct an (infinite) recurrent neural network that emulates a TM. But the fact that a TM can be built out of perceptrons is neither surprising nor interesting. It's pretty obvious that you can build a NAND gate out of perceptrons, and so of course you can build a TM out of them.
In fact, it's probably the case that you can build a NAND gate (and hence a TM) out of any non-linear transfer function. I'd be surprised if this is not a known result one way or the other.
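To make that concrete, here's a minimal sketch (my own toy example, with one of many weight choices that work): a single perceptron with a step activation computes NAND, and NAND is functionally complete, so everything else follows.

    def step(x):
        # Heaviside step, the classic perceptron activation
        return 1 if x >= 0 else 0

    def nand(a, b):
        # one perceptron: step(w1*a + w2*b + bias)
        # weights -1, -1 and bias 1.5 give 0 only when a = b = 1
        return step(-a - b + 1.5)

    assert [nand(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]] == [1, 1, 1, 0]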
I don't remember the name of the theorem, but you can approximate any nonlinear multivariable function arbitrarily well with a multi-layer perceptron with any non-polynomial nonlinear function, applied after the linear weights and bias. It has to be non-polynomial because the set of all polynomials is closed under linear combinations, adding a constant, and composition, so if the nonlinear function were (say) x^3 you would only get polynomials out of the model.
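As a toy illustration of the approximation side only (not the learning side): a one-hidden-layer net with a non-polynomial activation like tanh can fit a 1-D function pretty well even if you fix the hidden weights at random and just solve for the output layer. Everything below is made up for the example; it's not the construction from the theorem itself.

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
    y = np.sin(x)

    # one hidden layer, non-polynomial activation (tanh);
    # hidden weights/biases are random and fixed, only the output layer is fit
    W = rng.normal(scale=2.0, size=(1, 50))
    b = rng.normal(scale=2.0, size=(1, 50))
    H = np.tanh(x @ W + b)                    # hidden activations, shape (200, 50)
    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)

    print(np.max(np.abs(H @ w_out - y)))      # small max error over the grid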
I'm not sure why that's a problem because polynomial approximations are still useful.
For one, only continuous functions can be represented.
Much more importantly, the theorem doesn't prove that it's possible to learn the necessary weights to approximate any function, just that such weights must exist.
With our current methods, only a subset of all possible NNs are actually trainable, so we can only automate the construction of approximations for certain continuous functions (generally those that are differentiable, but there may be exceptions; I'm less sure about that).
If we're talking about approximation, continuous functions can converge to step functions just fine. Take a regular sigmoid and keep raising the weight to see one example. That's a good point about training though, that theorem doesn't fully explain why NNs work, although it somewhat sounds like it does.
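That sigmoid-to-step point is easy to see numerically; crank up the weight and the sigmoid squeezes toward a 0/1 step at the origin:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x = np.array([-0.5, -0.1, -0.01, 0.01, 0.1, 0.5])
    for w in (1, 10, 100, 1000):
        # as w grows, sigmoid(w * x) approaches the step function at x = 0
        print(w, np.round(sigmoid(w * x), 3))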
There are discontinuous functions which are not steps - one of the more useful ones being tan(x), which blows up at odd multiples of pi/2. Of course, since tan(x) is piecewise continuous and periodic, it is probably easy to work around in practice.
The basic approximation theorem you might be thinking of is known as Kolmogorov's theorem (dude got around). It's an early theorem, from 1957, about representing continuous functions of several variables using sums and compositions of single-variable functions.
But all the other universality theorems refer back to it and don't have their own names, for example: Optimal Approximation of Continuous Functions by Very Deep ReLU Networks by Dmitry Yarotsky [1]. The reference for the original theorem would be On the Structure of Continuous Functions of Several Variables by David A. Sprecher [2].
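For reference, the representation (going from memory, so check Sprecher's paper for the exact statement) says any continuous f on [0,1]^n can be written as

    f(x_1, ..., x_n) = \sum_{q=0}^{2n} \Phi_q( \sum_{p=1}^{n} \phi_{q,p}(x_p) )

where the \Phi_q and \phi_{q,p} are continuous functions of a single variable.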
This is the key part.
"Turing completeness" is just a way of reasoning about infinities. (Anything that can recurse or loop infinitely is eventually Turing-complete.)
As a side note, it's fascinating to read the comments on that thread about RNNs and deep learning. So much has changed in the last 6 years, and it feels so weird to read dismissive comments about the capabilities of these systems given what people are getting out of ChatGPT.
« Inflated » expectations don't mean NO expectations...
People are still throwing « AI » around as a buzzword like it's something distinct from a computer program; in fact the situation got worse, because non-neural-network programs are somehow dismissed as « not AI » now.
Autonomous cars are still nowhere to be seen, even more so for « General « AI » ».
The singularity isn't much on track either, looking at Kurzweil's predictions: we should have had molecular manufacturers by now, and nanobots connecting our brains to the Internet, extending our lives, and brain-scanning people to recreate their avatars after they die don't seem like they are going to happen by the 2030s either. (2045 is still far enough away that I wouldn't completely bet against a singularity by then.)
(And Kurzweil doesn't get to blame it on misunderstanding the exponential function: the idea that people have too linear a view of the future and tend to overestimate changes in the near term but underestimate them in the long term!)
ChatGPT hasn't overcome any of the fundamental issues, it's just a huge improvement on the things that the original GPTs were good at. Being able to stay coherent for a trained-in length that gets longer with larger models is different from the length-unlimited coherence that human beings can manage, spanning lifetimes of thought and multiple lifetimes of discourse.
> ChatGPT hasn't overcome any of the fundamental issues, it's just a huge improvement on the things that the original GPTs were good at. Being able to stay coherent for a trained-in length that gets longer with larger models is different from being internally coherent for 90 years by nature like people are.
People are very often not internally coherent over periods much shorter than 90 years.
The hazily defined ideal of general intelligence - which everyone imagines they have, but most of us don't live up to all the time, and nobody lives up to every waking moment (or at least not before the morning cup of coffee) - isn't within the reach of present-day transformer architectures, because the length of text they can stay coherent over is limited by how far back they can look through their own output. Human beings can form new memories from their outputs that stay with them for the rest of their lives.
>length-unlimited coherence that human beings can manage
I've yet to meet a human that doesn't eventually run out of steam. You just aren't well situated to notice decoherence on time scales longer than your own coherence.
At what point do our neural networks cross over into demonstrating algorithmic behavior, such that we no longer consider them fancy interpolating functions? Is there a way to quantify this?
See Moravec's paradox [1], which was formulated decades ago. The more we build these models, the more it seems like the paradox is basically true: the hard part of intelligence isn't higher reasoning but the basic perception and sensorimotor control that animals had been evolving for hundreds of millions of years before homo sapiens came on the scene.
People in the 17th century viewed themselves as fancy clockwork machines, because of course mechanical clocks were the peak of high-tech hype at the time.
(Really what we should be thinking about is information complexity, not facile analogies.)
And in theological arguments back then, god was called "the great watchmaker". Now the question "are we living in a computer simulation?" keeps popping up here and there. I wonder what the analogy will be in 400 years.
There's no way a neural network could ever learn a hash function directly (unless it effectively memorized every possible input and output), and if there were an indirect way to train it, you'd discover that it was interpolating between (say) possible hash functions by working in a larger space, for example if it was trained to generate and test C programs that computed hashes.
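To make that concrete: even a toy non-cryptographic hash like 32-bit FNV-1a maps nearby inputs to wildly different outputs, which is the opposite of the smooth input-to-output structure that gradient-trained networks exploit. (The snippet is just illustrative, not a claim about any particular model.)

    def fnv1a_32(data: bytes) -> int:
        # 32-bit FNV-1a: xor each byte in, then multiply by the FNV prime
        h = 2166136261
        for byte in data:
            h ^= byte
            h = (h * 16777619) & 0xFFFFFFFF
        return h

    # inputs differing by one character land nowhere near each other
    for s in (b"hello", b"hellp", b"hellq"):
        print(s, hex(fnv1a_32(s)))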
They are computationally equivalent, but I wouldn't say that one of them "is" the other, since an RNN has a limited amount of memory while a Turing machine has an infinite tape.
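One way to see where the gap hides: Turing-completeness constructions for RNNs (Siegelmann/Sontag style) smuggle the unbounded tape into unbounded-precision activations, e.g. by encoding a whole stack in the digits of one number. Here's a toy version of that kind of encoding (my own simplification), using exact rationals because finite floats would lose the deep stack entries - which is exactly the "limited memory" objection.

    from fractions import Fraction

    # encode a binary stack in one rational number:
    # push(b): x -> x/4 + (2b+1)/4, top: compare with 1/2, pop inverts push
    def push(x, b):
        return x / 4 + Fraction(2 * b + 1, 4)

    def top(x):
        return 1 if x >= Fraction(1, 2) else 0

    def pop(x):
        return 4 * x - (2 * top(x) + 1)

    x = Fraction(0)            # empty stack
    for bit in (1, 0, 1, 1):
        x = push(x, bit)
    print(top(x))              # 1, the last bit pushed
    x = pop(x)
    print(top(x))              # 1, the bit pushed before it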