Turing Machines Are Recurrent Neural Networks (1996) (aalto.fi)
97 points by todsacerdoti on Dec 5, 2022 | 32 comments



This is a very misleading title. The result here is not that TMs are recurrent neural networks, it is that it is possible to construct an (infinite) recurrent neural network that emulates a TM. But the fact that a TM can be built out of perceptrons is neither surprising nor interesting. It's pretty obvious that you can build a NAND gate out of perceptrons, and so of course you can build a TM out of them.

In fact, it's probably the case that you can build a NAND gate (and hence a TM) out of any non-linear transfer function. I'd be surprised if this is not a known result one way or the other.
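
A minimal sketch in Python of what that looks like (the weights here are just one choice that happens to work): a hard-threshold perceptron with weights (-2, -2) and bias 3 computes NAND, and since NAND is functionally complete you can build any Boolean circuit out of copies of it.

    def perceptron_nand(a, b):
        # step activation on a weighted sum: fires unless both inputs are 1
        return 1 if (-2 * a) + (-2 * b) + 3 > 0 else 0

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, perceptron_nand(a, b))  # prints the NAND truth table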


I don't remember the name of the theorem, but you can approximate any nonlinear multivariable function arbitrarily well with a multi-layer perceptron using any non-polynomial nonlinear function applied after the linear weights and bias. It has to be non-polynomial because the set of all polynomials is closed under linear combinations, adding a constant, and composition, so if the nonlinear function were (say) x^3 you would only get polynomials out of the model.
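
As a rough illustration of the flavor of that result (not the theorem's actual construction; the target function sin(x) and the widths are arbitrary choices), here's a sketch with a single hidden tanh layer where the input weights are fixed at random and only the output layer is fit by least squares:

    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-3, 3, 200)[:, None]
    y = np.sin(x)

    for width in (5, 50, 500):
        W = rng.normal(size=(1, width))               # random, fixed input weights
        b = rng.normal(size=width)                    # random biases
        H = np.tanh(x @ W + b)                        # hidden-layer activations
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # fit only the output weights
        print(width, np.max(np.abs(H @ beta - y)))    # error generally shrinks with width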

I'm not sure why that's a problem because polynomial approximations are still useful.


Note that there are two caveats.

For one, only continuous functions can be represented.

Much more importantly, the theorem doesn't prove that it's possible to learn the necessary weights to approximate any function, just that such weights must exist.

With our current methods, only a subset of all possible NNs are actually trainable, so we can only automate the construction of approximations for certain continuous functions (generally those that are differentiable, but there may be exceptions; I'm not as sure).


If we're talking about approximation, continuous functions can converge to step functions just fine. Take a regular sigmoid and keep raising the weight to see one example. That's a good point about training though, that theorem doesn't fully explain why NNs work, although it somewhat sounds like it does.
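
A quick way to see it (the test points are arbitrary, nothing special about them):

    import math

    def sigmoid(z):
        return 1 / (1 + math.exp(-z))

    for k in (1, 10, 100):
        # as the weight k grows, outputs snap toward 0 on one side and 1 on the other
        print(k, [round(sigmoid(k * x), 3) for x in (-0.5, -0.1, 0.1, 0.5)])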


There are discontinuous functions which are not steps, one of the more useful ones being tan(x) for any real x. Of course, since tan(x) is piecewise continuous and periodic, it is probably easy to work around in practice.


Their derivatives, f^(n)(x) for arbitrary n, do not converge nearly as well. Indeed, even n=1 is usually terrible in practice.

Fit a pleasant day-trip velocity curve with a NN and the acceleration would kill you.


Wouldn't that be mostly a consequence of the loss function used to compute the distance between the curve and the model output?


The basic approximation theorem you might be thinking of is known as Kolmogorov's Theorem (dude got around). It's an early result, from 1957, about representing any continuous multivariate function as a superposition of continuous single-variable functions and addition.

But all the other universality theorems refer back to it and don't have their own names, for example: Optimal Approximation of Continuous Functions by Very Deep ReLU Networks by Dmitry Yarotsky [1]. The reference for the original theorem would be On the Structure of Continuous Functions of Several Variables, David A. Sprecher [2].

[1] http://proceedings.mlr.press/v75/yarotsky18a

[2] https://www.ams.org/journals/tran/1965-115-00/S0002-9947-196...



Are you talking about something like this?

https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Arnold_repr...


> infinite

This is the key part. "Turing completeness" is just a way of reasoning about infinities. (Anything that can recurse or loop infinitely is eventually Turing-complete.)


It strikes me as the classical version of a well known result in quantum computing: "nearly any gate set is universal for quantum computing".


Related:

Turing Machines Are Recurrent Neural Networks (1996) - https://news.ycombinator.com/item?id=10930559 - Jan 2016 (12 comments)


As a side note, it's fascinating to read the comments on that thread about RNNs and Deep Learning. So much has changed in the last 6 years, and it feels so weird to read dismissive comments about the capabilities of these systems, seeing what people are getting out of ChatGPT.


Not really?

« Inflated » expectations don't mean NO expectations...

People are still throwing « AI » around as a buzzword like it's something distinct from a computer program, in fact the situation got worse because non-neural network programs are somehow dismissed as « not AI » now.

Autonomous cars are still nowhere to be seen, even more so for « General « AI » ».

The singularity isn't much on track either, looking at Kurzweil's predictions: we should have had molecular manufacturing by now, and nanobots connecting our brains to the Internet, extending our lives, and brain-scanning people to recreate their avatars when they are dead don't seem like they are going to happen by the 2030s either. (2045 is still far enough away that I wouldn't completely bet against a singularity by then.)

(And Kurzweil doesn't get to blame it on misunderstanding the exponential function: the claim that people have too linear a view of the future and so tend to overestimate changes in the near term but underestimate them in the long term!)


ChatGPT hasn't overcome any of the fundamental issues, it's just a huge improvement on the things that the original GPTs were good at. Being able to stay coherent for a trained-in length that gets longer with larger models is different from the length-unlimited coherence that human beings can manage, spanning lifetimes of thought and multiple lifetimes of discourse.


> ChatGPT hasn't overcome any of the fundamental issues, it's just a huge improvement on the things that the original GPTs were good at. Being able to stay coherent for a trained-in length that gets longer with larger models is different from being internally coherent for 90 years by nature like people are.

People are very often not internally coherent over periods much shorter than 90 years.


The hazily defined ideal of general intelligence (which everyone imagines that they embody, but most of us do not live up to all the time, and nobody lives up to every waking moment, or at least not before we've had our morning cup of coffee) isn't within the reach of present-day transformer architectures, because the length of text they can stay coherent over is limited by how far back they can look through their own output. Human beings can form new memories from their outputs that stay with them for the rest of their lives.


In 2016 Transformers didn't exist and the state of the art for neural network based NLP was using LSTMs that had a limit of maybe 100 words at most.

With new implementations like xformers[1] and flash attention[2] it is unclear where the length limit is on modern transformer models.

Flash Attention can currently scale up to 64,000 tokens on an A100.

[1] https://github.com/facebookresearch/xformers/blob/main/HOWTO...

[2] https://github.com/HazyResearch/flash-attention


>length-unlimited coherence that human beings can manage

I've yet to meet a human that doesn't eventually run out of steam. You just aren't well situated to notice decoherence on time scales longer than your own coherence.


At what point have our neural networks crossed over into demonstrating algorithmic behavior, such that we no longer consider them fancy interpolating functions? Is there a way to quantify this?


Never. Instead, we begin understanding ourselves as fancy interpolating functions.


See Moravec's paradox [1], which was formulated decades ago. The more we build these models, the more it's starting to seem like the paradox is basically true: that the hard part of intelligence isn't higher reasoning but the basic perception and sensorimotor control that animals had been evolving for hundreds of millions of years before homo sapiens came on the scene.

[1] https://en.wikipedia.org/wiki/Moravec%27s_paradox


People in the 17th century viewed themselves as fancy clockwork machines, because of course mechanical clocks were the peak of high-tech hype at the time.

(Really what we should be thinking about is information complexity, not facile analogies.)


And in theological arguments back then, god was called "the great watchmaker". Now the question "are we living in a computer simulation?" keeps popping up here and there. I wonder what the analogy will be in 400 years.


In 400 years: "back in the day the meatbags thought they were special"


A model with no training data would know nothing, so in a sense these models are always going to be something like a fancy form of interpolation/extrapolation.


There’s a difference between interpolation and induction. You’re not going to interpolate a hash function.


There's no way a neural network could ever learn a hash function directly (unless it had every possible input and output in its table), and if there was an indirect way to train it, you'd discover that it was interpolating between (say) possible hash functions by working in a larger space, for example if it was trained to generate and test C programs that computed hashes.


They are computationally equivalent, but I wouldn't say that one of them "is" the other, since an RNN has a limited amount of memory and a Turing machine has an infinite tape.


Turing Machines Are Magic: The Gathering [1]

[1] https://arxiv.org/abs/1904.09828


Does the new chat AI bot pass the Turing test?



