In case you're not aware, AlphaGo's key component is based on the same type of Deepmind system that learned to play dozens of Atari games, to superhuman levels, by watching the pixels, without any programmatic adaptation to the particular Atari game. At least the version of AlphaGo that played in October was far less specialized for Go than Deep Blue was for chess. Demis Hassabis says that next up after this is getting Deepmind to play Go without any programmatic specialization for Go. Your reply would be appropriate if we were talking about Deep Blue, chess, and 1997.
That's incorrect. The features that AlphaGo uses are not pixel-level features, but board states - and the architectures of AlphaGo and the Atari network are completely different.
It's still an incredible achievement - but it's important to be accurate.
For AlphaGo, a "pixel" is a point on the board. It uses essentially the same convolutional neural networks (CNNs) that are in state-of-the-art machine vision systems. But yes, the overall architecture is rather different from the Atari system, due to the integration of that CNN with Monte Carlo Tree Search.
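To make the "board point as pixel" idea concrete, here's a toy sketch of a convolutional policy network over 19x19 input planes (PyTorch; the plane count and layer sizes are made up for illustration, not AlphaGo's actual configuration):

    # Toy sketch: a convolutional policy net treating the 19x19 board as an
    # "image". Plane count and widths are illustrative, not AlphaGo's.
    import torch
    import torch.nn as nn

    class PolicyNet(nn.Module):
        def __init__(self, in_planes=48, width=192):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv2d(in_planes, width, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv2d(width, width, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.Conv2d(width, 1, kernel_size=1),  # one logit per board point
            )

        def forward(self, planes):  # planes: (batch, in_planes, 19, 19)
            logits = self.body(planes).flatten(1)    # (batch, 361)
            return torch.softmax(logits, dim=1)      # move probabilities

    probs = PolicyNet()(torch.zeros(1, 48, 19, 19))  # near-uniform until trained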
Sorry, you're a bit off base. The Atari system did use a deep neural network / reinforcement learning algorithm, but as the original poster was trying to point out, the rules of Go were very much hard-coded into AlphaGo. From what this [1] says, multiple DNNs learn how to traverse Monte Carlo trees of Go games. The reinforcement piece comes in when choosing which of the Go players is playing the best games.
While the higher portions do share some similarities with the Atari system, at a basic level this is a machine that was designed and trained to play Go. AlphaGo is 'essentially the same' as the Atari system in the same way that all neural networks are 'essentially the same.'
Is this an extremely impressive accomplishment? Yes. However, it doesn't seem to qualify as anything close to generalizable.
I didn't say AlphaGo is essentially the same as Deep Q Networks. I said the convolutional neural network part of it is essentially the same. We agree that the integration of that CNN into the rest of the system is very different.
It's best to say that AlphaGo uses neural networks, which are extremely general. The same way planes and cars both use internal combustion engines: ICEs are extremely general. They produce mechanical energy from gas, and are totally uncaring whether you put them into a plane or a car. The body of the plane is necessary, but isn't really the interesting part.
Likewise, NNs don't care what application you put them into. Give them a different input and a different goal, and they will learn to do that instead. AlphaGo gave its NNs control over a Monte Carlo search tree, and that turned out to be enough to win at Go. You could plug the same AI into a car and it would learn to control that instead.
Note that even without the Monte Carlo search system, it was able to beat most amateurs, and to predict the moves experts would make most of the time.
I'm not sure that's correct. MCTS has well-known weaknesses, and isn't even a predictive algorithm. MCTS on its own couldn't get anywhere near beating the top Go champion; that requires DeepMind's neural networks.
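For concreteness, "MCTS on its own" at its crudest is flat Monte Carlo: value each move purely by random playouts, with no network guiding the search. Here's a toy sketch (the game interface moves_fn/apply_fn/result_fn is hypothetical; real MCTS adds a tree and an exploration rule on top, but the random-playout core is the same):

    # Flat Monte Carlo: the simplest ancestor of MCTS. No learned guidance;
    # a move's value is just the average result of random playouts.
    import random

    def rollout_value(state, moves_fn, apply_fn, result_fn, n_rollouts=100):
        total = 0.0
        for _ in range(n_rollouts):
            s = state
            while moves_fn(s):
                s = apply_fn(s, random.choice(moves_fn(s)))
            total += result_fn(s)  # e.g. 1.0 for a win, 0.0 for a loss
        return total / n_rollouts

    def best_move(state, moves_fn, apply_fn, result_fn):
        return max(moves_fn(state),
                   key=lambda m: rollout_value(apply_fn(state, m),
                                               moves_fn, apply_fn, result_fn))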
>> In case you're not aware, AlphaGo's key component is based on the same type of Deepmind system that learned to play dozens of Atari games, to superhuman levels, by watching the pixels, without any programmatic adaptation to the particular Atari game.
The Atari-playing AI watched the pixels indeed, but it was also given a set of actions to choose from and more importantly, a reward representing the change in the game score.
That means it wasn't able to learn the significance of the score on its own, or to generalise from the significance of the changing score in one game, to another.
It also played Atari games, which _have_ scores, so it would have been completely useless in situations where there is no score or clear win/loss condition.
AlphaGo is also similarly specialised to play Go. As is machine learning in general: someone has to tell the algorithm what it needs to learn, either through data engineering, or reward functions etc. A general AI would learn what is important on its own, like humans do, so machine learning has not yet shown that it can develop into AGI.
I think you are confusing utility functions with intelligence. All AIs need utility functions. An AI without a utility function would just do nothing. It would have no reason to beat Atari games, because it wouldn't get any reward for doing so.
Even humans have utility functions. For example, we get rewards for having sex, or eating food, or just making social relationships with other humans. Or we have negative reinforcement from pain, and getting hurt, or getting rejected socially.
You can come up with more complicated utility functions. Instead of beating the game, its goal could be to explore as much of the game as possible, to discover novel things in it - kind of like the sense of boredom or novelty that humans have. But in the end it's still just a utility function; it doesn't change how the algorithm itself works to achieve it. AGI is entirely agnostic to the utility function.
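A toy sketch of "agnostic to the utility function": tabular Q-learning where the reward function is just a parameter, so swapping the goal leaves the learning rule untouched (the chain environment and all names here are invented for illustration):

    # Tabular Q-learning on a toy 10-position chain; the reward function is
    # a pluggable argument, so the same learning rule serves any goal.
    import random
    from collections import defaultdict

    def q_learning(env_step, reward_fn, actions, steps=1000,
                   alpha=0.1, gamma=0.9, epsilon=0.1):
        Q, state = defaultdict(float), 0
        for _ in range(steps):
            a = (random.choice(actions) if random.random() < epsilon
                 else max(actions, key=lambda a: Q[(state, a)]))
            nxt = env_step(state, a)
            best_next = max(Q[(nxt, b)] for b in actions)
            Q[(state, a)] += alpha * (reward_fn(nxt) + gamma * best_next
                                      - Q[(state, a)])
            state = nxt
        return Q

    def env_step(s, a):               # positions 0..9, move left or right
        return max(0, min(9, s + a))

    # Goal-seeking utility: reward for reaching the end of the chain...
    goal_Q = q_learning(env_step, lambda s: 1.0 if s == 9 else 0.0, [-1, 1])

    # ...or a novelty utility, like a sense of boredom. Same algorithm.
    seen = set()
    def novelty_reward(s):
        new = s not in seen
        seen.add(s)
        return 1.0 if new else 0.0

    novel_Q = q_learning(env_step, novelty_reward, [-1, 1])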
>> I think you are confusing utility functions with intelligence.
No, what I'm really saying is that you can't have an autonomous agent that needs to be told what to do all the time. In machine learning, we train algorithms by giving them examples of what we want them to learn, so basically we tell them what to learn. And if we want them to learn something new, we have to train them again, on new data.
Well, that's not conducive to autonomous or "general" intelligence. There may be any number of tasks that your "general" AI will need to perform competently at. What's it gonna do? Come back to you and cry every time it fails at something? So then you have a perpetual child AI that will never stand on its own two feet as an adult, because there's always something new for it to learn. Happy little AI, for sure, but not very useful and not very "general". Except for a general nuisance, maybe.
Edit: I'm saying that machine learning can't possibly lead to general AI, because it's crap at learning useful things on its own.
Machine learning doesn't "need to be told what to do all the time". No one told AlphaGo what strategies were best. It figured that out on its own, by playing against itself.
There is also unsupervised and semi-supervised learning, which can take advantage of unlabelled data. Even supervised learning can work really well on weakly labelled data. E.g. taking pictures from the internet and using the words that occur next to them as labels. As opposed to hiring a person to manually label all of them.
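A weak-labelling pipeline can be as crude as this sketch (the keyword list and data are invented for illustration):

    # Derive labels from the text that appears next to images on the web,
    # instead of paying annotators. Ambiguous matches are simply dropped.
    KEYWORDS = {"cat", "dog", "car"}

    def weak_labels(pairs):           # pairs: (image_id, surrounding_text)
        labelled = []
        for image_id, text in pairs:
            hits = KEYWORDS & set(text.lower().split())
            if len(hits) == 1:        # keep only unambiguous matches
                labelled.append((image_id, hits.pop()))
        return labelled

    print(weak_labels([("img1", "my dog at the beach"),
                       ("img2", "cat chasing a dog"),   # ambiguous, dropped
                       ("img3", "new car photos")]))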
I don't know what situation you are imagining that would make the AI "come back and cry". You will need to give an example.
>> Machine learning doesn't "need to be told what to do all the time". No one told AlphaGo what strategies were best.
Of course they did. They trained it with examples of Go games and they also programmed it with a reward function that led it to select the winning games. Otherwise, it wouldn't have learned anything useful.
>> There is also unsupervised and semi-supervised learning, which can take advantage of unlabelled data.
Sure, but unsupervised learning is useless for learning specific behaviours. You use it for feature discovery and data exploration. As to semi-supervised learning, it's "semi" supervised: it learns its own features, then you train it with labels so that it learns a mapping from those features it discovered to the classes you want it to output.
>> I don't know what situation you are imagining that would make the AI "come back and cry"
That was an instance of humour.
>Of course they did. They trained it with examples of Go games and they also programmed it with a reward function that led it to select the winning games. Otherwise, it wouldn't have learned anything useful.
Yes, but it doesn't need to be trained with examples of Go games. That helps a lot, but it isn't 100% necessary. It can learn to play entirely through self-play. The Atari games were learned entirely through self-play.
As for having a reward function for winning games, of course that is necessary. Without a reward function, any AI would cease to function. That's true even of humans. All agents need reward functions. See my original comment.
>That was an instance of humour
Yes, I know what humour is lel. I asked you for a specific example where you think this would matter - where your kind of AI would do better than a reinforcement learning AI.
That's reinforcement learning and it's even more "telling the computer what to do" than teaching it with examples.
Because you're actually telling it what to do to get a reward.
>> Without a reward function, any AI would cease to function.
I can't understand this comment, which you made before. Not all AI has a reward function. Specific algorithms do. "All" AI? Do you mean all game-playing AI? Even that's stretching it: I don't remember minimax being described in terms of rewards, say, and I certainly haven't heard any of the dozen or so classifiers I've studied, or a bunch of other systems of all sorts (not just machine learning), described in terms of rewards either.
Unless you mean "reward function" as the flip side of a cost function? I suppose you could argue that - but could you please clarify?
>> your kind of AI
Here there's clearly some misunderstanding, because even if I do have a "my kind" of AI, I didn't say anything like that.
I'm sorry if I didn't make that clear. I'm not trying to push some specific kind of AI, though of course I have my preferences. I'm saying that machine learning can't lead to AGI, for the reasons I detailed above.
>That's reinforcement learning and it's even more "telling the computer what to do" than teaching it with examples.
No one tells the computer what to do. They just let it do its thing, and give it a reward when it succeeds.
>Not all AI has a reward function. Specific algorithms do. "All" AI?
Fine, all general AI. Like game playing etc. Minimax isn't general, and it does require a precise "value function" to tell it how valuable each state is. Classification also isn't general, and it requires a precise loss function.
Sure they do. Say you have a machine learning algorithm that can learn a task from examples, and let's notate it like so:

    y = f(x)

Where y is the trained system, f the learning function and x the training examples.
The "x", the training examples, is what tells the computer what to learn, therefore, what to do once it's trained. If you change the x, the learner can do a different y. Therefore, you're telling the computer what to do.
In fact, once you train a computer for a different y, it may or may not be really good at it, but it certainly can't do the old y anymore. Which is what I mean by "machine learning can't lead to AGI": machine learning algorithms are really bad at generalising from one domain to another, and the ability to do so is necessary for general intelligence.
Edit: note that the above has nothing to do with supervised vs unsupervised etc. The point is that you train the algorithm on examples, and that necessarily removes any possibility of autonomy.
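To illustrate with a toy scikit-learn sketch (the tasks are invented):

    # Retrain the same learner f on different examples x and you get a
    # different behaviour y - and the old behaviour is simply gone.
    from sklearn.linear_model import LogisticRegression

    f = LogisticRegression()
    X = [[i] for i in range(10)]

    # Task A: "is the number >= 5?"
    f.fit(X, [int(i >= 5) for i in range(10)])
    print(f.predict([[8]]))   # [1] - does task A

    # Task B: "is the number < 3?" - retraining replaces task A wholesale.
    f.fit(X, [int(i < 3) for i in range(10)])
    print(f.predict([[8]]))   # [0] - now does task B; the old y is gone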
>> Fine, all general AI. Like game playing etc.
I'm still not clear what you're saying; game-playing AI is not an instance of general AI. Do you mean "general game-playing AI"? That too doesn't always necessarily have a reward function. If I remember correctly, Deep Blue, for instance, did not use reinforcement learning, and Watson certainly does not (I have access to the Watson papers, so I can double-check if you doubt this).
Btw, every game-playing AI requires a precise evaluation function. The difference with machine-learned game-playing AI is that this evaluation function is sometimes learned by the learner, rather than hard-coded by the programmer.
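A toy sketch of that point: a bare minimax where the leaf evaluation is hand-coded by the programmer (the game is invented for illustration; in systems like AlphaGo that evaluation is learned instead):

    # Bare minimax: the search machinery is generic, but it needs *some*
    # evaluation at the leaves - hard-coded here.
    def minimax(state, depth, maximizing, moves_fn, apply_fn, evaluate_fn):
        moves = moves_fn(state)
        if depth == 0 or not moves:
            return evaluate_fn(state)       # the hand-written value function
        values = [minimax(apply_fn(state, m), depth - 1, not maximizing,
                          moves_fn, apply_fn, evaluate_fn) for m in moves]
        return max(values) if maximizing else min(values)

    # Toy game: players alternately add 1 or 2 to a running total up to 10;
    # the "evaluation" is just the total itself.
    best = minimax(0, depth=4, maximizing=True,
                   moves_fn=lambda s: [1, 2] if s < 10 else [],
                   apply_fn=lambda s, m: s + m,
                   evaluate_fn=lambda s: s)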
The thing about neural networks is that they can generalize from one domain to another. We don't have a million different algorithms, one for recognizing cars and another for recognizing dogs, etc. They learn features that both have in common.
>The "x", the training examples, is what tells the computer what to learn, therefore, what to do once it's trained. If you change the x, the learner can do a different y. Therefore, you're telling the computer what to do.
But with RL, a computer can discover its own training examples from experience. They don't need to be given to it.
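A sketch of what "discovering its own training examples" looks like: the agent acts and logs (state, action, reward, next state) tuples to learn from later (the toy environment here is made up):

    # No human supplies the data: the agent generates it by acting.
    import random

    def collect_experience(policy, env_step, reward_fn, start=0, steps=50):
        data, s = [], start
        for _ in range(steps):
            a = policy(s)
            nxt = env_step(s, a)
            data.append((s, a, reward_fn(nxt), nxt))  # self-generated example
            s = nxt
        return data

    experience = collect_experience(
        policy=lambda s: random.choice([-1, 1]),
        env_step=lambda s, a: max(0, min(9, s + a)),
        reward_fn=lambda s: 1.0 if s == 9 else 0.0)
    print(len(experience), "training examples, no human labeller involved")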
>I'm still not clear what you're saying; game-playing AI is not an instance of general AI.
But it is! The distinction between the real world and a game is arbitrary. If an algorithm can learn to play a random video game, you can just as easily plug it into a robot and let it play "real life". The world is more complicated, of course, but not qualitatively different.