No offence, but it's hard to compare learning a colour histogram to learning con...

ModernMech · on May 11, 2017

> it's hard to compare learning a colour histogram to learning concepts such as couch, floor, and open door

The output of most machine learning algorithms is just a belief function. Deep learning is nice, because it basically removes the need to manually choose features, which can be the hardest part of applying machine learning to solve a classification task. But the output is still a belief function.

Machine learning as we generally know it today isn't about making a computer understand "concepts" or anything higher order like that. I think it is easy to compare learning a color histogram to learning classifications (e.g. couch, floor) because the two algorithms do exactly the same task in different ways.

The parent is saying that the function the robot needs to learn is linear. It doesn't matter that it's a drone, and the deep learning apparatus in the middle is overkill, because learning linear functions is easy (you don't need much data to figure out which way is up on a line)

alexleegk · on May 12, 2017

You could also use deep features (pre-trained for ImageNet classification) and use them in your Q-function approximator in such a way that the Q-function is linear wrt some high-level features. Then, you get the best of both: being able to process complex visual input while being able to do reinforcement learning with very few training trajectories. See [1] for an example (in simulation).

[1] http://rll.berkeley.edu/visual_servoing/

eli_gottlieb · on May 12, 2017

>The output of most machine learning algorithms is just a belief function.

It's not even a belief function, in the sense of a normalized probability distribution that respects conditionalization properly. It's basically just a one-hot vector for classification.

clickok · on May 12, 2017

I'm not sure what I'm supposed to be offended by; in any event I was not talking about color histograms, but instead using "some OpenCV filters to extract colors and textures" and testing a linear model first before reaching for the big guns.

Sure, it's different from human-like perception, if that's really what the deep net is learning to do. But the burden of confirming that it's learning "concepts" instead of, say, dedicating a million parameters to implementing Sobel filters or wavelet transformations, or something even more trivial like "if all my pixels are one color, I am probably near an obstacle" is not on me[0].

When I approach a deep learning problem, my default assumption is that the model is out to humiliate me by learning something entirely trivial, and so I go to great lengths to augment my dataset and validate the fact that I got some extra mileage out of spinning up the ol' GPU that wouldn't have been possible (or at least, not as easy) with simpler methods. Because if you can use something simple, why not do it[1]?

For robots, it's maybe a full page of code to try some quick image filters, flatten them, and implement SARSA or Q(λ). Our demo used Pavlovian control (basically, TD methods to estimate the likelihood of running into a wall, and turning if a collision seems too probable). You can run it on a Raspberry Pi in real time, no GPU required, including the robot it costs less than $300, and it doubles on sax. When I'm done with my current project I'd like to try it with a drone, because aerial demolition derbies sound like the next great spectator sport[3].

----

0 . There are techniques you can employ, for example: examining individual units or clusters of them for their response to different frames, deconvolution, and of course, messing with the inputs. But this is rarely done, because it takes time away from using the magic hammer of DNNs to nail yet another previously difficult problem. This is understandable, but it makes me wish I had the time to develop some tools for performing quick and easy trepanation on deep models so that examining the representation becomes as easy as the training part.

1. My colleague Marlos Machado has written a paper that seems relevant to this sorta thing: https://arxiv.org/abs/1512.01563 . By looking at what Deep Mind's Atari DQN was doing (or what it seems to be doing) and developing the analogous linear features, you can get performance that is almost as good with a model that ingests data many times faster. That is, their median score was better than or equivalent to Deep Mind's. When it comes to RL, if you can use linear methods it's a huge help-- you know you're going to converge[2], probably quite quickly.

2. Subject to technical conditions, e.g. you're dealing with a stationary ergodic MDP, Robbins-Munro stepsizes, the environment's oblivious, the algorithm's on-policy, no purchase necessary, see in store for details etc.

3. It might not be as good a representation, but maybe a faster reaction time is more important? And also because it could be that quadcopters are an entirely different kind of beast and really do need some of that old time deep learning religion, and I don't want to fall into the trap of thinking that the problem is simple when it's not. With these sorts of questions you either have to do the experiment yourself or pay a guy from Google to tell you what they've done (and then be prepared to litigate).