Generative Teaching Networks: Accelerating Neural Architecture Search (uber.com)
116 points by yigitdemirag on Feb 20, 2020 | 15 comments



I made a video explaining this research if you are interested: https://www.youtube.com/watch?v=lmnJfLjDVrI&t=4s


Isn't it interesting to ask which is more efficient: neural nets or a learning Mealy machine? Anyway, an optimized exhaustive search is a slow but assured way of solving the self-driving-car problem. You don't need the most accurate simulation for it, as Elon says here:

https://www.youtube.com/watch?v=Ucp0TTmvqOE&t=7358

A "brute-force" algorithm (an exhaustive search, in other words) is the easiest way to find an answer to almost any engineering problem. But it often must be optimized before being computed. The optimization may be done by an AI agent based on Neural Nets, or on a Learning Mealy Machine.

A Learning Mealy Machine is a finite automaton in which the training data stream is memorized by constructing disjunctive normal forms of the automaton's output function and of the transition function between its states. Those functions are then optimized (lossily compressed by logic transformations such as De Morgan's laws, arithmetic rules, loop unrolling/rolling, etc.) into generalized forms. That introduces random hypotheses into the automaton's functions, so it can be used for inference. The optimizer for the automaton's functions may be another AI agent, or any heuristic algorithm you like...
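
To make the memorization step concrete, here is a toy Python sketch (all names and details are my own illustration; the generalization/compression step is only stubbed out):

    # Toy sketch of a "learning" Mealy machine: the output function and the
    # transition function are memorized from a training stream as explicit
    # (state, input) -> value tables. Each entry corresponds to one term of a
    # disjunctive normal form; unseen pairs are "don't cares".
    class LearningMealyMachine:
        def __init__(self, start_state=0):
            self.start_state = start_state
            self.delta = {}   # (state, input) -> next state
            self.lam = {}     # (state, input) -> output

        def observe(self, trace):
            """Memorize a trace of (input, output) pairs, starting from start_state."""
            state = self.start_state
            for inp, out in trace:
                self.lam[(state, inp)] = out
                # Invent a fresh next state per unseen pair; a real learner
                # would merge states during the generalization step.
                state = self.delta.setdefault((state, inp), len(self.delta))

        def generalize(self):
            """Placeholder for the lossy compression step (De Morgan's laws,
            state merging, loop rolling, ...); not implemented in this toy."""
            pass

        def run(self, inputs):
            state, outputs = self.start_state, []
            for inp in inputs:
                outputs.append(self.lam.get((state, inp)))  # None = don't care
                state = self.delta.get((state, inp), state)
            return outputs

    m = LearningMealyMachine()
    m.observe([("ping", "pong"), ("ping", "pong")])
    print(m.run(["ping", "ping", "ping"]))  # ['pong', 'pong', 'pong']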

Some interesting engineering (and scientific) problems are:

- finding the machine code for a car controller that lets it drive autonomously;

- finding the machine code for a bipedal robot controller that lets it work in warehouses and factories;

- finding a CAD file describing the design of a spheromak working with a guiding center drift generator (a hypothetical device, idk!);

- finding a CAD file describing some kind of working Smoluchowski's trapdoor (under some specific conditions, of course);

- finding a file describing an automaton that works in accordance with the data of a scientific experiment;

- finding a file describing the manufacturing steps to produce the first molecular nanofactory in the world.

Related work by Embecosm is here: superoptimization.org. Though it seems people have superoptimized only tiny programs so far, as you can see from the ICLR 2017 paper (App. D): arxiv.org/abs/1611.01787. And loops can also be rolled, not just unrolled; that kind of loop optimization seems to be absent there: en.wikipedia.org/wiki/Loop_optimization

If you have any questions, ask me here: https://www.facebook.com/eugene.zavidovsky


That sounds very different from what Uber is doing here, which is essentially using synthetic data to speed up training inside otherwise standard neural architecture search tools. The focus is on the data-synthesis network.

Also, the system you describe sounds impractical for any of the complex learning tasks you suggest, especially since it hasn't been demonstrated on much simpler problems yet. Why would machine code be the right level of abstraction for a vision or robotics problem?


> Why would machine code be the right level of abstraction for a vision or robotics problem?

That code would be used to compute the output function and the transition function of the automaton. At first, as the automaton tries some action and receives a reaction, those functions are recorded literally as plain movs and cmps with jmps (assume the x86 ISA here). Then the whole machine code covering all action-reaction pairs is optimized using arithmetic rules, loop rolling and unrolling, etc., so its size shrinks. That optimization may also include hypotheses about the don't-care values of the functions, which will be corrected in future passes if they turn out to be wrong... Imagine that code running on something like Thomas Sohmers' Neo processor or the Sunway SW26010.
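
In Python terms rather than x86 (just to illustrate the compression idea; the data and the rule here are made up):

    # Toy illustration of the two stages: first the observed action->reaction
    # pairs are memorized literally, then an optimizer tries to compress them
    # into a shorter rule whose behavior on unseen inputs (the "don't cares")
    # is a hypothesis.
    observed = {0: 0, 1: 2, 2: 4, 3: 6}

    def memorized(x):
        # The "plain movs and cmps with jmps" stage: one branch per observation.
        if x == 0: return 0
        if x == 1: return 2
        if x == 2: return 4
        if x == 3: return 6
        return None  # don't care

    def compressed(x):
        # Hypothesis found by the optimizer: the whole table collapses to 2*x.
        # It agrees with every observation and also guesses on unseen inputs;
        # if a later observation contradicts it, the hypothesis gets revised.
        return 2 * x

    assert all(memorized(x) == compressed(x) for x in observed)
    print(compressed(10))  # 20: an inference, not a memory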

Yeah, it is completely different from Neural Nets. I posted it here because I feel the urge to popularize the idea : ) I am a dilettante in machine learning, actually.


For those interested, I also work in this area: https://medium.com/capital-one-tech/why-you-dont-necessarily...

Arguably this is still a new field, but IMO it will eventually become standard practice. You can completely separate humans from data and still do machine learning (and likely analytics). Implemented properly, this would dramatically limit data breaches.


I really like the idea of optimising the 'direct' training data, and wonder how it would interact with the use of synthetic data as the 'indirect' training data. Or perhaps some sort of restriction on the (optimised) 'direct' training data as a form of regularisation. Lots of potential ideas to explore here.


The generator we use in our paper is a form of restriction of 'direct' training data. You can think of it as a weird encoding for images.
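
Very roughly, you can picture the teacher like this (a heavily simplified sketch for intuition, not the code from the paper; the sizes and the label-embedding conditioning here are arbitrary choices of mine):

    import torch
    import torch.nn as nn

    # Simplified GTN-style "teacher": a noise vector z plus a label y get
    # decoded into a synthetic training image, so the training set is
    # effectively an encoding stored in the teacher's weights (and, in the
    # paper, in learned input vectors as well).
    class Teacher(nn.Module):
        def __init__(self, z_dim=64, n_classes=10, img_size=28):
            super().__init__()
            self.label_emb = nn.Embedding(n_classes, z_dim)
            self.net = nn.Sequential(
                nn.Linear(2 * z_dim, 256), nn.ReLU(),
                nn.Linear(256, img_size * img_size), nn.Tanh())
            self.img_size = img_size

        def forward(self, z, y):
            h = torch.cat([z, self.label_emb(y)], dim=1)
            return self.net(h).view(-1, 1, self.img_size, self.img_size)

    teacher = Teacher()
    z = torch.randn(32, 64)           # in the paper these inputs can be learned too
    y = torch.randint(0, 10, (32,))
    fake_batch = teacher(z, y)        # synthetic images used to train a learner
    print(fake_batch.shape)           # torch.Size([32, 1, 28, 28])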

PS: I'm the author of the GTN paper. Feel free to ask any questions


Hey, congrats on the paper, I read it a while ago and thought it was really interesting.

I tried implementing it, and the samples generated by the Teacher seem to suffer from mode collapse (as if the generator is ignoring the random vector z but not the label condition). Do you recall having that issue at some point?

I have to say I'm using a simpler generator than the one in the paper, and I'm not changing the learner architecture at each batch, only its weights.

Thanks!


Thanks, I'm glad you liked it! Mode collapse was actually the one thing I never encountered during my exploration (which was the reason we looked into using GTNs as a mode-collapse solution for GANs). That said, I found meta-learning to be surprisingly hard to implement efficiently and ran into more bugs in both PyTorch and TensorFlow than I can count.

Changing the learner architecture is not that important actually so that's probably not your problem.
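
To give a feel for where the implementation pain comes from, the pattern that has to be right is an unrolled, differentiable inner loop, roughly like this (a toy, self-contained sketch, not our code; the tiny teacher and learner and all sizes are arbitrary):

    import torch
    import torch.nn.functional as F

    # Tiny learner kept as an explicit parameter list so SGD steps stay differentiable.
    def learner_forward(params, x):
        w1, b1, w2, b2 = params
        h = torch.relu(x @ w1 + b1)
        return h @ w2 + b2

    def init_learner():
        return [torch.randn(784, 64).mul_(0.01).requires_grad_(),
                torch.zeros(64, requires_grad=True),
                torch.randn(64, 10).mul_(0.01).requires_grad_(),
                torch.zeros(10, requires_grad=True)]

    # Toy teacher: noise + one-hot label -> synthetic "image".
    teacher = torch.nn.Sequential(
        torch.nn.Linear(64 + 10, 256), torch.nn.ReLU(),
        torch.nn.Linear(256, 784), torch.nn.Tanh())
    meta_opt = torch.optim.Adam(teacher.parameters(), lr=1e-3)

    real_x, real_y = torch.randn(128, 784), torch.randint(0, 10, (128,))  # stand-in data

    for outer_step in range(3):
        params = init_learner()
        # Inner loop: a few differentiable SGD steps on synthetic batches.
        for inner_step in range(5):
            z = torch.randn(32, 64)
            y = torch.randint(0, 10, (32,))
            x_fake = teacher(torch.cat([z, F.one_hot(y, 10).float()], dim=1))
            inner_loss = F.cross_entropy(learner_forward(params, x_fake), y)
            grads = torch.autograd.grad(inner_loss, params, create_graph=True)
            params = [p - 0.1 * g for p, g in zip(params, grads)]  # graph is kept
        # Outer step: evaluate the trained learner on real data and push the
        # gradient back through all the inner steps into the teacher's weights.
        meta_loss = F.cross_entropy(learner_forward(params, real_x), real_y)
        meta_opt.zero_grad()
        meta_loss.backward()
        meta_opt.step()

Keeping the whole inner-loop graph alive (create_graph=True and the manual parameter updates) is exactly where memory use and framework bugs tend to bite.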


Ok, I'll keep digging to figure out where the problem might be, thanks!


> Feel free to ask any questions

Hi, thanks for the invitation!

How can you be sure that the synthetic data you generate does not bias the architecture search away from the optimal solution for real data in a way similar to how early truncation of learning biases architecture search towards quick learners, and possibly away from peak performers?


Probably just an empirical comparison with other NAS strategies and hand-crafted architectures, right? The whole area of research is still ruthlessly empirical.


Oh! I have a somewhat inconvenient question, and I'm OK if you don't answer it. But... why not work for Elon Musk or the US government? There are rumors that Uber is owned by Russians and reports directly to Putin, jk ; )


Not my area of expertise. Is the innovation here searching over generated training examples that appear to optimize training efficiency / rate of learning for the target task?


Does this technique potentially allow training on smaller datasets? I am thinking of applications to neuroimaging datasets, which usually number in the hundreds.



