Would be interesting to explore whether directly evaluating ∇²f(x) for higher-order SGD methods is now feasible for smaller DNNs, and whether this leads to faster convergence during training. Methods like Newton or Gauss-Newton have generally been considered intractable for DNNs. Also curious whether the descent direction prescribed by quasi-Newton methods empirically comes close to the true second-order step, and whether these approximations pay off in wall-clock time.
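
For concreteness, here is a minimal JAX sketch (not from the paper; the toy network, damping term, and names are my own illustration) of what a direct Newton step with an exact Hessian looks like on a tiny net:

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        # Toy 2-4-1 MLP with all parameters flattened into a single vector w.
        W1 = w[:8].reshape(4, 2)
        W2 = w[8:12].reshape(1, 4)
        pred = W2 @ jnp.tanh(W1 @ x)
        return jnp.mean((pred - y) ** 2)

    def newton_step(w, x, y, damping=1e-3):
        g = jax.grad(loss)(w, x, y)
        H = jax.hessian(loss)(w, x, y)          # exact (12, 12) Hessian
        H = H + damping * jnp.eye(w.shape[0])   # damping so the solve is well-posed
        return w - jnp.linalg.solve(H, g)

    key = jax.random.PRNGKey(0)
    w = 0.1 * jax.random.normal(key, (12,))
    x, y = jnp.array([1.0, -0.5]), jnp.array([0.3])
    w_new = newton_step(w, x, y)

The point of the sketch is the scaling: the Hessian is n x n in the parameter count, so even if evaluating it gets much cheaper, storing and solving with it still limits this to small networks.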



What kind of work are you referring to when you say higher-order SGD may _now_ be feasible for deep learning? I only find results that try to approximate second-order information.


Not sure what you mean. The paper above claims 1000x speedups for computing second-order derivatives. I have not tested their claims, but I was speculating that such an improvement, if true, would make computing Hessians for small networks feasible. That is what I am referring to.
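
For contrast, the usual way to get second-order information without ever materializing the Hessian is a Hessian-vector product; a rough JAX sketch (illustrative only, not from the paper):

    import jax
    import jax.numpy as jnp

    def loss(w):
        # Stand-in scalar loss; in practice this would close over a data batch.
        return jnp.sum(jnp.tanh(w) ** 2)

    def hvp(f, w, v):
        # Forward-over-reverse Hessian-vector product: roughly the cost of one
        # extra gradient, and no (n, n) matrix is ever materialized.
        return jax.jvp(jax.grad(f), (w,), (v,))[1]

    w = jnp.linspace(-1.0, 1.0, 10_000)   # 10k params: the full Hessian would be 10k x 10k
    v = jnp.ones_like(w)
    print(hvp(loss, w, v).shape)          # (10000,), same size as the gradient

This is why most published methods settle for approximate second-order information: the product H v is cheap at any scale, while the full Hessian only becomes practical when the parameter count is small.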



