I disagree with this wholeheartedly. Sure, there is a lot of trial and error, but...

saberience · 2025-05-20T15:20:35

None of the major aspects of deep learning came from manifolds, though.

It is primarily linear algebra, calculus, probability theory, and statistics; secondarily, you could add information theory for ideas like entropy, loss functions, etc.

But really, if "manifolds" had never been invented/conceptualized, we would still have deep learning now; the concept had zero impact on the practical technology we all use every day.

qbit42 · 2025-05-20T15:58:37

Loss landscapes can be viewed as manifolds. AdaGrad and Adam adjust SGD to better fit the local geometry, and they are widely used in practice.
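
To make the "local geometry" point concrete, here is a minimal pure-Python sketch, assuming a toy quadratic f(x, y) = 0.5 * (100 x^2 + y^2) whose curvature differs by a factor of 100 between the two axes (the function, step size, and helper names are illustrative, not from the thread). The updates follow the standard AdaGrad and Adam rules; note that both only rescale each coordinate separately (a diagonal preconditioner), not full curvature.

```python
import math


def grad(p):
    # Gradient of the toy quadratic f(x, y) = 0.5 * (100 * x**2 + y**2)
    x, y = p
    return [100.0 * x, y]


def sgd_step(p, lr):
    # Plain SGD: one global step size shared by every coordinate
    return [pi - lr * gi for pi, gi in zip(p, grad(p))]


def adagrad_step(p, acc, lr, eps=1e-8):
    # AdaGrad: accumulate squared gradients per coordinate (acc is updated in place)
    # and divide each coordinate's step by the square root of its accumulator
    g = grad(p)
    acc[:] = [a + gi * gi for a, gi in zip(acc, g)]
    return [pi - lr * gi / (math.sqrt(a) + eps) for pi, gi, a in zip(p, g, acc)]


def adam_step(p, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: exponential moving averages of g and g^2 (m, v updated in place),
    # bias-corrected for step t >= 1, then a per-coordinate rescaled step
    g = grad(p)
    m[:] = [b1 * mi + (1 - b1) * gi for mi, gi in zip(m, g)]
    v[:] = [b2 * vi + (1 - b2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - b1 ** t) for mi in m]
    v_hat = [vi / (1 - b2 ** t) for vi in v]
    return [pi - lr * mh / (math.sqrt(vh) + eps) for pi, mh, vh in zip(p, m_hat, v_hat)]


if __name__ == "__main__":
    start = [1.0, 1.0]
    print("SGD:    ", sgd_step(start, 0.1))
    print("AdaGrad:", adagrad_step(start, [0.0, 0.0], 0.1))
    print("Adam:   ", adam_step(start, [0.0, 0.0], [0.0, 0.0], 1, 0.1))
```

From (1, 1) with lr = 0.1, plain SGD overshoots the steep axis to x = -9, while the first AdaGrad and Adam steps move each coordinate by roughly lr regardless of its gradient scale, landing near (0.9, 0.9).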

kwertzzz · 2025-05-20T14:53:05

Can you give an example where theories and techniques from other fields are reinvented? I would be genuinely interested for concrete examples. Such "reinventions" happen quite often in science, so to some degree this would be expected.

srean · 2025-05-20T15:02:48

The Bethe ansatz is one. It took a tour de force by Yedidia to recognize that loopy belief propagation computes the stationary points of Bethe's approximation to the free energy.
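
For reference, this is the Bethe free energy in the form Yedidia, Freeman and Weiss use for a pairwise Markov random field p(x) ∝ ∏ ψ_ij(x_i, x_j) ∏ φ_i(x_i), with temperature set to 1. The beliefs b_i and b_ij, the potentials ψ and φ, and q_i (the number of neighbors of node i) are notation introduced here for illustration. Fixed points of loopy BP correspond to stationary points of this function subject to the normalization and marginalization constraints on the second line.

```latex
\begin{aligned}
F_{\mathrm{Bethe}}(b) &= \sum_{(ij)} \sum_{x_i, x_j} b_{ij}(x_i, x_j)\,
  \ln \frac{b_{ij}(x_i, x_j)}{\psi_{ij}(x_i, x_j)\,\phi_i(x_i)\,\phi_j(x_j)}
  \;-\; \sum_i (q_i - 1) \sum_{x_i} b_i(x_i)\,\ln \frac{b_i(x_i)}{\phi_i(x_i)} \\
\text{s.t.}\quad & \sum_{x_j} b_{ij}(x_i, x_j) = b_i(x_i), \qquad \sum_{x_i} b_i(x_i) = 1
\end{aligned}
```

On a tree this is exact (its minimum is -ln Z, attained at the true marginals); on graphs with loops it is only an approximation, which is why loopy BP is approximate.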

Many statistical thermodynamics ideas were reinvented in ML.

The same is true for mirror descent. It was independently discovered by Warmuth and his students as Bregman-divergence proximal minimization or, as a special case, exponentiated gradient algorithms.
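
A minimal sketch of the exponentiated gradient step, assuming weights on the probability simplex and a given gradient vector g (names are illustrative). It is mirror descent with the negative-entropy mirror map: the multiplicative update below is the closed-form minimizer of eta * <g, u> + KL(u || w) over the simplex, and KL is the Bregman divergence that negative entropy generates there.

```python
import math


def eg_step(w, g, eta):
    # Exponentiated gradient: multiply each weight by exp(-eta * g_i), then renormalize.
    # This is the closed-form minimizer over the simplex of
    #     eta * <g, u> + KL(u || w),
    # i.e. mirror descent with the negative-entropy mirror map.
    unnorm = [wi * math.exp(-eta * gi) for wi, gi in zip(w, g)]
    z = sum(unnorm)
    return [ui / z for ui in unnorm]


def kl(u, w):
    # KL divergence: the Bregman divergence generated by negative entropy on the simplex
    return sum(ui * math.log(ui / wi) for ui, wi in zip(u, w) if ui > 0)


if __name__ == "__main__":
    w = [1 / 3, 1 / 3, 1 / 3]
    g = [1.0, 0.0, -1.0]  # hypothetical gradient of a linear loss
    print(eg_step(w, g, eta=0.5))  # mass moves toward the coordinate with the smallest gradient
```

Replacing KL with the squared Euclidean distance in the same proximal step recovers ordinary (projected) gradient descent, which is the sense in which EG is a special case of the Bregman-proximal family.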

One can keep going.

ogogmad · 2025-05-20T15:11:41

The connections of deep learning to stat-mech and thermodynamics are really cool.

It's led me to wonder about the origin of the probability distributions in stat-mech. Physical randomness is mostly a fiction (except perhaps in quantum mechanics), so probability theory must be a convenient fiction. But objectively speaking, where then do the probabilities in stat-mech come from? So far, I've noticed that the (generalised) Boltzmann distribution serves as the bridge between probability theory and thermodynamics: it lets us take non-probabilistic physics and invent probabilities in a useful way.

srean · 2025-05-20T15:21:09

In Boltzmann's formulation of stat-mech, it comes from the assumption that when a system is in "equilibrium", all the micro-states that are consistent with the macro-state are equally occupied. That's the basis of the theory. A prime mover is thermal agitation.

It can be circular if one defines equilibrium as the situation in which all the micro-states are equally occupied. One way out is to define equilibrium in temporal terms: when the macro-state is not changing with time.
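
A small numerical illustration of how the equal-occupation assumption produces a Boltzmann factor, using an Einstein-solid toy model chosen here for concreteness: N oscillators share q energy quanta, every microstate with exactly q quanta is treated as equally likely, and we ask how many quanta oscillator 0 holds. The helper names are hypothetical.

```python
from math import comb


def multiplicity(n_osc, quanta):
    # Ways to distribute `quanta` indistinguishable energy quanta over `n_osc` oscillators
    return comb(quanta + n_osc - 1, quanta)


def single_oscillator_distribution(n_osc, quanta):
    # Equal-occupation assumption: every microstate of the whole system with exactly
    # `quanta` quanta is equally likely. Then P(oscillator 0 holds n quanta) is the number
    # of microstates of the other n_osc - 1 oscillators holding the rest, over the total.
    total = multiplicity(n_osc, quanta)
    return [multiplicity(n_osc - 1, quanta - n) / total for n in range(quanta + 1)]


if __name__ == "__main__":
    p = single_oscillator_distribution(1000, 500)
    # Successive ratios are nearly constant, i.e. P(n) is close to geometric (a Boltzmann factor)
    print([round(p[n + 1] / p[n], 4) for n in range(5)])
```

With N = 1000 and q = 500, the ratio P(n+1)/P(n) equals (q - n)/(q - n + N - 2), which stays close to 1/3 for small n. So the single-oscillator distribution is approximately geometric, P(n) ∝ e^{-βεn}: a Boltzmann distribution, with the other oscillators acting as the heat bath.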

mitthrowaway2 · 2025-05-20T16:23:54

The Bayesian reframing of that would be that when all you have measured is the macrostate, and you have no further information by which to assign a higher probability to any compatible microstate than any other, you follow the principle of indifference and assign a uniform distribution.

srean · 2025-05-20T17:03:37

Yes indeed, thanks for pointing this out. There are strong relationships between max-ent and Bayesian formulations.

For example, one can use a non-uniform prior over the micro-states. If that prior happens to be in the Darmois-Koopman family, that implicitly means there are some constraints, not explicitly stated, that bind the micro-state statistics.
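
A sketch of the max-ent/Bayesian link, assuming a finite set of micro-states with prior weights q_i and a single statistic T_i (e.g. energy) whose expectation is constrained. Minimizing KL(p || q) subject to E_p[T] = t gives p_i ∝ q_i exp(λ T_i), a member of the Darmois-Koopman (exponential) family with base measure q; with a uniform q and T the energy, it is the Boltzmann distribution with β = -λ. The function names and the bisection solver are illustrative choices, and the solver assumes the required λ lies in [-100, 100].

```python
import math


def tilt(q, T, lam):
    # p_i ∝ q_i * exp(lam * T_i): an exponential-family member with base measure q
    logits = [math.log(qi) + lam * ti for qi, ti in zip(q, T)]
    top = max(logits)  # subtract the max for numerical stability
    w = [math.exp(lg - top) for lg in logits]
    z = sum(w)
    return [wi / z for wi in w]


def mean(p, T):
    return sum(pi * ti for pi, ti in zip(p, T))


def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def maxent_with_prior(q, T, target, lo=-100.0, hi=100.0, iters=200):
    # Minimize KL(p || q) subject to E_p[T] = target (assumes all q_i > 0 and
    # min(T) < target < max(T)). The minimizer is tilt(q, T, lam) for some lam, and
    # E[T] is nondecreasing in lam (its derivative is Var[T]), so bisect on lam.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(tilt(q, T, mid), T) < target:
            lo = mid
        else:
            hi = mid
    return tilt(q, T, 0.5 * (lo + hi))


if __name__ == "__main__":
    T = [0.0, 1.0, 2.0]  # e.g. energy levels
    print(maxent_with_prior([1 / 3, 1 / 3, 1 / 3], T, 0.7))  # Boltzmann form
    print(maxent_with_prior([0.5, 0.3, 0.2], T, 1.0))  # tilted non-uniform prior
```

With a non-uniform q, the ratios p_i / q_i (not p_i itself) are what end up exponential in T, which is one way to read "hidden constraints" carried by the prior.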

nickpsecurity · 2025-05-20T15:08:47

One might add 8- and 16-bit training and quantization. Also, computing semi-unreliable values with error correction. Such tricks have been used in embedded software development on MCUs for some time.
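
For the quantization part, a minimal sketch of symmetric per-tensor int8 quantization, which is one common scheme (assumed here; real frameworks offer many variants, such as per-channel scales or zero points). Each value is stored as an integer in [-127, 127] plus one shared float scale, and the round-trip error per element is at most half the scale.

```python
def quantize_int8(xs):
    # Symmetric per-tensor scheme: one scale maps [-max|x|, max|x|] onto [-127, 127]
    amax = max(abs(x) for x in xs)
    scale = amax / 127.0 if amax > 0 else 1.0
    q = [max(-127, min(127, round(x / scale))) for x in xs]
    return q, scale


def dequantize(q, scale):
    # Approximate reconstruction; per-element error is at most scale / 2
    return [qi * scale for qi in q]


if __name__ == "__main__":
    weights = [0.1, -0.5, 0.25, 1.0, -0.003]
    q, s = quantize_int8(weights)
    print(q, s)
    print(dequantize(q, s))
```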

whatever1 · 2025-05-20T15:55:36

I mean, the entire domain of systems control is being reinvented by deep RL: system identification, stability, robustness, etc.

srean · 2025-05-20T17:31:59

Good one. Slightly different focus, but they really are the same topic. Historically, control theory has focused on stability and smooth dynamics, while RL has traditionally focused on convergence of learning algorithms in discrete spaces.
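
One way to see the overlap is linear-quadratic regulation: with a known linear model, RL-style value iteration on a quadratic value function V(x) = P x^2 is exactly the discrete-time Riccati recursion from control theory, and the greedy policy it produces is stabilizing. A scalar sketch, assuming dynamics x' = a x + b u and stage cost q x^2 + r u^2 (all names illustrative); model-free deep RL would have to estimate the same thing from samples instead.

```python
def lqr_value_iteration(a, b, q, r, iters=1000):
    # Scalar discrete-time LQR: x' = a*x + b*u, stage cost q*x**2 + r*u**2.
    # Bellman backup on V(x) = P*x**2:
    #   P <- min_k [q + r*k**2 + P*(a - b*k)**2] with u = -k*x,
    # which in closed form is the Riccati recursion below.
    P = 0.0
    for _ in range(iters):
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
    K = a * b * P / (r + b * b * P)  # greedy (optimal) feedback: u = -K * x
    return P, K


def simulate(x0, a, b, K, steps):
    # Roll the closed-loop system x' = (a - b*K) * x forward
    x = x0
    for _ in range(steps):
        x = a * x + b * (-K * x)
    return x


if __name__ == "__main__":
    P, K = lqr_value_iteration(a=1.2, b=1.0, q=1.0, r=1.0)  # open loop unstable: |a| > 1
    print(P, K, 1.2 - K)  # closed-loop factor a - b*K
    print(simulate(1.0, 1.2, 1.0, K, 50))  # decays toward 0
```

For this open-loop-unstable example, the iteration converges to the positive root of P^2 - 1.44 P - 1 = 0, and the closed-loop factor a - bK ≈ 0.41 has magnitude below 1, so the controlled state decays: the "stability" question and the "value iteration converges" question meet in the same object.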