https://en.wikipedia.org/wiki/Parallel_transport ?

tudorw · on Feb 18, 2024

Maybe! I'm lost in the maths, but this paper out of DeepMind from 7th Feb says: "Our results suggest that learned optimizers can benefit from considering the (symmetry) structure of the weight space they optimize." https://arxiv.org/abs/2402.05232
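For anyone else lost in the maths, here is one well-known example of "symmetry structure" in weight space (I'm assuming this is roughly the kind of symmetry the paper has in mind). If you shuffle the hidden neurons of a small MLP, permuting the rows of the first layer's weights and biases and the matching columns of the second layer, you get a different set of weights that computes exactly the same function. Minimal pure-Python sketch; the layer sizes and the `mlp` / `permute_hidden` helpers are made up for illustration and are not from the paper:

```python
import random

def matvec(W, x):
    # Plain matrix-vector product with lists of lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, a) for a in v]

def mlp(W1, b1, W2, b2, x):
    # One hidden layer: y = W2 * relu(W1 * x + b1) + b2
    h = relu([a + b for a, b in zip(matvec(W1, x), b1)])
    return [a + b for a, b in zip(matvec(W2, h), b2)]

def permute_hidden(W1, b1, W2, perm):
    # Reorder the hidden units: rows of W1 and entries of b1,
    # plus the matching columns of W2 so each unit keeps its outgoing weights.
    W1p = [W1[i] for i in perm]
    b1p = [b1[i] for i in perm]
    W2p = [[row[i] for i in perm] for row in W2]
    return W1p, b1p, W2p

random.seed(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [random.gauss(0, 1) for _ in range(n_hidden)]
W2 = [[random.gauss(0, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [random.gauss(0, 1) for _ in range(n_out)]

perm = [2, 0, 3, 1]  # any reordering of the 4 hidden units
W1p, b1p, W2p = permute_hidden(W1, b1, W2, perm)

x = [0.5, -1.0, 2.0]
y = mlp(W1, b1, W2, b2, x)
yp = mlp(W1p, b1p, W2p, b2, x)
print(y, yp)  # same outputs (up to float rounding), different weights
```

So an optimizer that treats every coordinate of the weight vector as independent ignores the fact that many different weight vectors are "the same network"; as I read the quote, that redundancy is the structure the authors suggest exploiting.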

When it comes to the math underneath an LLM, https://medium.com/autonomous-agents/part-8-mathematical-exp... is about the most accessible explanation I have found so far.