
To my knowledge as a math-turned-ML guy, there are currently no useful geometric characterizations of deep net latent spaces that are both "deep" (in the sense of using advanced mathematics) and "useful" (in the sense of revealing properties of networks or their latent spaces that aren't understood otherwise). Of course if anyone knows better I'd love to hear about it.

Continuous geometric concepts don't play super well with the way we like to decompose model outputs into discrete entities (classes, words, visual properties). We can, e.g. find variables in celebrity face GAN latent spaces that seem related to face orientation, or hair color, sort of, over some variable range and under some input conditions, but that doesn't really translate cleanly into any typical mathematical characterizations, geometric or otherwise, where you'd be looking for some property to hold everywhere or at least have an atlas of connected local approximations to simple characterizations.
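
To make the kind of thing I mean concrete, here's a minimal sketch of latent-direction editing, with a random linear map standing in for the generator and a random vector standing in for the "hair color" direction (in practice you'd use a pretrained face GAN and a direction found by, say, a linear probe on labeled samples):

```python
# Minimal sketch of "latent direction" editing, with stand-ins for the real parts.
# In practice the generator would be a pretrained face GAN and `direction` would
# come from something like a linear probe; here both are random placeholders,
# just to show the mechanics of walking along a latent direction.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_dim = 512, 64 * 64

W = rng.normal(size=(img_dim, latent_dim)) / np.sqrt(latent_dim)  # toy "generator"
generate = lambda z: np.tanh(W @ z)                               # stand-in for G(z)

z = rng.normal(size=latent_dim)          # a sampled latent code
direction = rng.normal(size=latent_dim)  # pretend "hair color" direction
direction /= np.linalg.norm(direction)

# Walk along the direction; in a real GAN the edit only behaves semantically
# over a limited range and only for some starting points, which is exactly the
# point about there being no clean global characterization.
for alpha in (-3.0, 0.0, 3.0):
    img = generate(z + alpha * direction)
    print(alpha, float(img.mean()))
```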

Instead, we get high-dimensional messes of spaces, and network gradients during training don't exhibit clean or easy to understand dynamics except in the simplest toy cases.

To paraphrase a more serious "math for ML" prof I've chatted with at times -- "doing math" classically involves being able to find a description with only a few free parameters for a complex phenomenon that may superficially appear to have many or infinite free parameters. It's possible that for large ML models trained on natural data, such a reduction just doesn't exist: you can't break the contributions of millions or billions of parameters down into a low-dimensional approximation. He was/is skeptical of us attaining deep mathematical insight into their operation, but he could always be wrong. I'd certainly love to see cool novel insights come out of mathematics that give clarity to what's been going on these past 15 years.
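
For what that "few free parameters" test might even look like in practice, here's a toy sketch (synthetic data, not a real model): stack parameter snapshots from training and look at the singular-value spectrum. A handful of dominant directions would support a low-dimensional story; a long flat tail is the pessimistic outcome the prof describes.

```python
# Toy version of "does a low-dimensional description exist?": with a real model
# you'd log flattened parameter vectors (or gradients) at checkpoints instead of
# this synthetic trajectory, then look at how concentrated the spectrum is.
import numpy as np

rng = np.random.default_rng(1)
n_checkpoints, n_params = 50, 10_000

# Synthetic "trajectory": a few strong directions plus isotropic noise.
basis = rng.normal(size=(5, n_params))
coeffs = rng.normal(size=(n_checkpoints, 5)) * np.array([10, 8, 6, 4, 2])
snapshots = coeffs @ basis + rng.normal(size=(n_checkpoints, n_params))

centered = snapshots - snapshots.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = (s**2) / (s**2).sum()
print("variance captured by top 5 directions:", explained[:5].sum())
```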



Thank you very much for the thoughtful and insightful reply!

This is obviously speculation/intuition, but it's not terribly surprising to me, at least, that operating in e.g. pixel space or a straightforward lifted latent manifold (modern diffusers, basically) wouldn't show apparent structure under the fancy t-SNE-type things that seem to be the heaviest artillery brought to the party (at least in the open). In pixel space, you get 6-17 fingers on 1-3 hands.
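
For concreteness, this is the kind of t-SNE exercise I mean, run on sklearn's tiny digits set (which is easy enough that even pixel space shows structure; natural images are where it turns into the mess described above). The 16-d PCA projection is just a stand-in for a learned latent space:

```python
# t-SNE on raw pixel vectors versus on a stand-in "latent" space.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # (1797, 64) pixel vectors

pixels_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

# Stand-in for a learned latent space: a 16-d PCA projection. In practice this
# would be the activations of a trained encoder.
latents = PCA(n_components=16, random_state=0).fit_transform(X)
latents_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(latents)

print(pixels_2d.shape, latents_2d.shape)     # both (1797, 2), ready to scatter-plot
```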

The `paris - france + uk === london` thing is real, and it's not surprising, because there typically isn't much in the way of nonlinearities in `word2vec`/`fasttext`/`glove`-type stuff. But this substantially survives all the leaky ReLUs or whatever in LLMs. They're pretty clearly interpolating in a way that you could get close to with a composition of affine transforms.
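
The arithmetic itself is easy to reproduce with gensim's pretrained GloVe vectors (the model name below is one of the standard gensim-data downloads; it's a few hundred MB):

```python
# most_similar(positive, negative) computes roughly
# vec("paris") - vec("france") + vec("uk") and returns the nearest vocabulary words.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")
print(vectors.most_similar(positive=["paris", "uk"], negative=["france"], topn=3))
# Typically "london" comes out on top, give or take a few near-synonyms.
```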

JEPA (and maybe Sora if..., fuck it) seems like a dramatic shift toward forcing the joint loss into a much higher-level space/manifold with (to me at least) shockingly semantic properties. I mean, look at the I-JEPA reconstructions from the pre-trained lifted space with some dinky diffuser/VAE-thing eating the hyperplane:

https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/

That's not pixel space, and you've got a lot of freedom to make it smoother. I suspect no one says "L1 regularization" anymore, but there's some modern version of that; we know how to do this.
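
For reference, the plain old version of that pressure is just an L1 penalty on the latent activations tacked onto the main loss; the modern variants (KL terms, VQ codebooks, etc.) are fancier but mechanically similar. Toy encoder/decoder, fake data, purely illustrative:

```python
# One training step of a toy autoencoder with an L1 penalty on the latents.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                 # fake batch of flattened images
z = encoder(x)
recon = decoder(z)

l1_weight = 1e-3                        # sparsity/smoothness pressure on the latent
loss = nn.functional.mse_loss(recon, x) + l1_weight * z.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```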

AFAIU (and again, I welcome expert correction), TDA at least, and really a lot of modern geometry, is about "scruffy intrinsic / smooth embedded" or vice versa, and "scruffy at this scale but smooth if you set the focus right".
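
The "set the focus right" part is pretty much what persistent homology formalizes. A toy example with the `ripser` package: a noisy circle whose loop only shows up as a long-lived feature over the right scale range:

```python
# Persistent homology of a noisy circle (pip install ripser).
import numpy as np
from ripser import ripser

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, size=200)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.1 * rng.normal(size=(200, 2))

dgms = ripser(points, maxdim=1)["dgms"]
h1 = dgms[1]                                  # birth/death pairs for 1-d holes (loops)
persistence = h1[:, 1] - h1[:, 0]
print("most persistent loop lifetime:", persistence.max())
# One bar is much longer than the rest: at the right scale the data "is" a circle,
# even though up close it's just scruffy points.
```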


Preserving topology during dimension reduction might affect this? Something something fractal dimensionality, erm, tropical geometry and amoebas and... here it is: https://proceedings.mlr.press/v80/zhang18i.html

Edit: obviously I don't know my zonotope from my tropical hypersurface; I do, however, like the pretty pictures ;)

Here's another good piece on how topology might help work out why some models do better than others: https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...

"By computing geometric descriptors of DNNs and performing large-scale model comparisons, we discovered a geometric phenomenon that has been overlooked in previous work: DNN models of high-level visual cortex benefit from high-dimensional latent representations. This finding runs counter to the view that both DNNs and neural systems benefit by compressing representations down to low-dimensional subspaces [20–39, 78]. "



Maybe! I'm lost in the maths, but "Our results suggest that learned optimizers can benefit from considering the (symmetry) structure of the weight space they optimize." This one came out of DeepMind on 7th Feb: https://arxiv.org/abs/2402.05232
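
The simplest weight-space symmetry they mean is easy to see directly: permute the hidden units of an MLP (and the matching rows/columns of its weights) and you get a different point in weight space computing exactly the same function:

```python
# Permutation symmetry of an MLP's weight space.
import numpy as np

rng = np.random.default_rng(4)
W1, b1 = rng.normal(size=(64, 16)), rng.normal(size=64)
W2, b2 = rng.normal(size=(10, 64)), rng.normal(size=10)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0)      # ReLU hidden layer
    return W2 @ h + b2

perm = rng.permutation(64)              # shuffle the 64 hidden units
x = rng.normal(size=16)
original = mlp(x, W1, b1, W2, b2)
permuted = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)
print(np.allclose(original, permuted))  # True: same function, different weights
```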

When it comes to the math underneath an LLM, https://medium.com/autonomous-agents/part-8-mathematical-exp... is about the most accessible explanation I have found so far.


I haven't dug into this much at all, but I am aware that some teams have been looking for methods to either discover symmetries and invariances in latent spaces or train them in directly. For instance, in an image recognition model one might expect shift and rotation invariance properties.
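
The basic check that line of work cares about is easy to sketch: embed an image and a shifted copy and compare. The "encoder" below is an untrained random map (so it won't be invariant); with a real model you'd plug in its feature extractor:

```python
# Measure how invariant an embedding is to a small horizontal shift.
import numpy as np

rng = np.random.default_rng(5)
W = rng.normal(size=(32, 28 * 28))          # toy stand-in for a trained encoder

def encode(img: np.ndarray) -> np.ndarray:
    z = W @ img.reshape(-1)
    return z / np.linalg.norm(z)

img = rng.random((28, 28))
shifted = np.roll(img, shift=2, axis=1)     # 2-pixel horizontal shift

cosine = float(encode(img) @ encode(shifted))
print("embedding similarity under shift:", cosine)  # ~1.0 would mean shift invariance
```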

I don't know if this line of research is going anywhere, but I thought it was interesting in terms of actual geometry being applied to ANNs.




