
To my knowledge as a math-turned-ML guy, there are currently no useful geometric characterizations of deep net latent spaces that are both "deep" (in the sense of using advanced mathematics) and "useful" (in the sense of revealing properties of networks or their latent spaces that aren't understood otherwise). Of course if anyone knows better I'd love to hear about it.

Continuous geometric concepts don't play super well with the way we like to decompose model outputs into discrete entities (classes, words, visual properties). We can, e.g. find variables in celebrity face GAN latent spaces that seem related to face orientation, or hair color, sort of, over some variable range and under some input conditions, but that doesn't really translate cleanly into any typical mathematical characterizations, geometric or otherwise, where you'd be looking for some property to hold everywhere or at least have an atlas of connected local approximations to simple characterizations.
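
To make the kind of thing I mean concrete, here's a minimal sketch of latent-direction editing, with a random linear map standing in for the generator and a random vector standing in for the "hair color" direction (in practice you'd use a pretrained face GAN and a direction found by, say, a linear probe on labeled samples):

```python
# Minimal sketch of "latent direction" editing, with stand-ins for the real parts.
# In practice the generator would be a pretrained face GAN and `direction` would
# come from something like a linear probe; here both are random placeholders,
# just to show the mechanics of walking along a latent direction.
import numpy as np

rng = np.random.default_rng(0)
latent_dim, img_dim = 512, 64 * 64

W = rng.normal(size=(img_dim, latent_dim)) / np.sqrt(latent_dim)  # toy "generator"
generate = lambda z: np.tanh(W @ z)                               # stand-in for G(z)

z = rng.normal(size=latent_dim)          # a sampled latent code
direction = rng.normal(size=latent_dim)  # pretend "hair color" direction
direction /= np.linalg.norm(direction)

# Walk along the direction; in a real GAN the edit only behaves semantically
# over a limited range and only for some starting points, which is exactly the
# point about there being no clean global characterization.
for alpha in (-3.0, 0.0, 3.0):
    img = generate(z + alpha * direction)
    print(alpha, float(img.mean()))
```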

Instead, we get high-dimensional messes of spaces, and network gradients during training don't exhibit clean or easy to understand dynamics except in the simplest toy cases.

To paraphrase a more serious "math for ML" prof I've chatted with at times -- "doing math" classically involves being able to find a description with only a few free parameters for a complex phenomenon that may superficially appear to have many or infinite free parameters. It's possible that for large ML models trained on natural data, such a reduction just doesn't exist: you can't break the contributions of millions or billions of parameters down into a low-dimensional approximation. He was/is skeptical of us attaining deep mathematical insight into their operation, but he could always be wrong. I'd certainly love to see cool novel insights come out of mathematics that give clarity to what's been going on these past 15 years.
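
For what that "few free parameters" test might even look like in practice, here's a toy sketch (synthetic data, not a real model): stack parameter snapshots from training and look at the singular-value spectrum. A handful of dominant directions would support a low-dimensional story; a long flat tail is the pessimistic outcome the prof describes.

```python
# Toy version of "does a low-dimensional description exist?": with a real model
# you'd log flattened parameter vectors (or gradients) at checkpoints instead of
# this synthetic trajectory, then look at how concentrated the spectrum is.
import numpy as np

rng = np.random.default_rng(1)
n_checkpoints, n_params = 50, 10_000

# Synthetic "trajectory": a few strong directions plus isotropic noise.
basis = rng.normal(size=(5, n_params))
coeffs = rng.normal(size=(n_checkpoints, 5)) * np.array([10, 8, 6, 4, 2])
snapshots = coeffs @ basis + rng.normal(size=(n_checkpoints, n_params))

centered = snapshots - snapshots.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
explained = (s**2) / (s**2).sum()
print("variance captured by top 5 directions:", explained[:5].sum())
```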



Thank you very much for the thoughtful and insightful reply!

This is obviously speculation/intuition, but it's not terribly surprising to me, at least, that operating in e.g. pixel space or a straightforward lifted latent manifold (modern diffusers, basically) wouldn't show apparent structure under the fancy t-SNE-type things that seem to be the heaviest artillery brought to the party (at least in the open). In pixel space, you get 6-17 fingers on 1-3 hands.
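
For concreteness, this is the kind of t-SNE exercise I mean, run on sklearn's tiny digits set (which is easy enough that even pixel space shows structure; natural images are where it turns into the mess described above). The 16-d PCA projection is just a stand-in for a learned latent space:

```python
# t-SNE on raw pixel vectors versus on a stand-in "latent" space.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)          # (1797, 64) pixel vectors

pixels_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(X)

# Stand-in for a learned latent space: a 16-d PCA projection. In practice this
# would be the activations of a trained encoder.
latents = PCA(n_components=16, random_state=0).fit_transform(X)
latents_2d = TSNE(n_components=2, init="pca", random_state=0).fit_transform(latents)

print(pixels_2d.shape, latents_2d.shape)     # both (1797, 2), ready to scatter-plot
```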

The `paris - france + uk === london` thing is real, and it's not surprising, because there typically isn't much in the way of nonlinearities in `word2vec`/`fasttext`/`glove`-type stuff. But this substantially survives all the leaky ReLUs or whatever in LLMs. They're pretty clearly interpolating in a way that you could get close to with a composition of affine transforms.
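
The arithmetic itself is easy to reproduce with gensim's pretrained GloVe vectors (the model name below is one of the standard gensim-data downloads; it's a few hundred MB):

```python
# most_similar(positive, negative) computes roughly
# vec("paris") - vec("france") + vec("uk") and returns the nearest vocabulary words.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")
print(vectors.most_similar(positive=["paris", "uk"], negative=["france"], topn=3))
# Typically "london" comes out on top, give or take a few near-synonyms.
```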

JEPA (and maybe Sora if..., fuck it) seems like a dramatic shift toward forcing the joint loss into a much higher-level space/manifold with (to me at least) shockingly semantic properties. I mean, look at the I-JEPA reconstructions from the pre-trained lifted space with some dinky diffuser/VAE-thing eating the hyperplane:

https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/

That's not pixel space, and you've got a lot of freedom to make it smoother. I suspect no one says "L1 regularization" anymore, but there's some modern version of that; we know how to do this.
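
For reference, the plain old version of that pressure is just an L1 penalty on the latent activations tacked onto the main loss; the modern variants (KL terms, VQ codebooks, etc.) are fancier but mechanically similar. Toy encoder/decoder, fake data, purely illustrative:

```python
# One training step of a toy autoencoder with an L1 penalty on the latents.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 784))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)                 # fake batch of flattened images
z = encoder(x)
recon = decoder(z)

l1_weight = 1e-3                        # sparsity/smoothness pressure on the latent
loss = nn.functional.mse_loss(recon, x) + l1_weight * z.abs().mean()
opt.zero_grad()
loss.backward()
opt.step()
print(float(loss))
```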

AFAIU (and again, I welcome expert correction), TDA at least, and really a lot of modern geometry, is about "scruffy intrinsic / smooth embedded" or vice versa, and "scruffy at this scale but smooth if you set the focus right".
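
The "set the focus right" part is pretty much what persistent homology formalizes. A toy example with the `ripser` package: a noisy circle whose loop only shows up as a long-lived feature over the right scale range:

```python
# Persistent homology of a noisy circle (pip install ripser).
import numpy as np
from ripser import ripser

rng = np.random.default_rng(2)
theta = rng.uniform(0, 2 * np.pi, size=200)
points = np.c_[np.cos(theta), np.sin(theta)] + 0.1 * rng.normal(size=(200, 2))

dgms = ripser(points, maxdim=1)["dgms"]
h1 = dgms[1]                                  # birth/death pairs for 1-d holes (loops)
persistence = h1[:, 1] - h1[:, 0]
print("most persistent loop lifetime:", persistence.max())
# One bar is much longer than the rest: at the right scale the data "is" a circle,
# even though up close it's just scruffy points.
```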


Preserving topology during dimension reduction might affect this? Something something fractal dimensionality, erm, tropical geometry and amoebas and... here it is: https://proceedings.mlr.press/v80/zhang18i.html

Edit: obviously I don't know my zonotope from my tropical hypersurface; I do, however, like the pretty pictures ;)

Here's another good piece on how topology might help work out why some models do better than others: https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...

"By computing geometric descriptors of DNNs and performing large-scale model comparisons, we discovered a geometric phenomenon that has been overlooked in previous work: DNN models of high-level visual cortex benefit from high-dimensional latent representations. This finding runs counter to the view that both DNNs and neural systems benefit by compressing representations down to low-dimensional subspaces [20–39, 78]. "



Maybe! I'm lost in the maths, but "Our results suggest that learned optimizers can benefit from considering the (symmetry) structure of the weight space they optimize." This one came out of DeepMind on 7th Feb: https://arxiv.org/abs/2402.05232
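
The simplest weight-space symmetry they mean is easy to see directly: permute the hidden units of an MLP (and the matching rows/columns of its weights) and you get a different point in weight space computing exactly the same function:

```python
# Permutation symmetry of an MLP's weight space.
import numpy as np

rng = np.random.default_rng(4)
W1, b1 = rng.normal(size=(64, 16)), rng.normal(size=64)
W2, b2 = rng.normal(size=(10, 64)), rng.normal(size=10)

def mlp(x, W1, b1, W2, b2):
    h = np.maximum(W1 @ x + b1, 0)      # ReLU hidden layer
    return W2 @ h + b2

perm = rng.permutation(64)              # shuffle the 64 hidden units
x = rng.normal(size=16)
original = mlp(x, W1, b1, W2, b2)
permuted = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)
print(np.allclose(original, permuted))  # True: same function, different weights
```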

When it comes to the math underneath an LLM, https://medium.com/autonomous-agents/part-8-mathematical-exp... is about the most accessible explanation I have found so far.


I haven't dug into this much at all, but I am aware that some teams have been looking for methods to either discover symmetries and invariances in latent spaces or train them in directly. For instance, in an image recognition model one might expect shift and rotation invariance properties.
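
The basic check that line of work cares about is easy to sketch: embed an image and a shifted copy and compare. The "encoder" below is an untrained random map (so it won't be invariant); with a real model you'd plug in its feature extractor:

```python
# Measure how invariant an embedding is to a small horizontal shift.
import numpy as np

rng = np.random.default_rng(5)
W = rng.normal(size=(32, 28 * 28))          # toy stand-in for a trained encoder

def encode(img: np.ndarray) -> np.ndarray:
    z = W @ img.reshape(-1)
    return z / np.linalg.norm(z)

img = rng.random((28, 28))
shifted = np.roll(img, shift=2, axis=1)     # 2-pixel horizontal shift

cosine = float(encode(img) @ encode(shifted))
print("embedding similarity under shift:", cosine)  # ~1.0 would mean shift invariance
```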

I don't know if this line of research is going anywhere, but I thought it was interesting in terms of actual geometry being applied to ANNs.




