I haven’t read the manuscript yet, and I’m not sure that I will. However, I don’t agree with the framing of the question. Gradient descent and the properties of the loss function are the “how”. It seems like you want to know how certain properties of the data are manifested in the network itself during/after training (what those properties are doesn’t seem to be something people know to look for in advance). Maybe that’s what the authors are interested in as well. If I could bet money in Vegas on the answer to that question, my bet would be that in most cases the structures we probe in the network, the ones in which we (as humans) recognize correlations with aspects of the problem or task, will very likely boil down to approximations of fundamental and eminently useful quantities: say, approximate singular value decompositions of regions of the data manifold, or approximate eigenfunctions. I can see how these kinds of empirical investigations are interesting, but what would their impact be? Another guess is that they may lead to insights that help engineers design better architectures or incrementally improve training methods. But I think that’s about it - this type of research strikes me as engineering and application.
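To make the SVD guess concrete, here is a toy sketch (my own illustration, not from any paper under discussion): a linear autoencoder trained by plain gradient descent ends up with a code subspace matching the top singular directions of the data, which is the classical Baldi–Hornik result. The data setup, learning rate, and iteration count below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a dominant 2-D structure embedded in 10-D, plus small noise.
latent = rng.normal(size=(500, 2)) * np.array([3.0, 2.0])
mix = rng.normal(size=(2, 10))
X = latent @ mix + 0.05 * rng.normal(size=(500, 10))
X -= X.mean(axis=0)

# Linear autoencoder x -> W2 @ (W1 @ x), trained by vanilla gradient descent
# on mean squared reconstruction error.
k = 2
W1 = rng.normal(size=(k, 10)) * 0.1   # encoder
W2 = rng.normal(size=(10, k)) * 0.1   # decoder
lr = 1e-3
for _ in range(2000):
    Z = X @ W1.T             # codes, shape (n, k)
    E = Z @ W2.T - X         # reconstruction error, shape (n, 10)
    gW2 = E.T @ Z / len(X)   # gradient of the loss w.r.t. W2
    gW1 = W2.T @ E.T @ X / len(X)  # gradient w.r.t. W1
    W1 -= lr * gW1
    W2 -= lr * gW2

# Compare the learned code subspace with the top-2 right singular subspace of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V2 = Vt[:2].T                 # top-2 singular directions of the data
Q, _ = np.linalg.qr(W1.T)     # orthonormal basis for the encoder's row space
overlap = np.linalg.svd(Q.T @ V2, compute_uv=False)
print(overlap)  # cosines of principal angles; values near 1.0 mean the subspaces agree
```

Nothing here was told to "find the SVD"; the optimizer just minimizes reconstruction error, and the singular subspace falls out - the kind of thing I'd expect probes of trained networks to keep rediscovering.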
Outside of pure interest in how these LLMs work, the utility/impact of understanding them would be the ability to control them - to remove capabilities you don't want them to have (safety), perhaps even add capabilities, or just steer their behavior in some desirable way.
Pretty much everything about NNs is engineering - it's basically an empirical technology, not one that we have much theoretical understanding of outside of the very basics.
> Pretty much everything about NNs is engineering - it's basically an empirical technology, not one that we have much theoretical understanding of outside of the very basics.
This pretty much answers the question some have asked: “why are the world’s preeminent mathematicians not working on AI if AGI will solve everything eventually anyway?”.
At least for now, the skills required to make progress in AI (which today largely means machine learning) are those of an engineer rather than a mathematician.