
Remember the word2vec paper? The surprising bit the authors were trying to show was that embedding words in a vector space with an appropriate loss naturally gives those words enough structure to support robust, human-interpretable analogies (the classic example being king - man + woman ≈ queen).
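For anyone who hasn't played with it, the canonical demo looks roughly like this, assuming gensim and its pretrained Google News vectors (the exact score you get back is illustrative, not something I've pinned down):

    # word2vec analogy: "king" - "man" + "woman" should land near "queen".
    import gensim.downloader as api

    vectors = api.load("word2vec-google-news-300")  # large pretrained KeyedVectors download
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
    # something like [('queen', 0.71)]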

I agree with the sentiment that each individual dimension isn't meaningful, and I also feel it's misleading for the article to frame it that way. But there's a grain of truth: the last step in predicting the output token is to take the dot product between some embedding and every possible token's embedding (we can interpret the last layer as just a table of token embeddings). Taking a dot product in this space is equivalent to comparing the "distance" between the model's proposal and each candidate output token. In that space, words like "apple" and "banana" are closer to each other than either is to "rotisserie chicken," so there is some coarse structure there.

In doing this, we've given the space meaning: cosine similarity in it is a meaningful proxy for semantic similarity. Individual dimensions aren't meaningful, but distance in this space is.
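For concreteness, here's a rough sketch of that comparison using GPT-2's tied token-embedding matrix via Hugging Face transformers (the single-token assumption for each word and the relative similarity values are assumptions on my part):

    # Compare token embeddings from GPT-2's (tied) input/output embedding table.
    import torch
    from transformers import GPT2Tokenizer, GPT2LMHeadModel

    tok = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    emb = model.transformer.wte.weight  # [vocab_size, hidden_dim] token embedding table
    cos = torch.nn.functional.cosine_similarity

    def vec(word):
        # Leading space so the word maps (hopefully) to a single BPE token.
        ids = tok.encode(" " + word)
        return emb[ids[0]].detach()

    print(cos(vec("apple"), vec("banana"), dim=0))   # expect this to be higher...
    print(cos(vec("apple"), vec("chicken"), dim=0))  # ...than this, if the coarse structure is there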

A stronger article would attempt to replicate the word2vec analogy experiments (imo one of the more fascinating parts of that paper) with GPT's embeddings. I'd love to see if that property holds.
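Continuing the sketch above, a quick-and-dirty version of that probe on GPT-2's token embeddings might look like this (no claim about what actually comes out; the query words themselves tend to dominate the top hits, which is a known caveat even for word2vec):

    # Analogy probe in GPT-2's token-embedding space: king - man + woman ≈ ?
    with torch.no_grad():
        query = vec("king") - vec("man") + vec("woman")
        sims = cos(query.unsqueeze(0), emb, dim=1)   # similarity to every vocab token
        top = sims.topk(5).indices.tolist()
        print([tok.decode([i]) for i in top])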




I wouldn't say the interpretability of word2vec embeddings is surprising - it's just a reflection of words being defined by context/usage, and of these embeddings being created based on that assumption.



