
So yeah: we can focus on vectors at different layers of the net, and these are in some sense different semantic spaces. In the article I talk about the layer immediately before the projection onto the emoji vectors. If you look at the output after the projection (and apply a softmax), you get a probability distribution across all emoji. That would be a different space, one in which each axis is an emoji, rather than one in which the emoji are points distributed around the space.
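A minimal sketch of that final step, with made-up names and dimensions (h, W, n_emoji are illustrative, not from the article): a pre-projection "semantic" vector is dotted with one weight vector per emoji, and a softmax turns the scores into a distribution whose axes are the emoji themselves.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_emoji = 64, 10

h = rng.normal(size=dim)             # semantic vector from the layer before output
W = rng.normal(size=(n_emoji, dim))  # one weight vector per emoji

scores = W @ h                       # dot product with each emoji vector
probs = np.exp(scores - scores.max())
probs /= probs.sum()                 # softmax: each axis of this space is an emoji

print(probs.shape)  # (10,)
```

Before the projection, emoji are points scattered in the 64-d space; after it, each emoji owns a coordinate axis.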



Awesome, thanks for clarifying. So does the training optimize some property of the "semantic" layer immediately before the final emoji prediction layer? Or does it just optimize accuracy of emoji prediction directly?

And then the t-SNE projection shown in the article is based on this same layer (one before prediction)?


Well those are sort of equivalent. But yeah, we use cross-entropy between the projected output and the target emoji distribution as our objective to minimize.
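For concreteness, a toy version of that objective (the numbers are made up): cross-entropy between the softmax output and a one-hot target distribution over emoji. Minimizing it directly optimizes prediction accuracy, and in doing so indirectly shapes the pre-projection layer.

```python
import numpy as np

def cross_entropy(probs, target):
    # target: distribution over emoji (often one-hot); probs: softmax output.
    # Small epsilon keeps log() finite when a probability is exactly zero.
    return -np.sum(target * np.log(probs + 1e-12))

probs = np.array([0.7, 0.2, 0.1])    # model's predicted emoji distribution
target = np.array([1.0, 0.0, 0.0])   # one-hot: the correct emoji
loss = cross_entropy(probs, target)  # = -log(0.7), about 0.357
```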

And yes, we do the t-SNE on that pre-projection space. That's why we can visualize the targets (emoji) in it. We can also t-SNE the word embeddings themselves — the input to the RNN — which is also kind of interesting. It automatically learns all kinds of structures there. Chris Olah has a good post on word embeddings if you're interested: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...
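A rough sketch of that visualization step, assuming scikit-learn's t-SNE and stand-in random vectors: because the emoji weight vectors live in the same pre-projection space as the sentence representations, you can stack both together and embed them into 2-D in one pass, which is what lets the targets show up as points in the plot.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
sentences = rng.normal(size=(50, 64))  # pre-projection vectors, one per input
emoji = rng.normal(size=(10, 64))      # emoji weight vectors, same space

points = np.vstack([sentences, emoji])
xy = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(points)
# xy[:50] are the inputs, xy[50:] are the emoji targets
print(xy.shape)  # (60, 2)
```

The same recipe applies to the word embeddings at the input of the RNN.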



