Great read, thanks for sharing. Would love to see the natural language + code mixed in there :)
I've been interested in contrastive learning for a while, mainly as a means to train semantic code search models. OpenAI released a great paper on this topic called Text and Code Embeddings by Contrastive Pre-Training [1] that outlines the approach. I've used it as a base to build https://codesearch.ai [2] with pretty good results.

[1] https://arxiv.org/pdf/2201.10005.pdf
[2] https://sourcegraph.com/notebooks/Tm90ZWJvb2s6MTU1OQ==
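For anyone curious what that looks like end to end, here's a minimal sketch of how contrastively trained embeddings get used for semantic code search (not the exact setup from the paper or from codesearch.ai; `embed` is a hypothetical stand-in for a pre-trained encoder that returns unit-normalized vectors):

```python
import torch

def embed(text: str) -> torch.Tensor:
    # Hypothetical stand-in for a contrastively pre-trained text/code encoder;
    # assumed to return a unit-normalized embedding vector.
    raise NotImplementedError

def search(query: str, snippets: list[str], top_k: int = 5):
    q = embed(query)                                # embed the natural-language query
    S = torch.stack([embed(s) for s in snippets])   # embed each code snippet
    scores = S @ q                                  # dot product == cosine similarity for unit vectors
    ranked = torch.argsort(scores, descending=True)[:top_k]
    return [(snippets[int(i)], float(scores[i])) for i in ranked]
```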
Nice job! This is a fantastic resource for anyone interested in using contrastive methods to train AI/ML models to embed data in a space where samples considered similar stay close together (e.g., as measured by cosine or Euclidean distance) while dissimilar ones stay far apart. Self-supervised contrastive methods, in particular, can be remarkably useful when none of your samples are labeled and you want the model to discover structure on its own.
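To make the self-supervised case a bit more concrete: a common recipe (SimCLR-style, to name one) is to form positive pairs from two random augmentations of the same unlabeled sample and treat everything else in the batch as negatives, so no labels are needed. A rough sketch, with `augment` as a hypothetical stochastic transform (crop, mask, jitter, etc.):

```python
import torch

def make_views(batch: torch.Tensor, augment) -> tuple[torch.Tensor, torch.Tensor]:
    # Two independent random augmentations of each unlabeled sample.
    # Row i of view_a and row i of view_b form a positive pair;
    # every other row in the batch serves as a negative.
    view_a = torch.stack([augment(x) for x in batch])
    view_b = torch.stack([augment(x) for x in batch])
    return view_a, view_b
```

The two views are then embedded and fed to a contrastive objective, e.g. the InfoNCE-style loss discussed further down in the thread.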
A couple of months ago I wrote a blog post that explains supervised contrastive loss in more detail. It's still quite rough and I haven't really shared it outside the company so far, so I'd be happy for any feedback.
I'm surprised that this doesn't mention InfoNCE, the contrastive loss function used by Facebook's wav2vec2 XLS-R pretraining paper and by OpenAI's CLIP.
Contrastive losses arise from using methods like NCE (mentioned in the post) to approximate cross entropy loss when the partition function is intractable.
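To make that concrete: in practice the InfoNCE-style loss is computed as a softmax cross-entropy over similarity scores, with the other examples in the batch standing in for samples from the intractable partition function. A minimal sketch in PyTorch, assuming paired embeddings and in-batch negatives (the temperature value is arbitrary here, not taken from any of the papers mentioned):

```python
import torch
import torch.nn.functional as F

def info_nce(anchors: torch.Tensor, positives: torch.Tensor, temperature: float = 0.07):
    """InfoNCE with in-batch negatives: row i of `anchors` and `positives`
    is a positive pair; every other row of `positives` acts as a negative."""
    a = F.normalize(anchors, dim=-1)
    p = F.normalize(positives, dim=-1)
    logits = a @ p.T / temperature                      # (N, N) cosine-similarity logits
    targets = torch.arange(a.size(0), device=a.device)  # anchor i should pick positive i
    # Softmax cross-entropy over the batch: the softmax denominator plays
    # the role of the approximated partition function.
    return F.cross_entropy(logits, targets)
```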
One of my main barriers to reading and learning from the academic literature in this space is that, while I understand all the words perfectly well, I invariably hit something like this, which might as well be written in Chinese characters as far as I'm concerned.
Unfortunately, these equations only started making intuitive sense to me once I had seen them repeated many times, in different contexts. The key was being able to recognize patterns and to roughly predict the contents of an equation from its structure. For example, here the equation begins with L(x_i, x_j, φ), which looks intimidating but is just a very standard way of writing a loss function in mathematical notation. Then you might notice that the equation has two "indicator" functions (the strange-looking 1's). Again, weird, but each one translates to a branch of an if/else statement (see the sketch after this comment). From there it's easy to predict that one will carry an = sign and the other a ≠.
All that to say: it's painful for a while to parse every symbol, but it gets easier to recognize the semantics once you've done it for a few similar cases.
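To make the "indicator function = if/else" translation concrete, here is roughly what that kind of pairwise loss looks like in code. This is a generic margin-based contrastive pair loss (same-label pairs pulled together, different-label pairs pushed apart up to a margin), not necessarily the exact equation from the post; `encoder` and `margin` are placeholders standing in for f_φ and the margin hyperparameter:

```python
import torch

def pair_loss(x_i, x_j, y_i, y_j, encoder, margin: float = 1.0):
    """Margin-based contrastive pair loss; the two indicator functions
    in the equation become the two branches of the if/else below."""
    d = torch.norm(encoder(x_i) - encoder(x_j), p=2)  # Euclidean distance in embedding space
    if y_i == y_j:                                    # indicator: labels are equal
        return d ** 2                                 # pull similar samples together
    else:                                             # indicator: labels differ
        return torch.clamp(margin - d, min=0.0) ** 2  # push dissimilar samples apart, up to the margin
```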