Contrastive Representation Learning

roknovosel · on Aug 19, 2022

Great read, thanks for sharing. Would love to see the natural language + code mixed in there :)

I've been interested in contrastive learning for a while, mainly as a means to train semantic code search models. OpenAI released a great paper on this topic called Text and Code Embeddings by Contrastive Pre-Training[1] that outlines the approach. I've used it as a base to build https://codesearch.ai [2] with pretty good results.

[1] https://arxiv.org/pdf/2201.10005.pdf [2] https://sourcegraph.com/notebooks/Tm90ZWJvb2s6MTU1OQ==

cs702 · on Aug 19, 2022

Nice job! This is a fantastic resource for anyone interested in using contrastive methods for inducing AI/ML models to learn to embed data in a space such that samples considered similar stay close to each other (e.g., as measured by cosine or Euclidean distance) while dissimilar ones stay far apart. Self-supervised contrastive methods, in particular, can be remarkably useful when none of the samples in your data are labeled and you want your model to discover structure.

chrisbrndl · on Aug 23, 2022

A couple of months ago I wrote a blog post that explains supervised contrastive loss in more detail. It is quite beta and I have never really shared it with anyone outside the company so far, so happy for any feedback.

https://ai.worldcoin.dev/supcon/

fxtentacle · on Aug 19, 2022

I'm surprised that this doesn't mention cross-entropy, the contrastive loss function used by Facebook's wav2vec2 XLS-R pretraining paper and by OpenAI's CLIP.

canjobear · on Aug 19, 2022

Contrastive losses arise from using methods like NCE (mentioned in the post) to approximate cross entropy loss when the partition function is intractable.

zmmmmm · on Aug 20, 2022

I assume there are people out there in the world who actually find equations like this comprehensible?

https://imgur.com/O2BEyjp

One of my main barriers to reading and learning from academic literature in this space is that while I understand all the words perfectly well, invariably I hit something like this which might as well be in chinese characters to me.

bglazer · on Aug 20, 2022

Unfortunately, these equations only started making intuitive sense for me once I saw them repeated many times, in different contexts. The key was being able to recognize patterns and being able to roughly predict the contents of the equation based on its structure. For example, here the equation begins with the L(x_i, x_j, phi) which looks intimidating but it’s a very standard way of saying “likelihood” with mathematical notation. Then, you might notice that the equation has two “indicator” functions (the strange looking 1’s). Again, weird but it can be translated to an if else statement. Then, it’s easy to predict that one will have an = sign and the other a =/=.

All that to say that it’s painful for a bit to parse every symbol, but it gets easier to recognize the semantics after you’ve done it for similar cases.

antishatter · on Aug 20, 2022

You get used to it

hadrianpaulo · on Aug 19, 2022

Are there techniques for contrastive learning that's also applicable to tabular data?

mountainriver · on Aug 19, 2022

another incredible article by Lilian Weng, never ceases to impress and enlighten