It would be nice to see some examples of decoding in your repo (forgive me if I just missed them). I remember when I first implemented a transformer from scratch, generating sequences with greedy or beam search after training/testing turned out to be harder than I thought, but it turned out I had made a mistake with teacher forcing early on, so the BOS tokens were meaningless to the decoder lol
What motivated me to write it was that, while trying to learn about the transformer myself, I couldn't find a very simple reference implementation. The Annotated Transformer (https://nlp.seas.harvard.edu/2018/04/03/attention.html), for instance, is quite convoluted in my opinion and uses syntax that is hard to follow.
Thanks for the link, I hadn't seen this before, but it looks quite simple and nice.
I think the one in PyTorch itself isn't too bad either, but the huge chunks of block comments and the fact that it is entangled with other modules make it intimidating to break down and test out or use immediately.
Correct, I don't currently have any decoding besides argmax on the decoder logits (so no beam search etc.).
If you just mean sequence generation, there is a small function for that in the dataset class, but it would probably be good to put it somewhere more visible.
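For anyone following along, greedy decoding of that kind looks roughly like the sketch below. This is only an illustration, not the actual code in the repo: `model.encode`, `model.decode`, and the special-token ids are placeholder names I'm assuming for a standard encoder-decoder setup.

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=50):
    # Encode the source once and reuse the memory at every decoding step.
    memory = model.encode(src)                        # assumed: (1, src_len, d_model)
    ys = torch.tensor([[bos_id]], dtype=torch.long)   # running target sequence, starts with BOS
    for _ in range(max_len - 1):
        logits = model.decode(ys, memory)             # assumed: (1, tgt_len, vocab_size)
        next_token = logits[:, -1].argmax(dim=-1)     # argmax over the last position's logits
        ys = torch.cat([ys, next_token.unsqueeze(0)], dim=1)
        if next_token.item() == eos_id:               # stop once EOS is produced
            break
    return ys
```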
I'd rather not include beam search, so as not to introduce anything beyond the core architecture (the innovative contribution) of the original paper; besides, most other implementations already have it. But it is a good suggestion nonetheless.
Perhaps I'll add a small script with a few different kinds of generation functions, thanks for the suggestion.
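Something like temperature sampling would fit naturally in such a script. A rough sketch, again with the same placeholder model/method names as above rather than the repo's actual API:

```python
import torch

def sample_decode(model, src, bos_id, eos_id, max_len=50, temperature=1.0):
    # Same loop as greedy decoding, but sample from the softmax instead of taking argmax.
    memory = model.encode(src)                        # assumed encoder output
    ys = torch.tensor([[bos_id]], dtype=torch.long)
    for _ in range(max_len - 1):
        logits = model.decode(ys, memory)[:, -1] / temperature
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # (1, 1) sampled token id
        ys = torch.cat([ys, next_token], dim=1)
        if next_token.item() == eos_id:
            break
    return ys
```

With `temperature` close to 0 this behaves like the greedy version; higher values give more diverse output.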
Me too :) For educational/reference purposes there's a lot of complexity that can be introduced which is irrelevant to the principles. I hope my code was understandable enough.