It would be nice to see some examples of decoding in your repo (forgive me if I just missed them). I remember when I first implemented a transformer from scratch, generating sequences with greedy or beam search after training/testing turned out to be harder than I thought, but it turned out I had made a mistake with teacher forcing early on, so the BOS tokens were meaningless to the decoder lol
What motivated me to write it was that, while trying to learn about the transformer myself, I couldn't find a very simple reference implementation. The Annotated Transformer (https://nlp.seas.harvard.edu/2018/04/03/attention.html), for instance, is quite convoluted in my opinion and uses syntax that is hard to follow.
Thanks for the link, I hadn't seen this before, but it looks quite simple and nice.
I think the one in PyTorch itself isn't too bad either, but the huge chunks of block comments and the fact that it is entangled with other modules make it intimidating to break down and test out or use immediately.
Correct, I don't currently have any decoding besides argmax on the decoder logits (so no beam search etc.).
If you just mean sequence generation, there is a small function for that in the dataset class, but it would probably be good to put it somewhere more visible.
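For anyone following along, greedy decoding of that kind looks roughly like the sketch below. This is only an illustration, not the actual code in the repo: `model.encode`, `model.decode`, and the special-token ids are placeholder names I'm assuming for a standard encoder-decoder setup.

```python
import torch

def greedy_decode(model, src, bos_id, eos_id, max_len=50):
    # Encode the source once and reuse the memory at every decoding step.
    memory = model.encode(src)                        # assumed: (1, src_len, d_model)
    ys = torch.tensor([[bos_id]], dtype=torch.long)   # running target sequence, starts with BOS
    for _ in range(max_len - 1):
        logits = model.decode(ys, memory)             # assumed: (1, tgt_len, vocab_size)
        next_token = logits[:, -1].argmax(dim=-1)     # argmax over the last position's logits
        ys = torch.cat([ys, next_token.unsqueeze(0)], dim=1)
        if next_token.item() == eos_id:               # stop once EOS is produced
            break
    return ys
```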
I'd rather not include beam search, so as not to introduce anything beyond the core architecture (the innovative contribution) of the original paper; besides, most other implementations already have it. But it is a good suggestion nonetheless.
Perhaps I'll add a small script with a few different kinds of generation functions, thanks for the suggestion.
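Something like temperature sampling would fit naturally in such a script. A rough sketch, again with the same placeholder model/method names as above rather than the repo's actual API:

```python
import torch

def sample_decode(model, src, bos_id, eos_id, max_len=50, temperature=1.0):
    # Same loop as greedy decoding, but sample from the softmax instead of taking argmax.
    memory = model.encode(src)                        # assumed encoder output
    ys = torch.tensor([[bos_id]], dtype=torch.long)
    for _ in range(max_len - 1):
        logits = model.decode(ys, memory)[:, -1] / temperature
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # (1, 1) sampled token id
        ys = torch.cat([ys, next_token], dim=1)
        if next_token.item() == eos_id:
            break
    return ys
```

With `temperature` close to 0 this behaves like the greedy version; higher values give more diverse output.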
Me too :) For educational/reference purposes there's a lot of complexity that can be introduced which is irrelevant to the principles. I hope my code was understandable enough.