I’d love to hear your thoughts on BERTs - I’ve dabbled a fair bit, fairly amateurishly, and have been astonished by their performance.
I’ve also found them surprisingly difficult and non-intuitive to train; for example, deliberately including bad data, and potentially a few false positives, has led to notable improvements in success rate (there’s a rough sketch of this idea below).
Do you consider BERTs to be the upper end of traditional NLP - or, dunno, is the transformer architecture in general where the line gets drawn? Am sure you have fascinating insight on this!
That is a really good question; I am not sure where to draw the line.
I think it would be safe to say BERT is/was firmly on the non-traditional side of NLP.
A variety of task-specific RNN models preceded BERT, and RNNs as a concept have been around for quite a long time, with the LSTM being the more modern variant.
Maybe word2vec ushered in the end of traditional NLP and simultaneously marked the beginning of non-traditional NLP? Much like Newton has been said to be both the first scientist and the last magician.
I find discussing these kinds of questions with NLP academics to be awkward.
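For anyone curious what the "deliberately including bad data" remark above might look like in practice, here is a minimal sketch: fine-tuning a BERT classifier with a small fraction of training labels randomly flipped. It assumes the Hugging Face transformers and datasets libraries; the model name, dataset, and noise rate are illustrative placeholders, not details from the exchange.

```python
# Minimal sketch: fine-tune a BERT classifier with a small fraction of
# deliberately corrupted ("bad") labels mixed into the training set.
# bert-base-uncased, imdb, and the 5% noise rate are illustrative choices.
import random

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "bert-base-uncased"
NOISE_RATE = 0.05  # fraction of training labels to flip

random.seed(0)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
dataset = load_dataset("imdb")  # stand-in binary sentiment task


def flip_some_labels(example):
    # With probability NOISE_RATE, replace the true label with the wrong one.
    if random.random() < NOISE_RATE:
        example["label"] = 1 - example["label"]
    return example


def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)


train = dataset["train"].map(flip_some_labels).map(tokenize, batched=True)
test = dataset["test"].map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

args = TrainingArguments(
    output_dir="bert-noisy-labels",
    num_train_epochs=1,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train,
    eval_dataset=test,
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
)
trainer.train()
print(trainer.evaluate())
```

Whether the flipped labels actually help is an empirical question; the point is only to show where that kind of noise would be injected in an otherwise standard fine-tuning loop.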