I’d love to hear your thoughts on BERTs - I’ve dabbled a fair bit, fairly amateurishly, and have been astonished by their performance.

I’ve also found them surprisingly difficult and non-intuitive to train; e.g., deliberately including some bad data, and potentially a few false positives, has resulted in notable improvements in success rate.
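
For concreteness, here is a rough sketch of the kind of "deliberately include some bad data" experiment I mean, using Hugging Face's transformers and datasets libraries. The model name (bert-base-uncased), the IMDB dataset, and the 5% flip rate are just illustrative placeholders, not my actual setup:

    # Fine-tune a BERT classifier while deliberately flipping a small
    # fraction of the training labels.
    import random
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification,
                              AutoTokenizer, Trainer, TrainingArguments)

    MODEL = "bert-base-uncased"
    NOISE_RATE = 0.05  # fraction of training labels to corrupt (placeholder)

    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=256)

    def flip_some_labels(example):
        # Deliberately corrupt a small fraction of the labels.
        if random.random() < NOISE_RATE:
            example["label"] = 1 - example["label"]
        return example

    dataset = load_dataset("imdb")
    train = (dataset["train"].shuffle(seed=42).select(range(2000))
             .map(flip_some_labels).map(tokenize, batched=True))
    test = (dataset["test"].shuffle(seed=42).select(range(1000))
            .map(tokenize, batched=True))

    args = TrainingArguments(output_dir="bert-noisy", num_train_epochs=2,
                             per_device_train_batch_size=16, logging_steps=50)
    Trainer(model=model, args=args,
            train_dataset=train, eval_dataset=test).train()

Comparing the eval results of this run against an identical run with NOISE_RATE = 0.0 is the sort of comparison I've been making.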

Do you consider BERTs to be the upper end of traditional NLP - or, I dunno, does the transformer architecture in general mark the cutoff? I'm sure you have fascinating insight on this!



That is a really good question; I am not sure where to draw the line.

I think it would be safe to say BERT is/was firmly on the non-traditional side of NLP.

A variety of task-specific RNN models preceded BERT, and the RNN as a concept has been around for quite a long time, with the LSTM being a more modern refinement.

Maybe word2vec ushered in the end of traditional NLP while simultaneously marking the beginning of non-traditional NLP? Much like Newton has been said to be both the first scientist and the last magician.

I find discussing these kinds of questions with NLP academics to be awkward.



