Transformer models are excellent at translation, but next-token prediction is not the right setup for it; you want something more like a seq2seq (encoder-decoder) architecture. Next-token prediction cares more about local consistency (i.e., going off on a tangent with a self-consistent but totally fabricated "translation") than about faithfulness to the source.
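
For concreteness, here is a minimal sketch of the seq2seq route, assuming the Hugging Face transformers library and an off-the-shelf MarianMT checkpoint (Helsinki-NLP/opus-mt-en-de); both are illustrative choices, not something from the comment. The point it shows: the encoder consumes the whole source sentence before the decoder emits any target tokens, so generation is conditioned on the full input rather than only on the decoder's running prefix.

    # Sketch of encoder-decoder translation with Hugging Face transformers.
    # Assumptions: transformers is installed; Helsinki-NLP/opus-mt-en-de is
    # just an example checkpoint, not one named in the comment.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    checkpoint = "Helsinki-NLP/opus-mt-en-de"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

    # The encoder reads the entire source sentence up front; the decoder then
    # generates target tokens conditioned on that full encoding, which is what
    # keeps the output anchored to the input.
    source = "The committee postponed the vote until next week."
    inputs = tokenizer(source, return_tensors="pt")
    output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))

A decoder-only model would instead be prompted with something like "Translate to German: ..." and asked to continue the text; nothing in that setup forces the continuation to stay tied to the source, which is the failure mode the comment describes.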

