Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Yes, in the sense that they can be formalised as such. Given n tokens t1..tn, a transformer computes the probability p(t') that every token t' has of being the next. Each of these probabilities are arcs from a state t1..tn to another state t1..tn,t' weighted by p(t').

And since the context window is bounded, this is guaranteed to be finite-state, and all states are transient except those where the window is saturated.



Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: