
We...do know how they work?


We know how they work in that we built the framework; we don't know how they work in that we cannot decode what is "grown" on that framework during training.

If we completely knew how they worked, we could go inside and explain exactly why every token generated was generated. Right now that is not possible, as the paths the tokens take through the layers tend to be outright nonsensical when observed.


We know how they're trained. We know the architecture in broad strokes (amounting to a few bits out of billions, albeit important bits). Some researchers try to understand the workings and have very, very far to go.



