I think your mental model could be making LLMs seem more confusing than they are. LLMs are stacks of transformer layers, and generative LLMs typically add a sampling (decoding) step on top that picks the next token from the transformer's output distribution.
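
A minimal NumPy sketch of that sampling step (the function name and the temperature/top-k defaults are purely illustrative, not from anything in this thread):

    import numpy as np

    def sample_next_token(logits, temperature=0.8, top_k=50):
        # Scale the transformer's output logits by temperature.
        logits = np.asarray(logits, dtype=np.float64) / temperature
        top = np.argsort(logits)[-top_k:]           # indices of the k most likely tokens
        probs = np.exp(logits[top] - logits[top].max())
        probs /= probs.sum()                        # softmax over the kept tokens
        return int(np.random.choice(top, p=probs))  # draw one token id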

Maybe there are useful abstractions for analyzing them, but LLMs are just another deep learning model.



The "attention" mechanism (a bit of a misnomer really) is what makes transformers more complex than many other neural nets - data isn't simply flowing through the model from layer to layer, but rather it is being copied and moved around by the attention heads. The "next word" it is generating doesn't even have to be a word it has ever seen before - it may be copying it from the prompt.


Interesting. Any suggestions/references for learning about attention from this perspective?


The paper I read was this one, from Catherine Olsson et al. at Anthropic.

https://transformer-circuits.pub/2022/in-context-learning-an...

There's a useful article here that expands on the types of head composition and provides some illustrations.

https://www.lesswrong.com/posts/TvrfY4c9eaGLeyDkE/induction-...
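
For a flavour of what those references mean by an "induction head", here's a toy, non-neural sketch of the pattern they describe ([A][B] ... [A] -> [B]): when the current token appeared earlier in the context, predict whatever followed it last time. The function is purely illustrative, not the paper's code.

    def induction_guess(tokens):
        last = tokens[-1]
        for i in range(len(tokens) - 2, -1, -1):   # scan the earlier context backwards
            if tokens[i] == last:
                return tokens[i + 1]               # copy the token that followed last time
        return None

    print(induction_guess(["the", "cat", "sat", "on", "the"]))  # -> "cat"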



