
Yes? Are you saying that attention is less expressive?


I’m saying that LLMs (models trained specifically on language) are not automatically capable of the same generic function approximation.

The network architecture itself can be trained to approximate most functions (per the universal approximation theorem, a feedforward network with enough hidden units can approximate any continuous function on a compact domain to arbitrary accuracy).

But a trained language model is not necessarily capable of approximating arbitrary functions, because its weights have already been fit to language data.
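
A minimal sketch of the universal-approximation point above (a toy example assumed for illustration, not anything from the thread): a small numpy-only feedforward net fit to sin(x) by plain gradient descent. The point is only that a generic network can be trained toward an arbitrary smooth target, which is separate from what an already-trained language model does.

    # Toy illustration (assumption: numpy only, not any particular LLM stack):
    # a small one-hidden-layer net fit to sin(x) by full-batch gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)

    # Training data: x in [-pi, pi], target y = sin(x)
    X = np.linspace(-np.pi, np.pi, 256).reshape(-1, 1)
    Y = np.sin(X)

    # One hidden layer with tanh activations
    H = 32
    W1 = rng.normal(0, 1.0, (1, H))
    b1 = np.zeros(H)
    W2 = rng.normal(0, 0.1, (H, 1))
    b2 = np.zeros(1)

    lr = 0.1
    for step in range(5000):
        # Forward pass
        hidden = np.tanh(X @ W1 + b1)
        pred = hidden @ W2 + b2
        err = pred - Y

        # Backward pass for mean squared error
        n = len(X)
        d_pred = 2 * err / n
        dW2 = hidden.T @ d_pred
        db2 = d_pred.sum(axis=0)
        d_hidden = (d_pred @ W2.T) * (1 - hidden ** 2)
        dW1 = X.T @ d_hidden
        db1 = d_hidden.sum(axis=0)

        # Gradient descent update
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2

    # Small value here means the net has roughly fit sin(x)
    print("final MSE:", float(np.mean(err ** 2)))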



