This is an uninformed take. Much of the improvement in performance of LLM based ...

mbesto · 2025-10-16T20:26:49 1760646409

> This is an uninformed take.

You may disagree with this take, but it's not uninformed. Many LLMs use self‑supervised pretraining followed by RL‑based fine‑tuning, but that's essentially it: it's fine‑tuning.

vonneumannstan · 2025-10-17T14:02:09 1760709729

I think you're seriously underestimating the importance of the RL steps on LLM performance.

Also, how do you think the most successful RL models have worked? AlphaGo and AlphaZero both use neural networks for their policy and value functions, which are the central mechanism of those models.