They can learn from the outcomes of their actions, even if they can only act in a Python REPL or a game, because that would be easy to scale. But interfacing LLMs with external systems and people is an even better source of feedback. In other words create their own experiences and learn from them.