drexlspivey on Dec 5, 2022 | on: Using ChatGPT as a Co-Founder

It uses reinforcement learning: they had humans rank GPT-3's responses and used those rankings as the training set.
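To make "rankings as a training set" concrete, here is a rough sketch of one common way to do it: split each human ranking into (preferred, rejected) pairs and fit a scoring function so preferred responses score higher. This is an assumption about the general technique (a pairwise logistic loss on a linear toy model), not a description of what was actually done for ChatGPT. The feature vectors and the function names `ranking_to_pairs`, `reward`, `pairwise_loss`, and `train` are all made up for illustration.

```python
import math

def ranking_to_pairs(ranked):
    """Turn one human ranking (best first) into (preferred, rejected) pairs."""
    return [(ranked[i], ranked[j])
            for i in range(len(ranked))
            for j in range(i + 1, len(ranked))]

def reward(weights, features):
    """Toy linear scoring function standing in for a learned reward model."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, pairs):
    """Average of -log(sigmoid(reward(preferred) - reward(rejected)))."""
    total = 0.0
    for good, bad in pairs:
        margin = reward(weights, good) - reward(weights, bad)
        total += math.log1p(math.exp(-margin))
    return total / len(pairs)

def train(pairs, dim, lr=0.5, steps=200):
    """Plain gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for good, bad in pairs:
            margin = reward(weights, good) - reward(weights, bad)
            # Derivative of log(1 + exp(-margin)) with respect to margin.
            coeff = -1.0 / (1.0 + math.exp(margin))
            for k in range(dim):
                grad[k] += coeff * (good[k] - bad[k])
        weights = [w - lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

# Hypothetical data: each response is a made-up (helpfulness, rudeness) vector,
# listed in the order a human labeler ranked them, best first.
ranked_responses = [(1.0, 0.0), (0.5, 0.2), (0.0, 1.0)]
pairs = ranking_to_pairs(ranked_responses)
weights = train(pairs, dim=2)
scores = [reward(weights, r) for r in ranked_responses]
```

After training, `scores` should come out in the same order as the human ranking. In the reinforcement learning step, a score like this would then serve as the reward signal for tuning the language model itself, which is not shown here.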