To add, there is an important distinction between RLHF (Reinforcement Learning from Human Feedback) and RL more broadly. DPO is a simpler and more efficient way to do RLHF. In its current iteration, Augento does RL (using the term coined by OpenAI: Reinforcement Fine-Tuning), which improves model performance on domains where a verification function exists to score an answer, rather than requiring the preferred/rejected answer pairs that DPO needs.
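To make the contrast concrete, here is a minimal sketch (hypothetical names, not Augento's or OpenAI's API): a verification function scores an answer programmatically, whereas DPO consumes human-labelled preference pairs.

```python
# RL with a verification function: the reward comes from programmatically
# checking the model's answer, no human preference label needed.
def verify_math_answer(model_answer: str, expected: float) -> float:
    """Return 1.0 if the model's final number matches the expected value, else 0.0."""
    try:
        return 1.0 if abs(float(model_answer.strip()) - expected) < 1e-6 else 0.0
    except ValueError:
        return 0.0

# DPO, by contrast, needs preference data: for each prompt, a chosen and a
# rejected completion, typically ranked by humans.
dpo_example = {
    "prompt": "What is 17 * 24?",
    "chosen": "17 * 24 = 408",
    "rejected": "17 * 24 = 398",
}
```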
But as mentioned, such a preference-based mode is on the roadmap.