logicchains on Sept 16, 2024 | on: g1: Using Llama-3.1 70B on Groq to create o1-like ...

Maybe they didn't use a huge amount of human feedback. Where it excels is coding and maths/logic, so they could have used a compiler and unit tests to give it feedback on code, and a theorem prover like Lean to give it feedback on math.
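To make the coding half of that concrete, here is a rough sketch of what "compiler/unit tests as feedback" could look like. This is only a guess at the general shape, not a description of how o1 was actually trained. The function name `unit_test_reward`, the scoring rule (fraction of test cases passed, zero if the code doesn't compile), and the toy `add` candidates are all made up for illustration.

```python
from typing import Any


def unit_test_reward(source: str, func_name: str,
                     cases: list[tuple[tuple, Any]]) -> float:
    """Hypothetical reward in [0, 1] for a model-generated function.

    Returns 0.0 if the source does not compile, raises at import time,
    or does not define `func_name`; otherwise the fraction of cases passed.
    """
    # "Compiler" feedback: reject candidates that are not valid Python.
    try:
        code = compile(source, "<candidate>", "exec")
    except SyntaxError:
        return 0.0

    # Run the definitions. NOTE: exec on untrusted model output is unsafe;
    # a real pipeline would need a sandbox. This is only a sketch.
    namespace: dict[str, Any] = {}
    try:
        exec(code, namespace)
    except Exception:
        return 0.0

    func = namespace.get(func_name)
    if not callable(func):
        return 0.0

    # Unit-test feedback: count the cases where the output matches.
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crashing test case counts as a failure
    return passed / len(cases) if cases else 0.0


# Toy candidates standing in for model outputs (assumed, not real data).
cases = [((1, 2), 3), ((0, 0), 0), ((2, 2), 4)]
correct = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
broken = "def add(a, b) return a + b"

for candidate in (correct, buggy, broken):
    print(unit_test_reward(candidate, "add", cases))
```

The point is that the score comes from running the code, with no human rater in the loop. The math side would be analogous, with a proof checker like Lean accepting or rejecting a candidate proof in place of the test cases.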