Hacker News
eddyzh on Dec 19, 2024 | on: Alignment faking in large language models
Very fascinating read, especially the reviewers' comments linked at the end. The point is that alignment after training is much more complicated and limited than it might appear, and they make that point convincingly.