Hacker News
eddyzh on Dec 19, 2024 | on: Alignment faking in large language models
Very fascinating read, especially the reviewers' comments linked at the end. The point is that alignment after training is much more complicated and limited than it might appear, and they make that point convincingly.