I've found Claude to be way too congratulatory and apologetic. I think they've observed this too and have tried to counter it by placing instructions like that in the system prompt. I believe Anthropic is also experimenting with "lobotomizing" out the pathways responsible for sycophancy; I can't remember where I saw that, but it's pretty cool. In the end, the system prompts will become pretty moot, as the precise behaviours and ethics become embedded in the models themselves.

