
This only works until it doesn't. Start with a model that simply hasn't been trained on anything your shareholders find objectionable, and there will be nothing to reveal with abliteration.
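For reference, abliteration itself is mechanically simple: collect activations for prompts the model refuses and prompts it answers, take the difference of the means as a "refusal direction", and project that direction out of the weights. Below is a minimal sketch of that step, assuming PyTorch, with random tensors standing in for real model activations; the shapes and names are illustrative, not any particular implementation:

    import torch

    def refusal_direction(refused_acts, answered_acts):
        """Difference-of-means direction between two activation sets,
        each of shape (n_prompts, d_model)."""
        direction = refused_acts.mean(dim=0) - answered_acts.mean(dim=0)
        return direction / direction.norm()

    def ablate(weight, direction):
        """Remove the component of `weight`'s output that writes along
        `direction`: W <- (I - r r^T) W, for a (d_model, d_in) matrix."""
        r = direction.unsqueeze(1)          # (d_model, 1)
        return weight - r @ (r.T @ weight)  # subtract rank-1 projection

    # Toy demo: random activations stand in for the real ones.
    d_model = 64
    refused = torch.randn(32, d_model) + 0.5
    answered = torch.randn(32, d_model)
    r = refusal_direction(refused, answered)

    W_out = torch.randn(d_model, d_model)   # e.g. an MLP down-projection
    W_ablated = ablate(W_out, r)
    print((r @ W_ablated).norm())           # ~0: nothing written along r

The same projection can also be applied at inference time via forward hooks instead of editing weights. Either way, removing refusal can only reveal capabilities the model already learned, which is the point above.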


Maybe there exists a dataset consisting entirely of objectionable content, so people can finetune neutered models on it?
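If such a corpus existed, the mechanics would be ordinary supervised fine-tuning. A minimal sketch, assuming the Hugging Face transformers/datasets stack, with gpt2 as a stand-in base model and a hypothetical local file r_rated_corpus.jsonl (one {"text": ...} record per line):

    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "gpt2"  # placeholder; any causal LM works the same way
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Hypothetical dataset of R-rated-but-legal text.
    dataset = load_dataset("json", data_files="r_rated_corpus.jsonl",
                           split="train")

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

Fine-tuning can only add back what the data contains, so the hard part is assembling the dataset, not the training loop.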


PH maybe?


More like Literotica.


I mean not only sex but also swearing, drugs, violence, etc.: basically everything R-rated (but not illegal) that usually gets censored.


PH is not porn-only; a significant amount of non-porn content exists there as well.


Such models would actually run counter to their long-term interest in being able to automate away the work currently done by humans.



