Maybe there's nothing wrong with that, but it might mean the perceived weaknesses don't generalize to parts of the model that haven't been lobotomized.
* using safety the way OpenAI have been using the term, not looking to debate the utility of that.