The identity issue is not evidence at all. It is the easiest thing to clean from the data: if you were actually distilling GPT-4, removing those samples would be the first thing you did.
It is predicting the next token. Are we really going to take its words at face value and assume the model knows what it is saying?