In my opinion the more likely thing is that OpenAI is gaslighting people that the finetuning is improving the model when it likely mostly improves safety at some cost to capability. I'd bet this is measured against a set of evals and it looks like it performs well BUT I'd also bet the evals are asymmetrically good at detecting "unsafe" or jailbreak behavior and bad at detecting reduced general cognitive flexibility.
The obvious avenue to degradation is that the "HR personality" is much more strictly applied and the resistance to being jailbroken is also in some sense an inability to think.
The ability to detect quality is harder than the ability to detect defects, so the obvious metric is improved while the nebulous one is "good enough". They are competing goals.
This is not necessarily the case, and even if it is doesn't imply gaslighting as compared to inability to measure.
The obvious avenue to degradation is that the "HR personality" is much more strictly applied and the resistance to being jailbroken is also in some sense an inability to think.