For some reason people think of models as software and expect 'open source' to mean the same thing. There are fundamental differences: 1) models aren't reproducible even given everything: the data, the hardware, the methodology. 2) they aren't even verifiable, i.e. given a model and a dataset it's impossible to say whether the model was trained on that data. 3) except for toy models, they are trained on copyrighted data, some of it private, like users' chats. 4) besides data, there is a lot of human input after pretraining.
This means that even given everything, you have two options: 1) train a similar model yourself, or 2) trust the model provider. In software you can take a script and run it, or take the code and compile it into exactly the same binary.
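For contrast, here is a minimal Python sketch of what verifiability looks like on the software side: with a reproducible build, a checksum comparison settles the question without trusting anyone. The file names here are hypothetical.

```python
import hashlib

def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical artifacts: a binary you compiled yourself vs. one shipped by the vendor.
local_build = sha256_of("myapp_local")
vendor_build = sha256_of("myapp_vendor")

# If the build is reproducible, the digests match byte for byte.
print("identical" if local_build == vendor_build else "different")
```

There is no equivalent check for a model: nothing you can hash to prove it came from a given dataset and training recipe.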
Naturally, 'open source' has a different meaning for models. Some are trying to monopolize the term, as if they know the 'truth'. Others simply ignore it. Eventually we'll settle on something.
A decent training pipeline will be able to reproduce models with equivalent aggregate performance (as measured by the evaluation metrics), and a high degree of similarity in behavior on specific inputs, but not identical behavior.
It will not be the exact same weights. But that is not a critical bar to reach, and many software builds also fail that bar.
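A toy sketch of that point, using scikit-learn on synthetic data rather than any real LLM pipeline: two training runs on the same data that differ only in seed land at nearly the same accuracy, yet the learned weights are not identical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Toy stand-in for "same data, same recipe, different run".
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Two runs that differ only in random initialization / shuffling.
runs = [
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=seed).fit(X, y)
    for seed in (1, 2)
]

# Aggregate performance is essentially equivalent...
print([round(m.score(X, y), 3) for m in runs])

# ...but the weights themselves differ.
w0, w1 = runs[0].coefs_[0], runs[1].coefs_[0]
print("max weight difference:", float(np.abs(w0 - w1).max()))
```

Scaled up to real training runs, nondeterministic GPU kernels and data-loading order make exact weight reproduction even less attainable, which is why equivalent evaluation results are the practical bar.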