No, the essence of my point is that the number of sigmas is meaningless when you have a systematic error — in either the experiment or the theoretical estimate — all that the sigmas tell you is that the two are mismatched. If a mistake could happen once, a similar mistake could easily happen again, so we need to be extremely wary of taking the sigmas at face value. (E.g., the DAMA experiment reports dark matter detections with over 40-sigma significance, but the community doesn't take their validity too seriously.)
Any change in the theoretical estimates could in principle drastically change the number of sigmas of mismatch with experiment, in either direction (but as the scientific endeavor is human after all, typically each helps debug the other and the two converge over time).
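To make that concrete, here is a quick sketch (my own illustration, with made-up numbers) of the usual discrepancy-in-sigmas calculation, and how a modest shift in the theoretical input changes the verdict:

    # A quick sketch (my own illustration, numbers made up) of the usual
    # "how many sigmas" calculation: discrepancy divided by combined uncertainty.
    import math

    def sigmas(measured, predicted, err_measured, err_predicted):
        """Discrepancy in units of the combined uncertainty."""
        return abs(measured - predicted) / math.hypot(err_measured, err_predicted)

    # A ~3.5-sigma "anomaly"...
    print(sigmas(measured=10.0, predicted=9.0, err_measured=0.2, err_predicted=0.2))
    # ...that a modest revision of the theoretical estimate turns into ~1.1 sigma.
    print(sigmas(measured=10.0, predicted=9.7, err_measured=0.2, err_predicted=0.2))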
“Similar” is doing a lot of work there - what constitutes “similar” basically dictates whether error correction has any future-proofing benefit or none at all.
Are you asking whether systematic errors are "priced in"/"automatically represented", or whether they are hidden inside the sigma calculation?
Systematic errors can easily remain hidden. The faster-than-light neutrino result had 6-sigma confidence[0], but 4 other labs couldn't reproduce it. In the end it was attributed to fiber-optic timing errors.
So if you don't know you have a systematic error, then you can very easily end up with great confidence in fundamentally flawed results.
No. As written in another comment, imagine trying to determine whether two brands of cake mix have the same density by weighing them. If you always weigh one brand in a glass bowl but the other in a steel bowl, you'll get an enormously high number of sigmas, but in reality you've only proven that steel is heavier than glass.
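A rough simulation of that setup (my own sketch, with invented numbers) shows how a constant offset like the bowl inflates the sigmas even though the quantity you actually care about is identical:

    # A rough simulation of the bowl analogy (my own sketch, invented numbers):
    # the two mixes weigh exactly the same, but one is always weighed in a
    # heavier bowl whose weight is never subtracted (the systematic error).
    import math, random, statistics

    random.seed(0)
    TRUE_WEIGHT = 500.0                    # grams, identical for both brands
    GLASS_BOWL, STEEL_BOWL = 300.0, 450.0  # un-tared bowl weights
    NOISE = 2.0                            # grams of random scale noise
    N = 100

    brand_a = [TRUE_WEIGHT + GLASS_BOWL + random.gauss(0, NOISE) for _ in range(N)]
    brand_b = [TRUE_WEIGHT + STEEL_BOWL + random.gauss(0, NOISE) for _ in range(N)]

    diff = statistics.mean(brand_b) - statistics.mean(brand_a)
    stderr = math.sqrt(statistics.variance(brand_a) / N + statistics.variance(brand_b) / N)
    print(f"difference: {diff:.1f} g, significance: {diff / stderr:.0f} sigma")
    # Prints a difference of ~150 g at several hundred sigma, even though the
    # mixes are identical -- the sigmas only quantify the bowls, not the cake.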