Just to expand a bit: sigma (σ) is the standard symbol for the standard deviation of a measurement, and standard deviation is roughly a measure of how much variation there is within a data set (and consequently how confident you can be in your measurement). So when they say that the theoretical result is now 4.2 sigma (units of standard deviation) away from the experimental result instead of 2.7 sigma, that is because the new experiment provided more precise data, shrinking the uncertainty (the size of one sigma) and so widening the gap as measured in sigmas.
Assuming that there were no experimental errors, you can use the standard deviation to express roughly the % chance that a measurement is due to a statistical anomaly vs. a real indication that something is wrong.
To put some numbers to this, a measurement 1 sigma from the prediction would mean that there is roughly an 84% chance that the measurement represents a deviation from the prediction and a 16% chance that it is just a statistical anomaly. Similarly (these numbers are checked in the sketch after this list):
> 2 sigma = 97.7%/2.3% chance of deviation/anomaly
> 3 sigma = 99.9%/0.1% chance of deviation/anomaly
> 4.2 sigma = 99.9987%/0.0013% chance of deviation/anomaly
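For anyone who wants to check those numbers themselves, here is a minimal Python sketch (assuming scipy is installed) that converts a sigma value into the one-sided tail areas quoted above:

```python
# Quick check of the sigma -> probability numbers above (one-sided tail areas
# of a normal distribution).  norm.sf(z) is the chance of landing more than
# z standard deviations above the mean by random fluctuation alone.
from scipy.stats import norm

for z in (1, 2, 3, 4.2):
    anomaly = norm.sf(z)        # chance it is just a statistical fluctuation
    deviation = 1 - anomaly     # the complementary figure quoted above
    print(f"{z} sigma: {deviation:.4%} / {anomaly:.4%}")
```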
Which is why this is potentially big news: there is a very small chance that the disagreement between measurement and prediction is due to a statistical anomaly, and a much higher chance that there is some fundamental physics going on that we don't understand and thus cannot predict.
edit: Again, this assumes both that there were no errors made in the experiment (it inspires confidence that they were able to reproduce this result twice in different settings) and that there were no mistakes made in the prediction itself, which, as another commenter mentions elsewhere, is a nontrivial task in and of itself.
> a measurement 1 sigma from the prediction would mean that there is roughly an 84% chance that the measurement represents a deviation from the prediction and a 16% chance that it is just a statistical anomaly.
No, this is a p-value misinterpretation. Sigma has to do with the probability that, if the null hypothesis were true, data at least as extreme as what was observed would be generated. It does not reflect the probability that any hypothesis is true given the data.
Hm, I was not being particularly precise with my language because I was trying to make my explanation easily digestible, but please correct me if I'm wrong.
The null hypothesis is that there are no new particles or physics and that the Standard Model correctly predicts the magnetic moment of the muon. A 4.2 sigma result means that, given this null hypothesis, the chance that we would have observed data at least this far from the prediction is ~0.0013% (the chance this was a statistical anomaly). Since this is a vanishingly small chance (assuming no experimental errors), we can reasonably reject the hypothesis that the Standard Model wholly predicts the muon's magnetic moment.
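As a rough sanity check of that ~0.0013% figure, here is a small simulation sketch (the sample size and seed are arbitrary): it draws pure Gaussian measurement noise around the null-hypothesis prediction and counts how often a fluctuation of 4.2 sigma or more shows up by chance alone.

```python
# Rough Monte Carlo version of the same statement: under the null hypothesis
# (no new physics), how often does measurement noise alone land 4.2 sigma or
# more away on the high side?  Analytically this is ~1 in 75,000.
import numpy as np

rng = np.random.default_rng(0)
n_trials = 10_000_000
z = rng.standard_normal(n_trials)   # standardized "measurements" under the null

frac = (z >= 4.2).mean()            # one-sided, to match the figures above
print(f"fraction at or beyond 4.2 sigma: {frac:.6%} (analytic ~0.0013%)")
```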
> Again, this assumes both that there were no errors made in the experiment
This is worth repeating a lot when explaining sigma (even in a great and comprehensive explanation such as yours): Statistical anomalies are only relevant when the experiment itself is sound.
Imagine you are trying to see whether two brands of cake mix have different density (maybe you want to get a good initial idea of whether they could be the same cake mix). You can do this by repeatedly weighing the same amount (volume) of cake mix and comparing the mean weight for each brand. That works well, but it totally breaks down if you consistently use a glass bowl for one brand and a steel bowl for the other brand. You will get a very large sigma, but not because of the cake mix.
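To put some toy numbers on that (all the weights, bowl offsets, and noise levels below are invented purely for illustration): the two "brands" in this sketch are literally identical, yet always weighing one in a glass bowl and the other in a steel bowl produces a huge apparent sigma.

```python
# Toy illustration: two identical cake mixes, but one is always weighed in a
# glass bowl and the other always in a steel bowl.  The bowl offset that the
# taring misses (a systematic error) shows up as a hugely "significant" difference.
import numpy as np

rng = np.random.default_rng(42)
n = 100                      # measurements per brand
true_weight = 500.0          # grams of mix in the bowl (same for both brands)
scale_noise = 2.0            # random scatter of the scale, in grams

glass_offset = 5.0           # residual bowl weight not tared out (invented numbers)
steel_offset = 12.0

brand_a = true_weight + glass_offset + rng.normal(0, scale_noise, n)
brand_b = true_weight + steel_offset + rng.normal(0, scale_noise, n)

diff = brand_b.mean() - brand_a.mean()
err = np.sqrt(brand_a.var(ddof=1) / n + brand_b.var(ddof=1) / n)
print(f"apparent difference: {diff:.2f} g, which is {diff / err:.1f} sigma")
# Prints something like 25 sigma -- entirely an artifact of the bowls,
# not a property of the cake mix.
```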
Nitpick: it assumes that there were no systematic errors. If (say) you switch randomly between steel and glass bowls, your results will still be valid, just with a much wider (worse) standard deviation than you could have gotten otherwise (or many more measurements needed for a given accuracy, due to Shannon/noise floor issues).
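A quick sketch of that, reusing the same invented numbers as the bowl example above: if the bowl is chosen at random for every single measurement, the offset no longer favours one brand, so the estimated difference is typically consistent with zero; the price is a much larger per-measurement spread.

```python
# Same toy setup, but now each single measurement randomly uses a glass or a
# steel bowl for either brand.  The bowl offset becomes extra random noise
# instead of a bias: the estimated difference is typically consistent with
# zero, just with a wider spread than a single-bowl experiment would give.
import numpy as np

rng = np.random.default_rng(7)
n = 100
true_weight = 500.0
scale_noise = 2.0
bowl_offsets = np.array([5.0, 12.0])   # glass, steel (same invented numbers)

def measure(n):
    # Randomly pick a bowl for each individual measurement.
    bowls = rng.integers(0, 2, n)
    return true_weight + bowl_offsets[bowls] + rng.normal(0, scale_noise, n)

brand_a, brand_b = measure(n), measure(n)
diff = brand_b.mean() - brand_a.mean()
err = np.sqrt(brand_a.var(ddof=1) / n + brand_b.var(ddof=1) / n)
print(f"difference: {diff:.2f} g ({diff / err:.1f} sigma), "
      f"per-measurement spread: {brand_a.std(ddof=1):.1f} g "
      f"vs {scale_noise} g of pure scale noise")
```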
Yes, that's entirely my point, which is why I said to "consistently" use one type of bowl for one brand. That's a systematic error, but since this was supposed to be educational, I preferred explaining the error instead of using terminology that already assumes that knowledge.