Hacker News

There are of course statistical methods designed to support early stopping. But I don't think you can just run an ordinary significance test every day and stop as soon as p < 0.05. That's something else entirely.
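For anyone who wants to see why daily peeking breaks the nominal error rate, here's a quick stdlib-only Python sketch (function name and parameters are my own): it simulates a null A/B test where both arms are fair coins, runs a two-sided z-test after each day of data, and stops as soon as p < 0.05.

```python
import math
import random

def peeking_false_positive_rate(n_experiments=2000, days=20, per_day=50,
                                alpha=0.05, seed=0):
    """Simulate a null A/B test (both arms are fair coins), applying a
    two-sided z-test after each day and stopping as soon as p < alpha.
    Returns the fraction of experiments that (falsely) reject the null."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        a = b = n = 0
        for _ in range(days):
            for _ in range(per_day):
                a += rng.random() < 0.5
                b += rng.random() < 0.5
            n += per_day
            pooled = (a + b) / (2 * n)
            se = math.sqrt(2 * pooled * (1 - pooled) / n)
            if se == 0:
                continue
            z = abs(a / n - b / n) / se
            pval = math.erfc(z / math.sqrt(2))  # two-sided p-value
            if pval < alpha:
                false_positives += 1
                break  # "significant" -> stop early
    return false_positives / n_experiments

print(peeking_false_positive_rate())
```

With 20 peeks the realized false-positive rate typically lands well above the nominal 5%, which is exactly the "something else" being pointed at.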



You use a full two-sided ANOVA F-test with a multiple-comparison correction for that. Even these tests are sometimes not conservative enough, because the correction is a bit of a guess.

You will end up with a much higher number of trials required to reach the p-value threshold than in the version with a predetermined number of trials and no p-based stopping rule.

Say, in a single-variable, single-run ABX test, 8 is the usual number of trials needed according to Fisher's frequentist approach. If you do multiple comparisons to hit 0.05, I believe you need 21 trials instead. (Don't quote me on that; compute your own Bayesian beta prior probability.)

The number of trials needed to differentiate from a fair coin is the typical comparison prior, giving a beta distribution. You're trying to set up a ratio between two of them: one fitted to your data, the other the null.
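If anyone wants to actually run the beta computation mentioned above, here's a small stdlib-only sketch (the function name and the uniform Beta(1,1) prior are my choices): it numerically integrates the posterior Beta density to get P(p > 0.5), i.e. the posterior probability that the listener beats the fair-coin null.

```python
import math

def posterior_prob_better_than_chance(correct, trials, a0=1.0, b0=1.0,
                                      steps=10000):
    """Posterior P(p > 0.5) for `correct` successes out of `trials` ABX
    trials under a Beta(a0, b0) prior; the null is a fair coin (p = 0.5).
    Midpoint-rule integration of the Beta(a0+correct, b0+trials-correct)
    density over (0.5, 1)."""
    a = a0 + correct
    b = b0 + trials - correct
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 0.5 / steps
    total = 0.0
    for i in range(steps):
        p = 0.5 + (i + 0.5) * h
        total += math.exp(log_norm
                          + (a - 1) * math.log(p)
                          + (b - 1) * math.log(1 - p)) * h
    return total

# The classic single-run ABX criterion, 8 correct out of 8:
print(round(posterior_prob_better_than_chance(8, 8), 3))  # ≈ 0.998
```

With a uniform prior, 8/8 gives a Beta(9, 1) posterior and P(p > 0.5) = 1 − 0.5⁹ ≈ 0.998, which is roughly the same strength of evidence the frequentist 0.05 criterion is after.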


The general topic and some specific ways to estimate a correction are described under this term: https://en.wikipedia.org/wiki/Sequential_analysis
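The classic example from that article is Wald's sequential probability ratio test, which is the kind of test that legitimately supports early stopping. A minimal Bernoulli version (the p0/p1 alternatives and error targets here are illustrative, not canonical):

```python
import math
import random

def sprt_bernoulli(samples, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a stream of 0/1
    outcomes. Stops as soon as the log-likelihood ratio crosses a
    boundary; alpha and beta are the target type I/II error rates.
    Returns (decision, number of samples consumed)."""
    upper = math.log((1 - beta) / alpha)   # cross above -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross below -> accept H0
    llr, n = 0.0, 0
    for x in samples:
        n += 1
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n

# A listener who genuinely hears a difference (p around 0.7) usually
# gets a decision long before a fixed-n design would:
rng = random.Random(1)
stream = [int(rng.random() < 0.7) for _ in range(500)]
print(sprt_bernoulli(stream))
```

The point is that the stopping rule is baked into the boundaries up front, so peeking after every trial is the intended use rather than a violation.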


Multiple comparisons and sequential hypothesis testing / early stopping aren't the same problem. There might be a way to wrangle an F test into a sequential hypothesis testing approach, but it's not obvious (to me anyway) how one would do so. In multiple comparisons each additional comparison introduces a new group with independent data; in sequential hypothesis testing each successive test adds a small amount of additional data to each group so all results are conditional. Could you elaborate or provide a link?





