Hacker News

There are of course statistical methods designed to support early stopping. But I don't think you can just run an ordinary significance test every day and stop as soon as p < 0.05. That's something else entirely.
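For anyone who wants to see why daily peeking breaks the nominal error rate, here's a quick stdlib-only Python sketch (function name and parameters are my own): it simulates a null A/B test where both arms are fair coins, runs a two-sided z-test after each day of data, and stops as soon as p < 0.05.

```python
import math
import random

def peeking_false_positive_rate(n_experiments=2000, days=20, per_day=50,
                                alpha=0.05, seed=0):
    """Simulate a null A/B test (both arms are fair coins), applying a
    two-sided z-test after each day and stopping as soon as p < alpha.
    Returns the fraction of experiments that (falsely) reject the null."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_experiments):
        a = b = n = 0
        for _ in range(days):
            for _ in range(per_day):
                a += rng.random() < 0.5
                b += rng.random() < 0.5
            n += per_day
            pooled = (a + b) / (2 * n)
            se = math.sqrt(2 * pooled * (1 - pooled) / n)
            if se == 0:
                continue
            z = abs(a / n - b / n) / se
            pval = math.erfc(z / math.sqrt(2))  # two-sided p-value
            if pval < alpha:
                false_positives += 1
                break  # "significant" -> stop early
    return false_positives / n_experiments

print(peeking_false_positive_rate())
```

With 20 peeks the realized false-positive rate typically lands well above the nominal 5%, which is exactly the "something else" being pointed at.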



You use a full two-sided ANOVA F-test with a multiple-comparison correction for that. Even these tests are sometimes not conservative enough, because the correction is a bit of a guess.

You will end up with a much higher number of trials required to reach the p-value threshold than in the version with a predetermined number of trials and no p-based stopping rule.

Say, in a single-variable, single-run ABX test, 8 is the usual number of trials needed according to Fisher's frequentist approach. If you do multiple comparisons to hit 0.05, I believe you need 21 trials instead. (Don't quote me on that; compute your own Bayesian beta prior probability.)

The number of trials needed to differentiate from a fair coin is the typical comparison prior, giving a beta distribution. You're trying to set up a ratio between two of them: one fitted to your data, the other the null.
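If anyone wants to actually run the beta computation mentioned above, here's a small stdlib-only sketch (the function name and the uniform Beta(1,1) prior are my choices): it numerically integrates the posterior Beta density to get P(p > 0.5), i.e. the posterior probability that the listener beats the fair-coin null.

```python
import math

def posterior_prob_better_than_chance(correct, trials, a0=1.0, b0=1.0,
                                      steps=10000):
    """Posterior P(p > 0.5) for `correct` successes out of `trials` ABX
    trials under a Beta(a0, b0) prior; the null is a fair coin (p = 0.5).
    Midpoint-rule integration of the Beta(a0+correct, b0+trials-correct)
    density over (0.5, 1)."""
    a = a0 + correct
    b = b0 + trials - correct
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 0.5 / steps
    total = 0.0
    for i in range(steps):
        p = 0.5 + (i + 0.5) * h
        total += math.exp(log_norm
                          + (a - 1) * math.log(p)
                          + (b - 1) * math.log(1 - p)) * h
    return total

# The classic single-run ABX criterion, 8 correct out of 8:
print(round(posterior_prob_better_than_chance(8, 8), 3))  # ≈ 0.998
```

With a uniform prior, 8/8 gives a Beta(9, 1) posterior and P(p > 0.5) = 1 − 0.5⁹ ≈ 0.998, which is roughly the same strength of evidence the frequentist 0.05 criterion is after.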


The general topic and some specific ways to estimate a correction are described under this term: https://en.wikipedia.org/wiki/Sequential_analysis
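The classic example from that article is Wald's sequential probability ratio test, which is the kind of test that legitimately supports early stopping. A minimal Bernoulli version (the p0/p1 alternatives and error targets here are illustrative, not canonical):

```python
import math
import random

def sprt_bernoulli(samples, p0=0.5, p1=0.7, alpha=0.05, beta=0.05):
    """Wald's SPRT for H0: p = p0 vs H1: p = p1 on a stream of 0/1
    outcomes. Stops as soon as the log-likelihood ratio crosses a
    boundary; alpha and beta are the target type I/II error rates.
    Returns (decision, number of samples consumed)."""
    upper = math.log((1 - beta) / alpha)   # cross above -> accept H1
    lower = math.log(beta / (1 - alpha))   # cross below -> accept H0
    llr, n = 0.0, 0
    for x in samples:
        n += 1
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "H1", n
        if llr <= lower:
            return "H0", n
    return "undecided", n

# A listener who genuinely hears a difference (p around 0.7) usually
# gets a decision long before a fixed-n design would:
rng = random.Random(1)
stream = [int(rng.random() < 0.7) for _ in range(500)]
print(sprt_bernoulli(stream))
```

The point is that the stopping rule is baked into the boundaries up front, so peeking after every trial is the intended use rather than a violation.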


Multiple comparisons and sequential hypothesis testing / early stopping aren't the same problem. There might be a way to wrangle an F test into a sequential hypothesis testing approach, but it's not obvious (to me anyway) how one would do so. In multiple comparisons each additional comparison introduces a new group with independent data; in sequential hypothesis testing each successive test adds a small amount of additional data to each group so all results are conditional. Could you elaborate or provide a link?





