
A similar trick is used for a different reason in statistical and probabilistic calculations -- for numerical stability rather than performance.

Suppose you have n probabilities p_1, ..., p_n. Each p_i is a real number in the interval [0, 1]. Often you want to multiply probabilities to compute prod_i p_i. Probabilities p_i can be tiny floating-point values (very close to zero), and if n, the number of factors, is large, then the product will evaluate to zero due to underflow.

Software that processes probabilities often instead stores log-probabilities. Then the log of the product prod_i p_i can be evaluated as sum_i log(p_i), which is numerically stable.

Where this gets a bit trickier is if you instead want to compute the sum of probabilities rather than their product -- e.g. to renormalise probabilities so they all sum to 1. If you have encoded the probabilities as log-probabilities, you now need to take the log-sum-exp [1][2], which will underflow or overflow if done naively, and is also relatively expensive to compute, as it requires n exponentials and 1 log to combine n log-probs. Log-sum-exp can be evaluated in a stable way by first doing a pass over the data to compute the max log-prob, then doing a second pass to exponentiate each log-prob offset by the max, so each of the terms being exponentiated is at most 0.
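For concreteness, here is a minimal Python sketch of the two-pass version (the function name and the use of plain lists are my own illustration, not code from the linked posts):

    import math

    def logsumexp(log_probs):
        # Stable log(sum_i exp(x_i)): subtract the max so every exponent is <= 0.
        m = max(log_probs)                 # first pass: find the max log-prob
        if m == float("-inf"):
            return float("-inf")           # all probabilities are zero
        # second pass: exp(x - m) is in (0, 1], so nothing can overflow
        return m + math.log(sum(math.exp(x - m) for x in log_probs))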

There's also a streaming version of log-sum-exp that permits a single pass over the data [3] -- a running max is maintained on the fly, and each time a new max is identified, the running sum is multiplied by a correction factor. I'm a bit suspicious that the branches in the streaming log-sum-exp might cause a performance impact, although each exp calculation is so expensive that perhaps it doesn't make that much of a difference.
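A sketch of the single-pass version along the lines of [3] (again illustrative, and with an extra guard for log(0) inputs that isn't part of the original description):

    import math

    def streaming_logsumexp(log_probs):
        # Single pass: keep a running max and a running sum of exp(x - running_max).
        running_max = float("-inf")
        running_sum = 0.0
        for x in log_probs:
            if x == float("-inf"):
                continue                                  # exp(-inf) adds nothing
            if x <= running_max:
                running_sum += math.exp(x - running_max)  # the common, predictable branch
            else:
                # new max found: rescale the accumulated sum by the correction factor
                running_sum = running_sum * math.exp(running_max - x) + 1.0
                running_max = x
        if running_max == float("-inf"):
            return float("-inf")                          # every input was log(0)
        return running_max + math.log(running_sum)

The "new max" branch is the one I'm unsure about performance-wise.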

[1] https://en.wikipedia.org/wiki/LogSumExp

[2] https://blog.smola.org/post/987977550/log-probabilities-semi...

[3] http://www.nowozin.net/sebastian/blog/streaming-log-sum-exp-...



> I'm a bit suspicious that the branches in the streaming log-sum-exp might cause a performance impact,

Since you're just checking whether the current sample is larger than the largest sample seen so far, you're very likely to find a "large" sample early that rarely gets updated. From then on, this branch will virtually always be false, and the branch predictor will make this fast. The exception is a distribution that drifts upward over time (increasing, but not monotonically) in an unpredictable way; in that case, the branch predictor will mispredict often and the loop will be slow.

The branch predictor (and cache, for that matter) is a sort of Schrödinger's cat that makes programs both slow and fast at the same time, but you never know which until you benchmark it.


> you're very likely to find a "large" sample early that rarely gets updated. From then on, this branch will virtually always be false

Good point, provided there's a decent number of elements being reduced.

In the application I've been focusing on, many of the batched log-sum-exp reductions are over tiny arrays containing 1 to 4 log-prob elements. There's already branching to guard against the case where all elements are log-prob -inf (i.e. probability zero). I found it helpful to also special-case single-element arrays, where the reduction is the identity operation, saving both an exp and a log. There's probably branching inside exp and log as well, so it doesn't make sense to get too myopically focused on that single aspect of performance.
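For what it's worth, a Python sketch of what those special cases can look like (the function name and structure are illustrative, not the actual code from that application):

    import math

    NEG_INF = float("-inf")

    def logsumexp_small(log_probs):
        # Tiny-array version with the two special cases described above.
        if all(x == NEG_INF for x in log_probs):
            return NEG_INF                 # all probabilities are zero
        if len(log_probs) == 1:
            return log_probs[0]            # identity: no exp or log needed
        m = max(log_probs)
        return m + math.log(sum(math.exp(x - m) for x in log_probs))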



