The part that's really weird is that on modern CPUs predicted branches are free ...

dzaima · 2025-04-24T20:36:49 1745527009

The limiting factor isn't necessarily speculation so much as the number of branches per cycle, i.e. the number of non-contiguous locations the processor has to fetch from the L1 / uop cache (and whose addresses the branch predictor has to determine). You hit that limit with unconditional branches too.

gpderetta · 2025-04-25T12:25:54 1745583954

Indeed, the limit is on taken branches, which is why making the most likely case the fall-through path is often an optimization.
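A minimal sketch of the fall-through idea in C, assuming GCC or Clang. The rare path is moved into a separate function marked with the `cold` and `noinline` attributes, which hints that the compiler should place it out of line so the common case runs as straight-line code with no taken branch. The names `report_negative` and `sum_nonnegative` are hypothetical, and the attributes are only hints: the compiler may still choose a different layout (or if-convert the branch entirely).

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical rare-path handler. `cold` and `noinline` (GCC/Clang
   attributes) hint that this is rarely executed and keep it out of the
   caller's body, so the hot loop stays contiguous in memory. */
__attribute__((cold, noinline))
static long report_negative(size_t i, long v) {
    fprintf(stderr, "negative value %ld at index %zu\n", v, i);
    return 0; /* treat negative values as zero */
}

/* Sums the array. The common (non-negative) case is intended to be the
   fall-through path; only the rare case jumps away. */
long sum_nonnegative(const long *a, size_t n) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        long v = a[i];
        if (v < 0)                 /* rare: branch to the cold path */
            v = report_negative(i, v);
        total += v;                /* hot: straight-line fall-through */
    }
    return total;
}
```

For example, `sum_nonnegative((long[]){5, -4, 7}, 3)` returns 12 and logs one warning. Whether the layout actually changes is best checked in the generated assembly (e.g. `gcc -O2 -S`).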

adgjlsfhk1 · 2025-04-28T03:09:57 1745809797

The tricky part here is that compilers are pretty bad (without PGO, at least) at knowing which side of the branch matters.
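Without profile data, one way to supply that knowledge by hand is GCC/Clang's `__builtin_expect`, commonly wrapped in `likely`/`unlikely` macros (a convention popularized by the Linux kernel, not a standard API). A minimal sketch under those assumptions; `count_digits` is a hypothetical example function, and the "non-digits are rare" expectation is an assumption baked into the hint:

```c
#include <stddef.h>

/* Conventional wrappers around the GCC/Clang builtin. `!!(x)` normalizes
   the condition to 0 or 1 so it can be compared with the expected value. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical parser: counts the characters of a NUL-terminated string of
   decimal digits, returning -1 on the first non-digit. The hint tells the
   compiler to treat the error exit as the unlikely side of the branch. */
int count_digits(const char *s) {
    int n = 0;
    for (; *s != '\0'; s++) {
        if (unlikely(*s < '0' || *s > '9'))
            return -1;
        n++;
    }
    return n;
}
```

C++20 offers the standard `[[likely]]` / `[[unlikely]]` attributes for the same purpose. Hand-written hints can be wrong, though, which is the appeal of PGO (`-fprofile-generate` / `-fprofile-use` in GCC): the compiler learns the real branch frequencies from a training run.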