There are some provisos to the “Branchless” tag that are covered in the article I drew inspiration from. One of which is that a “CMOVE” is not technically a branch :)
I was considering doing it all in a ternary statement, but I feel that the current form is also branchless because it is simply a multiply and add. The extra bounds-checking condition can probably be omitted, but I haven't tested that.
for (step >>= 1; step != 0; step >>=1) {
if ((next = begin + step) < size) {
begin += PyObject_RichCompareBool(PyList_GetItem(list_obj, next), value, Py_LT) * step;
}
}
Ah, yeah I see what you mean. If I'm understanding you correctly, the fact that we are calling the Python interpreter internal functions during that calculation makes it branch because it is not pre-calculated?
Except that arr[...] - value may overflow, and is UB in C.
Even removing the UB with something like __builtin_sub_overflow(), if arr[...] is INT_MAX and value is -2, then the difference will have the high bit set!
I haven't tried it, but this should compile to a cmove or seta, not a jump:
int cond = arr[step+begin] < value;
begin += step & -cond;