I was considering doing it all in a ternary expression, but I feel that the current form is also branchless, because the update is simply a multiply and an add. The extra bounds-checking condition can probably be omitted, but I haven't tested that.
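for (step >>= 1; step != 0; step >>= 1) {
    if ((next = begin + step) < size) {
        /* PyList_GetItem returns a borrowed reference, so no Py_DECREF is needed. */
        int lt = PyObject_RichCompareBool(PyList_GetItem(list_obj, next), value, Py_LT);
        /* -1 means the comparison raised; assumes the enclosing function returns a pointer. */
        if (lt < 0) return NULL;
        begin += lt * step;  /* lt is 1 or 0 here, so this is the multiply-and-add step */
    }
}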
Ah, yeah, I see what you mean. If I'm understanding you correctly, calling the Python interpreter's internal functions during that calculation means it still branches, because the comparison result is not precomputed?
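For contrast, here is a minimal sketch of the same multiply-and-add step on a plain sorted int array, where the comparison is a simple machine-level compare rather than a call into the interpreter. The function name `lower_bound_branchless` and the choice to count elements rather than track an index are my own assumptions for illustration, not the code from the extension above. Whether the compiler actually emits branch-free machine code for the inner update depends on the compiler and optimization flags, so treat this as a sketch of the idea only.

```c
#include <stddef.h>

/* Hypothetical helper: returns how many elements of the sorted array
 * `data` are strictly less than `value` (i.e. the lower-bound index). */
size_t lower_bound_branchless(const int *data, size_t size, int value)
{
    size_t count = 0;  /* elements known so far to be < value */
    size_t step = 1;

    /* Find the smallest power of two greater than size.
     * Assumes size is small enough that the shift cannot overflow. */
    while (step <= size) {
        step <<= 1;
    }

    for (step >>= 1; step != 0; step >>= 1) {
        size_t next = count + step;
        /* The bounds check is needed here whenever size + 1 is not a power of two. */
        if (next <= size) {
            /* The comparison yields 0 or 1, so the update is a multiply and an add. */
            count += (size_t)(data[next - 1] < value) * step;
        }
    }
    return count;
}
```

With `data = {1, 3, 5, 7, 9}`, searching for 6 probes 7, then 3, then 5, and returns 3. Note that the loop still contains the bounds check and the loop condition itself, so "branchless" here only describes the comparison-dependent update.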