Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I guess while Python interprets this branchless code it still can do some branching? Or am I missing something here?


There are some provisos to the “Branchless” tag that are covered in the article I drew inspiration from. One of which is that a “CMOVE” is not technically a branch :)


You have to do some really weird shit if you want generalised branchless ternary statements.

    begin += (arr[step+begin] < value)?step:0;
Something like:

    int mask = ((arr[step + begin] - value) >> 31);    //Depends on signed shift
    begin += (step & mask) | (0 & ~mask);
(Obviously in this case, it's simplifiable.)


I was considering doing it all in a ternary statement, but I feel that the current form is also branchless because it is simply a multiply and add. The extra bounds-checking condition can probably be omitted, but I haven't tested that.

  for (step >>= 1; step != 0; step >>=1) {
          if ((next = begin + step) < size) {
              begin += PyObject_RichCompareBool(PyList_GetItem(list_obj, next),     value, Py_LT) * step;
          }
      }


My point was, anywhere there's a hidden 'if' can be branching.

If there's no calculation being done, it'll simplify.

    value = (test) ? const0 : const1;
But if calculations are being done, it won't.

    value = (test) ? calc0() : calc1();
If you want non-branching where the ternary options are calculated, you need to calculate both.

This matters most with SIMD operations.

Look at section 2.5.1 (Branch-Equivalent SIMD Processing)

http://ftp.cvut.cz/kernel/people/geoff/cell/ps3-linux-docs/C...


Ah, yeah I see what you mean. If I'm understanding you correctly, the fact that we are calling the Python interpreter internal functions during that calculation makes it branch because it is not pre-calculated?


Pretty much (at least as far as I understand it).

There's probably something at the instruction level which allows the constant ternary expressions to be non-branching.


Except that arr[...] - value may overflow, and is UB in C.

Even removing the UB with something like __builtin_sub_overflow(), if arr[...] is INT_MAX and value is -2, then the difference will have the high bit set!

I haven't tried it, but this should compile to a cmove or seta, not a jump:

  int cond = arr[step+begin] < value;
  begin += step & -cond;


My bad, but still worth mentioning the generalised case:

    int mask = -(arr[step+begin] < value);
    int value = (val0 & mask)|(val1 & ~mask);




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: