I worked on LRB through LRB3. I wrote the non-polygon pipeline for omatic (lines, points, billboards); later I switched to the shader compiler (masher) under mattomatic. Larrabee 1 (all of omatic) had fixed function on the turns: texture units and pointer chasing. The later parts did not have GPU FF; they didn’t even have some of the rasterization opcodes (faddsets & fmad233).
I seem to remember that LRB could test & jmp in 4 cycles, if you were careful. Since it was 4x barrel processed, those jmps were “free”.
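To make the "free" part concrete, here's a toy model (not LRB's actual pipeline, just a sketch under assumptions I'm making up for illustration): a strict round-robin barrel core where each thread gets one issue slot per rotation, and every instruction is a branch-like op that takes `branch_latency` cycles to resolve. The function name `simulate` and its parameters are hypothetical.

```python
def simulate(num_threads, branch_latency, instrs_per_thread):
    """Toy round-robin barrel core.

    Assumption: every instruction is a test-and-jump that must resolve
    (branch_latency cycles) before the same thread can issue again.
    Returns (total_cycles, idle_issue_slots).
    """
    ready_at = [0] * num_threads            # cycle at which each thread may issue again
    remaining = [instrs_per_thread] * num_threads
    cycle = 0
    idle = 0
    while any(remaining):
        t = cycle % num_threads             # strict round-robin: one slot per thread per rotation
        if remaining[t] and ready_at[t] <= cycle:
            remaining[t] -= 1
            ready_at[t] = cycle + branch_latency
        else:
            idle += 1                       # slot wasted waiting on an unresolved branch
        cycle += 1
    return cycle, idle


if __name__ == "__main__":
    print(simulate(4, 4, 10))  # four threads, 4-cycle branch
    print(simulate(1, 4, 10))  # one thread, 4-cycle branch
```

With four threads and a 4-cycle branch, each thread's branch has resolved by the time its slot comes around again, so no issue slots are wasted. With a single thread, the same branch leaves three of every four slots empty.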
Later, I moved to integrated GPUs. x86 is bloated, but LRB was not. Also, the decoder, even a big one like x86's, isn’t a major HW problem. I’d say x86’s memory hierarchy is more of an issue.