Let’s try to be charitable, shall we? Everyone makes mistakes sometimes, even leading experts in low-level algorithm optimization. Lemire was upfront about making a mistake, and not at all defensive about it; if you are reading it that way, it’s just you.
It is clearly the case that the M1 CPU/SoC has a significant performance advantage in typical branchy single-core code, but much less of an advantage, if any, in certain kinds of heavily optimized numerics. Beyond that high-level summary, it’s good to dive into the details and spark discussion.
Everyone is just now getting their hands on these chips, learning how to work with them, and trying to figure out how to best optimize for them.