
userbinator on April 25, 2020 \| on: I translated a simple C program to x86_64 and it w...

*like no mismatched push/pop, etc.*

My guess is virtual stack pointer update prediction latency.

To expand on that: Intel's CPUs have long had a separate piece of hardware dedicated to a "virtual" stack, which speeds up push/pop instructions. If pushes and pops are not mismatched, then all stack operations can stay entirely within it, and there's no need to update the "real" stack pointer or the stack entries upon leaving the loop.
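
To make the idea above concrete, here is a toy model in Python. It is only a sketch under a stated assumption: that the front end keeps a pending offset to the stack pointer and folds it into the real register only when some instruction reads the stack pointer directly. The names `StackEngineModel` and `run_loop` are made up for illustration, and real hardware is certainly more involved (a real offset counter would have limited width, for example).

```python
class StackEngineModel:
    """Toy model: push/pop adjust a pending offset instead of the real RSP."""

    def __init__(self, rsp=0x7000):
        self.rsp = rsp       # "real" stack pointer seen by the rest of the core
        self.delta = 0       # pending adjustment tracked by the "virtual" stack
        self.sync_uops = 0   # extra work spent folding delta back into rsp

    def push(self):
        self.delta -= 8      # 8-byte slot, as on x86_64

    def pop(self):
        self.delta += 8

    def read_rsp(self):
        """Model an instruction that reads RSP directly (assumed to force a sync)."""
        if self.delta != 0:
            self.rsp += self.delta
            self.delta = 0
            self.sync_uops += 1
        return self.rsp


def run_loop(iterations, explicit_rsp_use):
    """Run a push/pop loop; optionally read RSP directly after each stack op."""
    engine = StackEngineModel()
    for _ in range(iterations):
        engine.push()
        if explicit_rsp_use:
            engine.read_rsp()
        engine.pop()
        if explicit_rsp_use:
            engine.read_rsp()
    return engine.sync_uops, engine.delta


if __name__ == "__main__":
    print(run_loop(1000, explicit_rsp_use=False))
    print(run_loop(1000, explicit_rsp_use=True))
```

In this model a loop of matched pushes and pops finishes with no pending offset and no sync work at all, while a loop that reads the stack pointer directly between stack operations pays for a sync every time.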

Thank you for your answer! Any idea why loops would not use the LSD (Loop Stream Detector) when written with loopXX instructions, but would use it when cmp/jnX is used?