A lot of features of Intel CPUs can be explained by the fact that the Pentium Pro (and basically every Intel CPU after it until, I believe, Sandy Bridge) uses a basic architecture in which each micro-op can read at most two input operands in a single cycle. CMOV has to read three: the flags register, the source register, and the old value of the destination register (which it keeps when the condition is false). So it doesn't fit in one micro-op and gets decoded into two.
See: http://newsgroups.derkeiler.com/Archive/Comp/comp.arch/2013-....
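To make the operand-count argument concrete, here's a toy Python model. It is only a sketch under stated assumptions: `cmov`, `min_uops`, `cmov_two_uops` and the "tagged temporary" passed between the two uops are hypothetical illustrations, not how Intel actually splits CMOV internally. It shows why CMOV has three real inputs and why a two-input-per-uop limit forces at least two uops.

```python
def cmov(cond: bool, src: int, old_dst: int) -> int:
    """Semantics of CMOVcc dst, src: dst is overwritten only when the
    flags-derived condition holds; otherwise it keeps its old value,
    which is why old_dst is a genuine input operand."""
    return src if cond else old_dst


def min_uops(num_inputs: int, max_inputs_per_uop: int = 2) -> int:
    """Fewest single-output uops needed to combine num_inputs values into
    one result when each uop reads at most max_inputs_per_uop inputs.
    Every uop after the first consumes the previous uop's result plus up
    to (max_inputs_per_uop - 1) fresh inputs."""
    if max_inputs_per_uop < 2:
        raise ValueError("a uop must be able to read at least two inputs")
    if num_inputs <= max_inputs_per_uop:
        return 1
    # ceil((num_inputs - 1) / (max_inputs_per_uop - 1)) with integer math
    return -(-(num_inputs - 1) // (max_inputs_per_uop - 1))


def cmov_two_uops(cond: bool, src: int, old_dst: int) -> int:
    """Hypothetical two-uop split of CMOV under a two-input limit."""
    # uop 1: reads two inputs (cond from flags, src) and emits one
    # hypothetical "tagged" temporary carrying both values.
    tmp = (cond, src)
    # uop 2: reads two inputs (tmp, old_dst) and selects the result.
    tmp_cond, tmp_src = tmp
    return tmp_src if tmp_cond else old_dst


if __name__ == "__main__":
    print(min_uops(3))     # CMOV: flags, src, old dst under a 2-input limit
    print(min_uops(3, 3))  # the same three inputs if a uop could read three
```

With a two-input limit, `min_uops(3)` comes out as 2, while allowing three inputs per uop brings it down to 1, which is the whole point about CMOV above.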