It's pretty common for runtime libraries to optimize low-level routines like memcpy, math, etc. with multiple different paths chosen on the basis of CPU capability bits. It's not the whole code that's twice the size; it's small functions which are implemented 2 or 3 or 5 times depending on what features are available.
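Roughly what that looks like in source, using GCC/clang's target_clones attribute (the function and target list here are just illustrative): the compiler emits one clone per listed target plus a resolver that picks the right one when the program loads.

    #include <stddef.h>

    /* compiler generates an AVX2 clone, an SSE4.2 clone, a baseline
       clone, and a resolver that selects among them at load time */
    __attribute__((target_clones("avx2", "sse4.2", "default")))
    void scale(float *dst, const float *src, size_t n, float k)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] * k;
    }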
Perhaps, but size doesn't really affect runtime performance that much, especially when most of the codepaths are never executed -- the unused paths never get pulled into the processor cache, so there's no cache churn.
I don't really know anything about this compiler, so I'm certainly speculating. My assumption is that one writes some function foo() and the compiler prepends a dispatcher which forks (code paths, not processes) to one of N optimized but functionally equivalent codepaths based on the actual processor the code runs on.
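Something like this hand-rolled sketch, if my guess is right -- foo_avx2/foo_generic are made-up stand-ins for the N variants, and __builtin_cpu_supports is the GCC/clang builtin (x86 only) that reads the capability bits:

    #include <stdio.h>

    static void foo_avx2(void)    { puts("AVX2 path"); }
    static void foo_generic(void) { puts("generic path"); }

    /* the dispatcher "prepended" in front of foo() */
    void foo(void)
    {
        if (__builtin_cpu_supports("avx2"))
            foo_avx2();
        else
            foo_generic();
    }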
I suspect it patches a jumptable at initialisation time based on CPU type, and all the code used by one type of CPU is bunched close together. The unused code probably isn't even paged into physical RAM.
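A rough sketch of that patch-once-at-init idea, with a function pointer set from a constructor (names made up; glibc does something similar with GNU ifuncs):

    #include <stddef.h>

    static void copy_generic(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;
        while (n--) *d++ = *s++;
    }

    static void copy_avx2(void *dst, const void *src, size_t n)
    {
        /* a real variant would use 256-bit loads/stores */
        copy_generic(dst, src, n);
    }

    /* a one-entry "jumptable", patched once at startup */
    static void (*copy_impl)(void *, const void *, size_t) = copy_generic;

    __attribute__((constructor))
    static void pick_copy(void)
    {
        if (__builtin_cpu_supports("avx2"))
            copy_impl = copy_avx2;
    }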