
Wouldn't that make the code twice as large?



It's pretty common for runtime libraries to optimize low-level routines like memcpy, math, etc. with multiple different paths chosen on the basis of CPU capability bits. It's not the whole code that's twice the size; it's small functions which are implemented 2 or 3 or 5 times depending on what features are available.


Perhaps, but size doesn't really affect runtime performance that much if most codepaths are never executed -- the unused paths cause no processor cache churn, because code that never runs is never fetched.

I don't really know anything about this compiler, so I'm certainly speculating. My assumption is that one writes some function foo() and the compiler prepends a dispatcher in front which forks (code paths, not processes) to one of N optimized but functionally equivalent codepaths based on the actual processor upon which the code runs.
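The dispatch pattern described above can be sketched in C. This is a minimal, hand-rolled illustration, not actual compiler output: `sum_baseline`, `sum_avx2`, and `sum_dispatch` are made-up names, and the "optimized" variant is a stand-in rather than real SIMD code. GCC and Clang do expose CPU feature checks via the `__builtin_cpu_supports` builtin on x86, which is what the dispatcher uses here.

```c
/* Two functionally equivalent implementations of the same routine.
   A compiler doing automatic dispatch emits something like this:
   N specialized bodies plus a small dispatcher in front. */

static int sum_baseline(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

static int sum_avx2(const int *a, int n) {
    /* In real compiler output this body would use AVX2 intrinsics;
       here it is just a stand-in with identical behavior. */
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The dispatcher: forks (code paths, not processes) to one of the
   variants based on the actual processor the code runs on. */
static int sum_dispatch(const int *a, int n) {
#if defined(__GNUC__) && defined(__x86_64__)
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2(a, n);
#endif
    return sum_baseline(a, n);
}
```

GCC also automates this whole pattern with function multiversioning (e.g. the `target_clones` attribute), which generates the variants and the dispatcher from a single source function.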


Size does affect performance because of the cache.

If the forks are inline, and the cache works in blocks, then you are wasting cache space for code that never runs.

But considering it's Intel, I'm sure they thought of that.


I suspect it patches a jumptable at initialisation time based on CPU type, and all the code used by one type of CPU is bunched close together. The unused code probably isn't even paged into physical RAM.
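That patch-once-at-initialisation idea can be sketched with a function pointer that resolves itself on first call, so every later call jumps straight to the chosen variant with no per-call feature check. glibc implements this idea at link level with IFUNC relocations; the version below is a portable hand-rolled approximation with made-up names (`variant_scalar`, `variant_simd`, `memop`).

```c
typedef int (*memop_fn)(int);

/* Two equivalent variants; the "simd" one is a stand-in. */
static int variant_scalar(int x) { return x * 2; }
static int variant_simd(int x)   { return x * 2; }

static int cpu_has_feature(void) {
#if defined(__GNUC__) && defined(__x86_64__)
    return __builtin_cpu_supports("sse4.2");
#else
    return 0;
#endif
}

static int resolve_first_call(int x);

/* The "jumptable entry": starts out pointing at the resolver. */
static memop_fn memop = resolve_first_call;

/* First call patches the pointer based on CPU type; subsequent
   calls go directly to the selected variant. */
static int resolve_first_call(int x) {
    memop = cpu_has_feature() ? variant_simd : variant_scalar;
    return memop(x);
}
```

Because the pointer is patched once, the unused variant is never called, and (as the comment above suggests) if it sits in its own pages it may never even be faulted into physical RAM.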


Certainly. These are performance optimisations; they must have measured the results.



