This doesn't really make any sense. All you would need to do is compile the code on an Intel machine to get the fast version, and then you could run it on an AMD machine. It shouldn't cause any problems as long as developers build on genuine Intel machines. Of course that is irritating, but it shouldn't cause any slowdown on other machines.
I think the compiler generates code that checks the processor type at runtime, not at compile time. If the resulting binary is running on an AMD processor, the "safe" version of the code is chosen automagically.
It's pretty common for runtime libraries to optimize low-level routines like memcpy, math, etc. with multiple paths chosen on the basis of CPU capability bits. It's not the whole program that's twice the size; it's small functions that are implemented 2 or 3 or 5 times depending on what features are available.
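For the curious, here's a minimal sketch of that pattern. It's hypothetical: it uses GCC/Clang's __builtin_cpu_supports rather than whatever the Intel compiler actually emits, and the function names are made up.

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical AVX-tuned copy; a real one would use intrinsics.
       For the sketch it just delegates to libc. */
    static void *copy_avx(void *dst, const void *src, size_t n) {
        return memcpy(dst, src, n);
    }

    /* Baseline that runs on any x86-64 CPU. */
    static void *copy_generic(void *dst, const void *src, size_t n) {
        return memcpy(dst, src, n);
    }

    void *my_memcpy(void *dst, const void *src, size_t n) {
        /* GCC/Clang builtin that tests CPUID feature bits at runtime. */
        if (__builtin_cpu_supports("avx"))
            return copy_avx(dst, src, n);
        return copy_generic(dst, src, n);
    }

The cost is one predictable branch per call, and the variant that never gets selected is just dead weight on disk, not in the cache.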
Perhaps, but size doesn't really affect runtime performance that much, especially if most codepaths are never executed -- there's no processor cache churn, because the unused paths never get touched.
I don't really know anything about this compiler, so I'm certainly speculating. My assumption is that one writes some function foo() and the compiler prepends a dispatcher which forks (code paths, not processes) to one of N optimized but functionally equivalent codepaths, based on the actual processor the code runs on.
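To make that concrete, here's what such a prepended dispatcher might look like -- pure speculation, same as above. The vendor-string check via CPUID is my guess at the mechanism, and foo_fast/foo_safe are made-up names:

    #include <cpuid.h>   /* GCC/Clang x86 header */
    #include <string.h>

    static int is_genuine_intel(void) {
        unsigned int eax, ebx, ecx, edx;
        char vendor[13];
        if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
            return 0;
        /* CPUID leaf 0 returns the vendor string in EBX, EDX, ECX order. */
        memcpy(vendor + 0, &ebx, 4);
        memcpy(vendor + 4, &edx, 4);
        memcpy(vendor + 8, &ecx, 4);
        vendor[12] = '\0';
        return strcmp(vendor, "GenuineIntel") == 0;
    }

    static void foo_fast(void) { /* SSE/AVX-optimized body */ }
    static void foo_safe(void) { /* conservative baseline body */ }

    /* The prepended dispatcher: callers see one foo(); the fork happens inside. */
    void foo(void) {
        if (is_genuine_intel())
            foo_fast();
        else
            foo_safe();
    }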
I suspect it patches a jumptable at initialisation time based on CPU type, and all the code used by one type of CPU is bunched close together. The unused code probably isn't even paged into physical RAM.
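Something along these lines, sketched with GCC/Clang constructor attributes (again hypothetical -- the real mechanism could differ):

    /* One slot in the "jumptable": a function pointer resolved once. */
    static void impl_avx(void)     { /* fast variant */ }
    static void impl_generic(void) { /* safe variant */ }

    static void (*do_work)(void) = impl_generic;

    /* Runs before main(); after this, every call through do_work
       jumps straight to the chosen variant with no per-call check. */
    __attribute__((constructor))
    static void patch_jumptable(void) {
        __builtin_cpu_init();  /* required before __builtin_cpu_supports in a ctor */
        if (__builtin_cpu_supports("avx"))
            do_work = impl_avx;
    }

Resolving once at startup also fits the locality point: after init, only the pages holding the selected variant ever get touched.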