The author of the answer has set multiple world records for calculating digits of pi. I highly recommend browsing through his answer history. You can learn more from his answers than from entire computer science classes.
Well, it is a very good answer. Most likely people admire that.
Also, I've been here for three years and I don't recall seeing it before, so it can't be posted that often. You must just happen to catch it a lot. I wouldn't have known it was a dupe if you hadn't pointed it out.
But we don't just see money as a proxy for power. It is the closest proxy for power there is. HN karma, on the other hand, is barely a proxy for power at all.
Question: can a processor have two pipelines? When it hits a branch, it could start loading both sides and then just switch pipelines once the branch outcome is determined.
It might still need to flush the pipelines when there are several branches in succession, but you could still use branch prediction to pick the two most likely paths, which would help tremendously.
This is basically what computers do; it's called speculative execution. Modern CPUs actually have several pipelines, each for a different function, which is what makes a CPU "superscalar". Basically, the CPU tries to keep the pipelines of all of its execution units filled at all times. If it's able to, it will schedule both sides of a branch to be executed and flush out whichever half wasn't used.
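Keeping every execution unit busy only works if there are independent instructions to issue. A minimal C sketch of one common way to expose more instruction-level parallelism: split a sum into two independent accumulators so consecutive additions don't have to wait on each other. This is an illustration, not something from the thread; whether it helps in practice depends on the compiler (which may already reassociate or vectorize integer sums) and on the CPU.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous one,
   so the additions form a single serial dependency chain. */
long sum_one_chain(const long *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent accumulators: the even-index and odd-index additions
   do not depend on each other, so a superscalar core can issue them
   side by side. */
long sum_two_chains(const long *a, size_t n) {
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }
    if (i < n)          /* odd-length tail */
        s0 += a[i];
    return s0 + s1;
}
```

Both functions return the same result; only the shape of the dependency chain differs.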
Speculative execution and superscalar are independent concepts. While superscalar does effectively mean multiple pipelines, you don't have to be superscalar to execute speculatively. You can also reduce stalls in a single pipeline by speculating correctly. The concept GP is talking about was a feature of Itanium.
"
The EPIC architecture also includes a grab-bag of architectural concepts to increase ILP:
Predicated execution is used to decrease the occurrence of branches and to increase the speculative execution of instructions. In this feature, branch conditions are converted to predicate registers which are used to kill results of executed instructions from the side of the branch which is not taken.
[...]
Multi-way branch instructions improve branch prediction by combining many alternative branches into one bundle.
"
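A rough C model of the predication idea in the quote (an illustration, not Itanium code; a compiler is free to turn either function into a branch or a conditional move): the compare produces a predicate and its complement, both sides are computed, and the side whose predicate is false has its result killed. Here the "kill" is modelled by multiplying each result by its 0/1 predicate. On real predicated hardware an instruction whose predicate is false has no architectural effect at all, which plain C cannot express.

```c
/* Branchy reference: only one side of the branch runs. */
long long abs_diff_branchy(int a, int b) {
    if (a > b)
        return (long long)a - b;
    return (long long)b - a;
}

/* If-converted sketch: no control flow, both sides always execute. */
long long abs_diff_predicated(int a, int b) {
    int p_then = a > b;                      /* predicate for the "then" side */
    int p_else = !p_then;                    /* complementary predicate */
    long long then_val = (long long)a - b;   /* issued unconditionally */
    long long else_val = (long long)b - a;   /* issued unconditionally */
    /* exactly one predicate is 1, so exactly one result survives */
    return p_then * then_val + p_else * else_val;
}
```

The arithmetic is widened to `long long` so that computing the unused side can never overflow, a reminder that if-conversion only works when both sides are safe to execute.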
Generally, speculation only executes one side of the branch. Speculative threading has been proposed that uses SMT to execute both sides, but you're usually better off trusting the branch predictor and using the extra hardware threads for non-speculative work.
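"Trusting the branch predictor" means speculating down whichever side it picks. A classic textbook scheme is the 2-bit saturating counter, sketched below in C; modern CPUs use far more elaborate predictors, so treat this as an illustration of the idea rather than how any particular core works. Each misprediction corresponds to a speculatively executed path that has to be squashed.

```c
#include <stddef.h>

/* States 0,1 predict not-taken; states 2,3 predict taken. */
typedef struct { unsigned char state; } TwoBitCounter;

static int predict(const TwoBitCounter *c) { return c->state >= 2; }

/* Move one step toward the actual outcome, saturating at 0 and 3. */
static void update(TwoBitCounter *c, int taken) {
    if (taken && c->state < 3)
        c->state++;
    else if (!taken && c->state > 0)
        c->state--;
}

/* Replays a history of branch outcomes (nonzero = taken) through one
   counter and returns how many predictions were wrong. */
size_t count_mispredicts(const int *outcomes, size_t n, unsigned char initial) {
    TwoBitCounter c = { initial > 3 ? 3 : initial };
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        int taken = outcomes[i] != 0;
        if (predict(&c) != taken)
            misses++;
        update(&c, taken);
    }
    return misses;
}
```

For a loop branch that is taken nine times and then falls through, run twice, the counter starting at 2 mispredicts only the two loop exits; the hysteresis keeps it from also mispredicting the first iteration of the second run, which a 1-bit "last outcome" predictor would.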
Speculation down both sides is independent of SMT, and if I remember correctly there were proposals to do this, especially for branches where the branch predictor "knew" it wasn't doing a good job (there's no point going down both sides of a branch that is predicted correctly 99.9% of the time).
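The "99.9% predicted branch" point can be made concrete with a back-of-the-envelope model. A minimal C sketch, assuming (hypothetically) that a misprediction costs a fixed `flush_penalty` in cycles and that forking costs a fixed `fork_overhead` in cycles for executing the wrong side as well; both parameters are made up for illustration, and real designs don't reduce to two numbers.

```c
/* Expected cycles lost per branch when speculating down the predicted
   side only: each misprediction costs one pipeline flush. */
double predicted_path_cost(double accuracy, double flush_penalty) {
    return (1.0 - accuracy) * flush_penalty;
}

/* Hypothetical policy: fork down both sides only when the expected
   flush cost exceeds the assumed cost of running the wrong side too. */
int should_fork(double accuracy, double flush_penalty, double fork_overhead) {
    return predicted_path_cost(accuracy, flush_penalty) > fork_overhead;
}
```

With 99.9% accuracy and a 20-cycle flush, the expected loss is only 0.02 cycles per branch, so almost any fork overhead is a net loss. At 60% accuracy the expected loss is 8 cycles, which is where dual-path execution starts to look attractive.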
The problem is that you rapidly run out of execution resources (what if more branches follow, for example?), and you burn lots of energy executing things that never happen (not to mention speculatively loading things that aren't needed, etc.). I remember seeing some papers on the limits of ILP, and I think this was one of the 'extreme' ideas (speculation down multiple paths).
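Some rough arithmetic behind "you rapidly run out of execution resources", under simplifying assumptions: every unresolved branch is forked, every path costs the same, and branches are independent. Forking k unresolved branches gives 2^k live paths, only one of which is correct; following predictions keeps a single path, which is entirely correct with probability p^k for per-branch accuracy p.

```c
/* Paths in flight if each of k unresolved branches is forked. */
unsigned long long live_paths(unsigned k) {
    return 1ULL << k;               /* valid for k < 64 */
}

/* Share of forked work that lies on the one correct path,
   assuming equal cost per path. */
double useful_fraction(unsigned k) {
    return 1.0 / (double)live_paths(k);
}

/* Probability that a single predicted path is correct across k
   independent branches, each predicted with accuracy p. */
double all_correct_prob(double p, unsigned k) {
    double prob = 1.0;
    for (unsigned i = 0; i < k; i++)
        prob *= p;
    return prob;
}
```

For example, with four unresolved branches, full forking needs 16 paths and only 1/16 of the work is useful, while a predictor with 95% per-branch accuracy keeps one path that is fully correct about 81% of the time (0.95^4 ≈ 0.8145).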
Sure, but it defeats the purpose of branch prediction. If you have a high confidence of which branch will be taken you don't want to waste execution units. The high level of data dependencies between branch points is another obstacle. It isn't as simple as hyperthreading.
I think it's possible, but I think it's more efficient to use a completely separate... thread?... if you have a second core. (Or, if your pipeline has lots of empty bubbles, you can just try forcing two threads into it, aka Hyperthreading.)
That makes sense for most processor architectures, but there are exceptions. The Cell processor (the Sony PS3's CPU) has 8 RISC SPUs, and these cores do not have branch prediction logic, because they were designed for the computationally intensive graphics kernels used in PS3 games. There is a huge performance incentive to avoid branches on these processors, so often both paths are computed and the appropriate value is then chosen with a mask instruction. This does not stall the pipeline.
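A plain C sketch of the "compute both paths, then select with a mask" pattern, in the spirit of the SPU's select-bits (`selb`) instruction. These are not SPU intrinsics: `select_bits` and `scale_or_offset` are hypothetical helpers for illustration, and whether a compiler keeps the code branch-free is up to the compiler and target.

```c
#include <stdint.h>

/* For each bit: take it from a where mask is 1, from b where mask is 0. */
static uint32_t select_bits(uint32_t a, uint32_t b, uint32_t mask) {
    return (a & mask) | (b & ~mask);
}

/* Branch-free version of: x > threshold ? x * 2 : x + 100 */
uint32_t scale_or_offset(uint32_t x, uint32_t threshold) {
    uint32_t then_val = x * 2u;       /* result if x > threshold */
    uint32_t else_val = x + 100u;     /* result otherwise */
    /* the compare yields 0 or 1; negating it gives all zeros or all ones */
    uint32_t mask = 0u - (uint32_t)(x > threshold);
    return select_bits(then_val, else_val, mask);
}
```

Both candidate results are always computed, so the cost is the work of both paths, but there is never a misprediction to flush, which is the trade-off described above for cores without a branch predictor.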
http://stackoverflow.com/users/922184/mysticial