Hacker News

It's mostly about avoiding the latency penalty. There's only so much ILP you can extract from the instruction stream before you get stuck waiting on a dependency. If you start executing that dependency speculatively, it completes earlier, so you can launch the next instructions sooner.

That lets you speed up single-threaded execution further by adding more functional units (since you can't really crank the clock speed any higher).



This is exactly it. Speculative execution comes from instruction set design and basic constraints of branching. The energy consumption could make speculative execution prohibitive, but it's not "the" reason we do it.


Noob question here: is this the reason specialized chips can work so much better for AI applications? That the computations needed in a neural network are entirely deterministic and there is no need for branch prediction?


Not really, it's more the massive parallelism. Branch prediction contributes something, but mostly it's the parallelism. Each instruction you issue on a GPU typically operates on a massive array in one go. On a CPU you need AVX-type instructions, and those are far more limited in how much data they can process at once.


Yes, the GPUs provide massive parallelism. An NVIDIA RTX 4090 has 16384 "cuda cores". Whatever these cuda cores are, they must be much, much smaller than a CPU core. They do computations though, and CPU cores do computations too. Why do the CPU cores need to be so much larger, so a CPU with more than 64 cores is rarely heard of, while GPUs have thousands of cores?


Read about vector instructions a little bit and you'll see what I mean in the previous comment. A CPU has many many niche instructions it supports, it's way more flexible. A GPU is just trying to multiply the largest arrays possible as fast as possible, so the architecture becomes different. I don't think there's a quick way for you to grasp this without reading more about computer architecture and instruction sets, but you seem to be interested in it, so dive in :)



