Can any of the HPC experts shed some light on how these ARM chips are better than their predecessors? I toured a small cluster at LANL where the ARM chips ran the hottest and their cooling was the loudest.
The A64FX has on-package HBM, which means no separate DIMMs. If you look at the Fugaku motherboard, there are no DIMM slots: all the memory is in the same package as the CPU.
This delivers a huge boost in bandwidth.
HBM stands for High Bandwidth Memory. It offers up to 900 GB/s.
Now, if you add the Tofu interconnect on top, you have a system finely tuned for maximising data movement.
Remember: compute is cheap; communication is expensive.
You can have loads of GPUs and processors, but if you can't feed them data fast enough, they are useless.
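A rough roofline-style sketch of the "feed them data" point, under stated assumptions: a kernel can run no faster than peak compute, and no faster than memory bandwidth times its arithmetic intensity (flops per byte moved). The 900 GB/s figure is the one quoted above; the peak FLOP rate and the DDR bandwidth are hypothetical round numbers, not measurements of any real machine.

```python
# Roofline-style bound: attainable throughput is the lesser of peak compute
# and (arithmetic intensity x memory bandwidth).
def attainable_gflops(peak_gflops: float, bandwidth_gbs: float, flops_per_byte: float) -> float:
    return min(peak_gflops, flops_per_byte * bandwidth_gbs)


# STREAM-triad-like kernel a[i] = b[i] + s * c[i] on 8-byte doubles:
# 2 flops per element and 3 * 8 = 24 bytes moved (write-allocate traffic ignored).
TRIAD_INTENSITY = 2 / 24

# Illustrative numbers only; PEAK_GFLOPS and DDR_GBS are hypothetical assumptions.
PEAK_GFLOPS = 3000.0  # assumed FP64 peak for one chip
HBM_GBS = 900.0       # the HBM bandwidth figure quoted in the thread
DDR_GBS = 200.0       # assumed DIMM-based system, for comparison

for name, bw in (("HBM", HBM_GBS), ("DDR", DDR_GBS)):
    bound = attainable_gflops(PEAK_GFLOPS, bw, TRIAD_INTENSITY)
    print(f"{name}: at most {bound:.1f} GFLOP/s ({100 * bound / PEAK_GFLOPS:.1f}% of peak)")
```

Under these assumptions a low-intensity kernel uses only a few percent of peak either way, and the ceiling rises roughly in proportion to memory bandwidth, which is why on-package memory matters for such workloads.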
That is a pretty fun architecture. I hope it opens the door to higher performance on more workloads than just TOP500.
At least with the TOP500 benchmark (HPL), memory bandwidth is not a problem as long as you can run a large enough problem. It is a dense linear solve that spends nearly all its time in matrix multiplication (n^3 operations on n^2 data), so with a big enough problem you can saturate the cores.
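The "n^3 operations on n^2 data" argument can be made concrete with an idealized count. This sketch assumes each of the three n x n double-precision matrices crosses the memory bus exactly once (perfect cache reuse, which real code only approaches through blocking), and reuses a hypothetical 3000 GFLOP/s peak alongside the 900 GB/s figure from the thread; the function names are illustrative.

```python
# Idealized arithmetic intensity of C = A @ B for n x n doubles,
# assuming A and B are read once and C is written once.
def matmul_intensity(n: int) -> float:
    flops = 2 * n**3            # one multiply and one add per inner-product term
    bytes_moved = 3 * n**2 * 8  # three matrices, 8 bytes per double
    return flops / bytes_moved  # simplifies to n / 12 flops per byte


def min_compute_bound_n(peak_gflops: float, bandwidth_gbs: float) -> int:
    """Smallest n whose idealized intensity reaches the ridge point peak / bandwidth."""
    ridge = peak_gflops / bandwidth_gbs  # flops per byte needed to keep the cores busy
    n = 1
    while matmul_intensity(n) < ridge:
        n += 1
    return n


for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: {matmul_intensity(n):,.1f} flops/byte")

# Hypothetical machine balance: 3000 GFLOP/s peak fed by 900 GB/s.
print("smallest compute-bound n:", min_compute_bound_n(3000.0, 900.0))
```

Because intensity grows linearly with n while the machine's ridge point is fixed, a sufficiently large problem becomes compute-bound, which is why HPL says little about bandwidth-hungry workloads.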
That's fascinating. I know AMD has been touting HBM as a faster memory subsystem for its GPUs. Is that the same stacked memory the A64FX uses, or are they just calling it something similar?
I'm not sure which predecessors that refers to (ThunderX2?), but these chips were carefully "co-designed" for the job, drawing on experience from the K computer. Actually, they were co-designed for a set of job types, which is part of the point. They also have extensive support for low precision, if you want that. Note that Fugaku isn't just at the top of TOP500, which is relatively uninteresting; it also wins, or is near the top, on things like HPCG, some sort of machine-learning benchmark, etc. The K computer also did well across benchmarks generally, and persistently.
TX2 is from a different company, though (Cavium, now Marvell). I guess the "predecessor" of the A64FX would technically be one of the SPARC chips Fujitsu used to build?