Can any of the HPC experts shed some light on how these ARM chips are better tha...

blopeur · on June 22, 2020

The A64FX has on-board HMB, which means no separate DRAM modules. If you look at the Fugaku motherboard, there are no DIMM slots. All the memory is on the same package as the CPU.

This delivers a huge boost in bandwidth.

HMB stands for high memory bandwidth. It offers up to 900 GB/s.

Now if you add the Tofu interconnect on top, you have a system finely tuned for maximising data movement.

Remember: compute is cheap, communication is expensive.

You can have loads of GPUs and processors, but if you can't feed them data fast enough they are useless.
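
To put rough numbers on the "feed them data" point, here is a minimal roofline-style sketch. Only the 900 GB/s figure comes from the comment above; the peak compute rate and the DIMM-based bandwidth are made-up values for illustration, and `attainable_gflops` is a hypothetical helper, not anything from a vendor tool.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Simple roofline bound: a kernel is capped either by the compute
    peak or by how fast memory can deliver its operands."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


PEAK_GFLOPS = 3000.0  # ASSUMPTION: illustrative peak, not a measured figure
HBM_GBS = 900.0       # "up to 900 GB/s", as quoted in the comment
DIMM_GBS = 100.0      # ASSUMPTION: illustrative DIMM-based bandwidth

# Sweep a few arithmetic intensities (flops performed per byte moved).
for intensity in (0.25, 1.0, 4.0, 32.0):
    hbm = attainable_gflops(PEAK_GFLOPS, HBM_GBS, intensity)
    dimm = attainable_gflops(PEAK_GFLOPS, DIMM_GBS, intensity)
    print(f"{intensity:6.2f} flop/byte: on-package {hbm:7.1f} GFLOP/s, "
          f"DIMM-based {dimm:7.1f} GFLOP/s")
```

Under these assumed numbers, a low-intensity kernel runs nine times faster with the on-package memory, while a very high-intensity kernel hits the same compute ceiling either way.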

blopeur · on June 22, 2020

For a more in-depth dive:

"Report on the Fujitsu Fugaku (富岳) System", Jack Dongarra, Jun 22, 2020

https://www.dropbox.com/s/aqntdb43p6so0z5/fugaku-report.pdf?...

ViralBShah · on June 22, 2020

That is a pretty fun architecture. I hope that opens the door to higher performance for more workloads than top500.

At least with the top500 benchmark, bandwidth is not a problem so long as you can run a large enough problem. It is a linear solve that spends nearly all its time doing matmul (n^3 operations on n^2 data), so with a big enough problem you can saturate the cores.
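
A quick sketch of why the n^3 versus n^2 ratio matters. It assumes double precision and an idealised case where each of the three matrices crosses the memory bus exactly once (real blocked implementations move more data, so treat this as an upper bound). `matmul_intensity` is a hypothetical helper written for this example.

```python
def matmul_intensity(n, bytes_per_element=8):
    """Flops per byte for an n x n matrix multiply, C = A * B.

    ASSUMPTIONS: 2*n^3 flops (one multiply and one add per inner step),
    and A, B and C each moved to or from memory exactly once.
    """
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * bytes_per_element
    return flops / bytes_moved


# Intensity grows linearly with n (it works out to n / 12 here), so a
# large enough problem needs very little bandwidth per flop.
for n in (12, 120, 1200, 12000):
    print(f"n = {n:6d}: {matmul_intensity(n):8.1f} flop/byte")
```

So the larger the matrix, the less the memory system matters for this particular benchmark, which is the opposite of bandwidth-bound workloads.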

lukevp · on June 22, 2020

That's fascinating. I know that AMD has been touting HBM as a faster memory subsystem for their GPUs, is that the same as HMB where it's stacked? Or are they just calling it something similar?

rrss · on June 22, 2020

It's the same thing. High-end GPUs have been using HBM2 for a few years; the A64FX uses HBM2 for CPU memory.

floatboth · on June 22, 2020

"HMB" is a typo, it is the same HBM.

And yeah.. honestly I'm surprised AMD didn't do the HBM-CPU thing first. An HBM2-powered EPYC would've been an amazing product.

gnufx · on June 22, 2020

I'm not sure what predecessors that means (ThunderX2?), but these have been carefully "co-designed" for the job with experience from K Computer. Actually that's for a set of job types, which is part of the point. They also have extensive capability for low precision, if you want that. Note that it's not just at the top of top500, which is relatively uninteresting, but wins, or is up there, on things like HPCG, some sort of machine-learning benchmark, etc. K Computer also came out well generally, and persistently.

floatboth · on June 22, 2020

TX2 is from a different company though (Cavium -> Marvell). I guess the "predecessor" of the A64FX would technically be some SPARC chip that Fujitsu used to build?

gnufx · on June 22, 2020

The Hot Chips piece mentioned has comparisons with both versions of SPARCfx as far as I remember.