The ranking is based on the Linpack benchmark. Because it's a parallel application, performance doesn't simply scale with the number of processors; the network interconnect is hugely important.

Now, although Linpack is a better evaluation metric for a supercomputer than simply totaling up the number of processors and the RAM size, it's still a very specific benchmark of questionable real-world utility; people like it because it gives you a score, and that score lets you measure dick-size, err, computing power. It also, if you're feeling unscrupulous, lets you build a big, otherwise-worthless Linpack-solving machine that generates a good score but isn't as good for real use (an uncharitable person might put Roadrunner, https://en.wikipedia.org/wiki/Roadrunner_(supercomputer), in this category).
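For a sense of where the score comes from: HPL solves a dense linear system of order n and reports the achieved rate using the conventional operation count of 2/3·n³ + 2·n². A minimal back-of-envelope sketch (the problem size and wall-clock time below are made-up illustrative numbers, not measurements from any real machine):

```python
# Back-of-envelope Linpack (HPL) score: Rmax = operation count / wall-clock time.
# HPL's conventional flop count for solving a dense n x n system is 2/3*n^3 + 2*n^2.

def hpl_rmax_pflops(n: int, seconds: float) -> float:
    """Achieved rate in Pflop/s for an HPL run of order n taking `seconds`."""
    flops = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flops / seconds / 1e15

# Hypothetical example: a matrix of order 10,000,000 solved in 3 hours.
print(f"{hpl_rmax_pflops(10_000_000, 3 * 3600):.1f} Pflop/s")
```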



Linpack is pretty lightweight as far as benchmarks go. You need some memory bandwidth but not much network at all, just reduces, which are pretty efficient. It's not a good proxy for the most challenging applications, but it lives on because no one has a better alternative. Basically, these are the classes of problems (see the sketch after the list):

1. Compute bound, trivially parallel. Things like breaking RSA encryption, stuff that was done over the Internet 20 years ago even when links were much slower. It basically doesn't even need the proverbial Beowulf cluster. Linpack is essentially in this category, so you could, with care, build a cloud machine to do it.

2. Memory-bandwidth bound, trivially parallel. Stuff like search engine index building. Still not hard to do over distributed networks, or, yes, commodity Ethernet in a Beowulf cluster.

3. Network bound, coupled parallel. The most challenging category; it can only be done with a single-site computer on a fast interconnect. And, as noted, "fast" here has a totally different meaning from commercial networking, especially in latency. Depending on the type of network, a significant fraction of the total transistors in the machine can be in the interconnect. These networks are heavily optimized for specific MPI operations, such as All-to-All across what might be a million cores. The reason is that a coupled calculation moves only as quickly as the slowest task on the slowest node. You see weird stuff like reserving an entire core just for communicating with the I/O system and handling OS interrupts, because otherwise the "jitter" of nodes randomly doing stuff slows down the entire machine.
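To make the "reduces vs. All-to-All" distinction concrete, here is a minimal mpi4py sketch (my own illustration, not code from any benchmark) of the two communication patterns. An allreduce moves a tiny amount of data per rank and scales gently; an all-to-all moves a distinct buffer between every pair of ranks, which is what really stresses the interconnect at large core counts:

```python
# Illustrative MPI communication patterns (run with e.g. `mpirun -n 4 python demo.py`).
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Allreduce: every rank contributes one partial value, every rank gets the total.
# This kind of collective is cheap on the network.
partial = np.array([float(rank)])
total = np.empty(1)
comm.Allreduce(partial, total, op=MPI.SUM)

# All-to-all: every rank sends a distinct chunk to every other rank.
# Traffic grows with the square of the rank count, so tightly coupled codes
# (e.g. distributed 3-D FFTs) live or die by the interconnect.
send = np.arange(size, dtype="d") + rank * size
recv = np.empty(size, dtype="d")
comm.Alltoall(send, recv)

if rank == 0:
    print("allreduce total:", total[0], "alltoall recv:", recv)
```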


I am curious to learn a bit more about how supercomputer scores correlate with "real world performance", which is a hard thing to quantify since there are probably hundreds of different application "types" in the "real world".

Combine this with the fact that many applications are limited by network throughput rather than by CPU/SSD/RAM/PCIe, and performance becomes hard to quantify even in terms of "how many ARM cores do I need to buy so that my CPU isn't the bottleneck".
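One crude way to reason about "which resource is the bottleneck" is a balance check: estimate how long each resource alone would take for a given workload and see which dominates. A minimal sketch; every number below is a made-up illustrative figure, not the spec of any particular SKU:

```python
# Crude bottleneck estimate: time each resource would need on its own;
# the largest time is the approximate bottleneck. Numbers are illustrative only.

workload = {
    "flops": 5e13,          # total floating-point ops to perform
    "dram_bytes": 2e12,     # bytes moved through memory
    "network_bytes": 4e11,  # bytes sent/received over the NIC
}

node = {
    "flops_per_s": 2e12,             # hypothetical ~2 Tflop/s of usable compute
    "dram_bytes_per_s": 2e11,        # hypothetical ~200 GB/s memory bandwidth
    "network_bytes_per_s": 1.25e10,  # hypothetical ~100 Gbit/s NIC
}

times = {
    "compute": workload["flops"] / node["flops_per_s"],
    "memory":  workload["dram_bytes"] / node["dram_bytes_per_s"],
    "network": workload["network_bytes"] / node["network_bytes_per_s"],
}

for name, t in sorted(times.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {t:6.1f} s")
# Whatever prints first is the resource worth buying more of, to a first approximation.
```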

There are ARM Linux-compilation and ARM OpenJDK benchmarks, which are a good start, but I don't know how to compare those ARM SKUs against the ones found in Top500 supercomputers.


HPCG is another benchmark on the Top500 site, and it’s more of a real world benchmark. It’s of course not perfect, but maybe that’s more what you’re looking for.
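HPCG runs a preconditioned conjugate gradient solve on a sparse system (as I understand it, a 27-point stencil with a multigrid preconditioner), so it stresses memory bandwidth and collective latency rather than dense floating point. Here is a minimal unpreconditioned CG sketch of my own on a tiny dense matrix, just to show the structure; it is not the HPCG code:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A.
    Each iteration is one matrix-vector product plus a couple of dot
    products; in a distributed run the dot products become allreduces
    and the matvec needs halo exchange, which is why CG-style codes
    are sensitive to memory bandwidth and network latency."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Tiny SPD test problem (illustrative only).
rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)   # symmetric positive definite by construction
b = rng.standard_normal(50)
x = conjugate_gradient(A, b)
print("residual:", np.linalg.norm(A @ x - b))
```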



