Yes, if you want to optimize for a worst-case MPI cluster, then a Pi (4) might be optimal for you (because, sadly, 4 measly ARM cores with 100 MBit/s are some orders of magnitude removed from 100 cores and 100 GBit/s InfiniBand). But then you can also use a stack of old desktops, which is cheaper, and you can just throw on a standard image and everything (including CUDA and MKL) can work.