It seems they’re being very careful not to undercut their enterprise offerings, or even the 4090. Assuming they’re not completely tone deaf, that’s the only explanation I can see.
5 FP32 TFLOPS: if you're not doing sparse low-precision inference, that's roughly in line with a mid-to-high-end 2014 Nvidia consumer card (GTX 980), a decade old.
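The GTX 980 comparison holds up as back-of-envelope arithmetic. A sketch using the usual peak-throughput formula (cores × 2 FLOPs per cycle for FMA × clock), with clocks taken from the public spec sheet, so this is a theoretical peak, not a measured number:

```python
def peak_fp32_tflops(cuda_cores: int, clock_ghz: float) -> float:
    """Theoretical peak FP32 throughput: cores x 2 FLOPs/cycle (FMA) x clock."""
    return cuda_cores * 2 * clock_ghz / 1000.0

# GTX 980 (2014): 2048 CUDA cores at ~1.216 GHz boost clock
print(round(peak_fp32_tflops(2048, 1.216), 2))  # -> 4.98, i.e. ~5 TFLOPS
```

So ~5 FP32 TFLOPS really is GTX 980 territory on paper.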
For running a sparsified/quantized Llama 2 it might be fine; I'm less sure about fine-tuning. I didn't see any FP16 numbers.
Per chip? That's not the full story when discussing a system that can integrate multiple chips. The Orin has more memory bandwidth than an RTX 4050 even though the latter uses GDDR6. The M3 Max has roughly double the bandwidth of the Orin, and it also uses LPDDR5.
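The bandwidth ordering falls straight out of bus width × data rate. A sketch assuming spec-sheet figures (Jetson AGX Orin: 256-bit LPDDR5 at 6.4 GT/s; laptop RTX 4050: 96-bit GDDR6 at 16 GT/s; Apple quotes ~400 GB/s for the full M3 Max):

```python
def bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bytes) x data rate."""
    return bus_width_bits / 8 * data_rate_gtps

print(bandwidth_gbs(256, 6.4))  # Jetson AGX Orin: 204.8 GB/s
print(bandwidth_gbs(96, 16.0))  # laptop RTX 4050:  192.0 GB/s
```

A wide LPDDR5 bus beats a narrow GDDR6 one, and ~400 GB/s on the M3 Max is about double the Orin's 204.8 GB/s, consistent with the comparison above.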