I was about to comment that Gluten is only targeting CPU vectorization, but then I found this (very cool!): https://github.com/apache/incubator-gluten/issues/9098

I'm not very familiar with Gluten, but I'll still comment on the CPU side, assuming that one of Gluten's goals is to exploit the CPU's full vector-processing (SIMD) potential. Even in that case, we'd still be memory-bandwidth-bound, not to mention that a CPU's peak FLOPS is significantly lower than a GPU's. If we vectorize Spark (or any MPP engine) for efficient compute, perhaps we should run it on hardware optimized for vectorized, massively parallel, high-throughput compute.
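
To make the memory-bound point concrete, here's a rough back-of-the-envelope sketch using the roofline model (attainable throughput is the smaller of peak compute and bandwidth times arithmetic intensity). The hardware numbers and the FLOP-per-byte figure are made up for illustration, not measurements of any real CPU, GPU, or Gluten workload.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, flops_per_byte):
    """Roofline model: throughput is capped by compute or by memory traffic."""
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)


# Hypothetical hardware figures, chosen only to show the shape of the argument.
CPU = {"peak_gflops": 2_000.0, "bandwidth_gbs": 200.0}
GPU = {"peak_gflops": 20_000.0, "bandwidth_gbs": 2_000.0}

# Assumed arithmetic intensity for a scan/filter/aggregate style query:
# very little arithmetic per byte read from memory.
INTENSITY = 0.25  # FLOP per byte (illustrative assumption)

cpu = attainable_gflops(CPU["peak_gflops"], CPU["bandwidth_gbs"], INTENSITY)
gpu = attainable_gflops(GPU["peak_gflops"], GPU["bandwidth_gbs"], INTENSITY)

print(f"CPU attainable: {cpu} GFLOP/s of {CPU['peak_gflops']} peak")
print(f"GPU attainable: {gpu} GFLOP/s of {GPU['peak_gflops']} peak")
```

With these assumed numbers both devices sit far below their compute peak, so perfect SIMD use on the CPU doesn't move the ceiling; the higher memory bandwidth does.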

Also, nothing says we can't use Gluten to get even better combined CPU+GPU utilization!