That's a pretty interesting idea; it might take a bit to prepare a useful graphic/post.
Also, what do you think would be the best way to structure such a post?
But here's a small bit of something perf-y: during large shuffles, I was able to improve overall job performance/efficiency by using an external shuffle service, even with a ~5s median shuffle write time for partitions of a couple hundred MB (I hope I'm remembering this correctly, lol). That's not particularly great on its own, but it let us cost-efficiently chew through some rather large datasets without running into memory issues. There's also an awesome side benefit: since losing a worker doesn't necessarily mean losing its shuffle output, it lets us use cheap spot workers in more scenarios.
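If you're on Spark, the relevant knobs look roughly like this (just a sketch; the min/max executor counts are made up, and the external shuffle service itself also has to be running on each node, e.g. as a YARN aux service, which depends on your cluster manager):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("big-shuffle-job")
    # Serve shuffle files from a node-level service instead of the
    # executor process, so executors can come and go mid-job.
    .config("spark.shuffle.service.enabled", "true")
    # Usually paired with dynamic allocation, so reclaimed spot
    # executors get replaced automatically.
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "2")
    .config("spark.dynamicAllocation.maxExecutors", "50")
    .getOrCreate()
)
```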