There's no JVM overhead as there is with Spark computation. The dask array methods use the numpy C-API, which is implemented in C and runs directly on the physical machine.
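To make that concrete, here is a minimal sketch (not the code from the post). The array shape, chunk sizes, and random data are assumptions for illustration; it only shows that each dask chunk is a plain numpy array and that a z-score style normalization ends up as numpy operations on those chunks.

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for the temperature data: random values in 250x250 chunks.
x = da.random.random((1000, 1000), chunks=(250, 250))

# Building the expression is lazy; nothing is computed yet.
z = (x - x.mean()) / x.std()

# Each chunk, once computed, is an ordinary numpy array.
block = x.blocks[0, 0].compute()
print(type(block))

# Running the graph applies numpy routines chunk by chunk and
# returns the combined result as a numpy array.
result = z.compute()
print(type(result), result.shape)
```

If the normalization is slow, the time is going into the task graph and data movement or into the numpy calls themselves, not into a VM layer.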
I think you might have misunderstood my comment; I was referring to the bullet point under "What didn't work":
> Reduction speed: The computation of normalized temperature, z, took a surprisingly long time. I’d like to look into what is holding up that computation.