The conclusion is at the bottom, but the TL;DR was that TPUs were 33% cheaper (in performance per dollar) and that JAX scales very well compared to PyTorch.
If you are curious, Cohere did a thorough comparison and published a paper on it: https://arxiv.org/pdf/2309.07181. TPU+JAX turned out to be more performant and more fault tolerant (fewer weird errors).