I probably still wouldn't be able to use it, because I need a bunch of custom functionality in my optimizers (for example, custom quantization support and incremental gradient accumulation directly in the optimizers' state), but I might borrow some of their techniques if they make things even faster.