Faster compute helps for things like vision language models, which need a bigger context to be filled. My understanding is that the ANE is still optimized for convolution workloads and compute efficiency, while the new neural accelerators are optimized for flexibility and performance.
I am not an expert on the ANE, but I think it is related to the size of the register files, which is smaller than what we need for GEMM on modern transformers (especially the fat ones with MoE).
AIUI the ANE operates on data in unified memory, not in a register file, so this wouldn't be an inherent limitation. (OTOH, that's also why it wastes memory bandwidth on most newer transformer models, which use heavily quantized weights: the ANE has to read padded/unquantized values, and the fraction of memory bandwidth spent on that padding is pure waste.)
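To put rough numbers on that, here's a back-of-the-envelope sketch. The bit widths are my own assumptions for illustration (4-bit or 8-bit quantized weights being read back at a padded 16-bit width), not a statement of which formats the ANE actually supports:

```python
# Back-of-the-envelope: wasted memory bandwidth when quantized weights
# must be read at a wider, padded width. Illustrative only; the bit
# widths below are assumptions, not the ANE's actual supported formats.

def wasted_fraction(quantized_bits: int, read_bits: int) -> float:
    """Fraction of read bandwidth that carries no useful information."""
    return 1.0 - quantized_bits / read_bits

# 4-bit quantized weights read back as 16-bit values:
print(f"{wasted_fraction(4, 16):.0%} of that traffic is padding")  # 75%

# 8-bit quantized weights read back as 16-bit values:
print(f"{wasted_fraction(8, 16):.0%} of that traffic is padding")  # 50%
```

So under those assumptions, most of the bandwidth spent streaming the weights carries padding rather than data, which matters a lot when token generation is memory-bandwidth bound.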