A number of the cool SIMD string-processing techniques depend a _lot_ on register width and on the performance characteristics of specific instructions. There's a fair argument to be made that x64 could be more consistent/legible for these use cases, but this isn't matmul: whether you have 128, 256, or 512 bits matters hugely, and you may want an entirely different algorithm depending on which width you get.
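
To make that concrete, here's a rough sketch of byte classification (is this byte JSON-style whitespace?) done two ways. It's a scalar model in plain C, not real intrinsics, and every name in it is made up for illustration. The first version mimics what you're stuck with when the lookup instruction can only index 16 entries (PSHUFB on a 128-bit lane): two nibble lookups ANDed together. The second mimics what becomes possible if a two-register byte permute over 512-bit registers is available (I'm assuming something like AVX-512 VBMI's VPERMI2B, which indexes 128 entries): one direct ASCII table lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Reference definition: space, tab, line feed, carriage return. */
static int is_ws_ref(uint8_t b) {
    return b == 0x20 || b == 0x09 || b == 0x0A || b == 0x0D;
}

/* 16-entry tables, as a PSHUFB-style lookup would need.
   Bit 1 marks the "high nibble 0x0" group (tab, LF, CR);
   bit 2 marks the "high nibble 0x2" group (space). */
static const uint8_t LO_TABLE[16] = {
    [0x0] = 2, [0x9] = 1, [0xA] = 1, [0xD] = 1
};
static const uint8_t HI_TABLE[16] = {
    [0x0] = 1, [0x2] = 2
};

/* Narrow-register model: two nibble lookups, ANDed.
   A byte matches only if both nibbles agree on a group bit. */
static int is_ws_nibble(uint8_t b) {
    return (LO_TABLE[b & 0x0F] & HI_TABLE[b >> 4]) != 0;
}

/* 128-entry table, as a wide two-register byte permute could index. */
static const uint8_t ASCII_TABLE[128] = {
    [0x09] = 1, [0x0A] = 1, [0x0D] = 1, [0x20] = 1
};

/* Wide-register model: one direct lookup. Only the low 7 bits index
   the table, so non-ASCII bytes have to be rejected separately. */
static int is_ws_wide(uint8_t b) {
    return b < 0x80 && ASCII_TABLE[b & 0x7F] != 0;
}

/* Exhaustively check that both models agree with the reference. */
static void check_all_bytes(void) {
    for (int i = 0; i < 256; i++) {
        uint8_t b = (uint8_t)i;
        assert(is_ws_nibble(b) == is_ws_ref(b));
        assert(is_ws_wide(b) == is_ws_ref(b));
    }
}
```

Same answer from both, but the table design is completely different: the nibble version only works if you can carve the character set into groups that fit the bit budget, while the wide version doesn't care what the set looks like. That's the kind of fork in the road I mean.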