None of these combined shift-and-add instructions need a full barrel shifter, though, do they? Usually they're selecting from 2-4 possible shift amounts, not 64 of them.
Arm64 has fast 128-bit loads. Not just with NEON, but with regular integer instructions, you can quickly load 128 bits into a pair of 64-bit registers.
So it kind of makes sense to support a fast shift by four: 128 bits is 16 bytes, so indexing an array of those means scaling the index by 16. Though it's more likely they just profiled a bunch of code and decided fast shifts by four were worth budgeting for.
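To make the shift-by-four connection concrete, here's a rough C sketch. The `pair128` type and the helper names are made up for illustration. The point is just that the address of element `i` is `base + (i << 4)`, which a compiler could plausibly turn into an add with a shifted register operand followed by a paired load; I haven't checked what any particular compiler actually emits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte element: two 64-bit halves, the shape a
   128-bit load into a register pair would fill. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} pair128;

/* Byte offset of element i: i * 16, i.e. i shifted left by four. */
static size_t pair_offset(size_t i) {
    return i << 4;
}

/* Fetch element i using explicit base + (i << 4) addressing. */
static pair128 load_pair(const pair128 *base, size_t i) {
    const unsigned char *p = (const unsigned char *)base + pair_offset(i);
    pair128 out;
    memcpy(&out, p, sizeof out); /* copy all 16 bytes of the element */
    return out;
}
```

In real code you'd just write `base[i]`; the offset is spelled out here only to show where the shift by four comes from.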