ARM NEON does have sum, min, and max reductions (and/or reductions can just be done as min/max if all bits within each element are the same, i.e. the elements are masks), along with pairwise ops. RVV has sum, min, max, and, or, and xor reductions. x86 has psadbw, which (when run against a zero vector) sums windows of eight unsigned 8-bit ints, plus various instructions for some pairwise horizontal operations.
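To illustrate the and/or-as-min/max point, here is a minimal scalar sketch in C. The function names are made up for this example, and it assumes every element is a mask of either 0x00 or 0xFF. Under that assumption the unsigned minimum equals the AND of all elements and the unsigned maximum equals the OR, which is why a min/max reduction (e.g. `vminvq_u8` / `vmaxvq_u8` on AArch64) can stand in for the missing and/or reductions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers; each m[i] is assumed to be 0x00 or 0xFF. */

/* AND-reduction of masks: any all-zero element pulls the minimum to 0. */
uint8_t mask_and_via_min(const uint8_t *m, size_t n) {
    uint8_t r = 0xFF; /* identity for AND over masks */
    for (size_t i = 0; i < n; i++) {
        if (m[i] < r) r = m[i];
    }
    return r;
}

/* OR-reduction of masks: any all-ones element pushes the maximum to 0xFF. */
uint8_t mask_or_via_max(const uint8_t *m, size_t n) {
    uint8_t r = 0x00; /* identity for OR over masks */
    for (size_t i = 0; i < n; i++) {
        if (m[i] > r) r = m[i];
    }
    return r;
}
```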
But in general, building code around reductions isn't really something you'd ideally do; they're necessarily going to have higher latency and/or lower throughput, or take more silicon, compared to avoiding them where possible. It's best to leave reducing to a single element to the loop tail.
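As a rough sketch of what "leave the reduction to the loop tail" can look like, here is a portable C version of a sum. `LANES` and `sum_u32` are hypothetical names chosen for this example, and the small `acc` array stands in for a vector register; a compiler may or may not turn the inner loop into actual SIMD adds. The main loop only does element-wise adds, and the horizontal reduction happens exactly once, after the loop.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* assumed vector width in 32-bit elements, illustration only */

uint32_t sum_u32(const uint32_t *data, size_t n) {
    uint32_t acc[LANES] = {0}; /* per-lane partial sums, like one vector register */
    size_t i = 0;

    /* Main loop: purely vertical (lane-wise) adds, no reduction. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; l++) {
            acc[l] += data[i + l];
        }
    }

    /* Loop tail: a single horizontal reduction down to one element. */
    uint32_t total = 0;
    for (size_t l = 0; l < LANES; l++) {
        total += acc[l];
    }

    /* Leftover elements that did not fill a whole vector. */
    for (; i < n; i++) {
        total += data[i];
    }
    return total;
}
```

With real intrinsics, the single reduction step would be where something like a NEON across-vector add or an RVV sum reduction goes; the cost of that instruction is then paid once per call instead of once per iteration.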