
Fascinating. I wonder why Intel never released “AVX256”. Was it to drive adoption of their new extra-wide SIMD hardware? Do the extra instructions add a lot of complexity beyond the increased register size?

Either way, I recently had to write a SIMD implementation in both SSE (Intel) and NEON (Arm). This was my first time writing SIMD. I found the NEON instruction set much more intuitive and complete than SSE. There are all these weird limitations in SSE (such as when trying to do a reduction sum across a vector, or shifting across vectors) that made it feel incomplete. I never had a chance to try out AVX.
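To make the reduction complaint concrete, here is a minimal sketch of summing four floats. The function name `hsum4` is made up for illustration. The SSE path needs two shuffle-and-add steps, while AArch64 NEON has a single across-vector add (`vaddvq_f32`). A scalar fallback is included so the snippet builds on other targets.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>

/* Hypothetical helper: sum the four floats at p using plain SSE. */
float hsum4(const float *p) {
    __m128 v    = _mm_loadu_ps(p);                 /* v0 v1 v2 v3 */
    __m128 hi   = _mm_movehl_ps(v, v);             /* v2 v3 v2 v3 */
    __m128 sum2 = _mm_add_ps(v, hi);               /* lane0 = v0+v2, lane1 = v1+v3 */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2,
                                 _MM_SHUFFLE(1, 1, 1, 1)); /* broadcast lane1 */
    __m128 sum1 = _mm_add_ss(sum2, odd);           /* lane0 = total */
    return _mm_cvtss_f32(sum1);
}

#elif defined(__aarch64__)
#include <arm_neon.h>

/* Same helper on AArch64: one across-vector add does the whole reduction. */
float hsum4(const float *p) {
    return vaddvq_f32(vld1q_f32(p));
}

#else

/* Scalar fallback so the sketch compiles on any other target. */
float hsum4(const float *p) {
    return (p[0] + p[2]) + (p[1] + p[3]);
}

#endif
```

SSE3 adds `_mm_hadd_ps`, but it still takes two of them to reduce a full register, so the shuffle version above is a common baseline.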




That was around the time when Intel was struggling to get 10nm out the door. When they were designing AVX512, it might have seemed that the initial implementation on 14nm was just a stop-gap, and that it would be more practical to implement such wide units on the next process that was just around the corner. Little did they know they would end up running in circles rehashing Skylake/14nm for the next six years while they waited for 10nm to finally come online. If they had known things would go that way, perhaps they would have done "AVX256".


>10nm

aka 10nm Enhanced SuperFin aka Intel 7 (12000 and 13000 series)

Which funnily enough don't support AVX512, unlike the previous 11000 series (Rocket Lake, still on 14nm).


That's a whole other mess. The 12th and 13th gen P-core design does technically support AVX512, but the smaller E-cores don't, and rather than try to reconcile that mismatch in software, Intel disabled AVX512 altogether so that all the cores behave the same. If they hadn't decided to implement E-cores, then 12th/13th gen would have had AVX512 support.

Some motherboards allowed you to enable AVX512 on those chips if you disabled the E-cores, but then Intel started permanently fusing off AVX512 in hardware on later batches.


Indeed, and to add on to this, I suspect that they had their own plans about where to take the client space in general, which involved more AVX512 (which a couple of generations prior to Alder Lake supported!), but depended on their having a near-monopoly. The competitiveness of Zen forced them to pivot, and AVX512 fell by the wayside, largely because of its lack of users (which in turn is in large part because no one really cares about performance on clients).


Intel kinda messed up the ISA by requiring AVX-512F across all AVX-512 subsets. If AVX-512VL didn't depend on F, you could have a 256-bit only variant of "AVX-512".

(and instead of making a new feature that allows VL without F, Intel's "solution" seems to be piecemeal backporting of EVEX instructions to VEX (e.g. VNNI, IFMA))
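As a rough illustration of what that dependency means for software, here is a sketch of a runtime check before using the 256-bit EVEX forms. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` is available; the helper name `can_use_evex_256` is made up. On any other compiler or target it just reports false.

```c
#include <stdbool.h>

/* Hypothetical helper: true only if the CPU reports both AVX-512F and
 * AVX-512VL. VL only exists alongside F, so a chip cannot offer the
 * 256-bit EVEX forms without also carrying the 512-bit foundation. */
bool can_use_evex_256(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* harmless if the runtime already did this */
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512vl");
#else
    return false;          /* assumption: treat unknown targets as unsupported */
#endif
}
```

On an Alder Lake part with AVX512 disabled this would be expected to return false, even though the P-cores could execute the instructions.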

NEON is generally a more "complete" SIMD ISA than SSE/AVX, though it has less "fancy" stuff. AVX-512 fills in a bunch of gaps that were left in earlier ISAs, but still has odd omissions (like no 8-bit bitwise shift).
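The usual workaround for the missing 8-bit shift is to shift 16-bit lanes and mask off the bits that leaked in from the neighbouring byte. A minimal sketch using SSE2 as the baseline (the same idea carries over to wider registers); `shl_u8x16` is a made-up name, and the shift count is assumed to be in the range 0 to 7.

```c
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>

/* Hypothetical helper: shift 16 bytes left by n bits each (0 <= n <= 7). */
void shl_u8x16(const uint8_t *in, uint8_t *out, int n) {
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Shift as 16-bit lanes: the top bits of each low byte spill into
     * the bottom n bits of the byte above it. */
    __m128i shifted = _mm_sll_epi16(v, _mm_cvtsi32_si128(n));
    /* Clear the bottom n bits of every byte to drop the spilled bits. */
    __m128i mask = _mm_set1_epi8((char)(uint8_t)(0xFFu << n));
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(shifted, mask));
}

#else

/* Scalar fallback so the sketch compiles on non-x86 targets. */
void shl_u8x16(const uint8_t *in, uint8_t *out, int n) {
    for (int i = 0; i < 16; i++) {
        out[i] = (uint8_t)(in[i] << n);
    }
}

#endif
```

NEON, by contrast, has a direct per-byte shift (`vshlq_n_u8`), which is the kind of gap being described here.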


Cool that you started with SIMD :) FYI github.com/google/highway allows you to write your code once and target many instruction sets. It also fills in many of the gaps, including reductions.

Disclosure: I am the main author.



