
The ANE and tensor cores are not comparable though. The former is literally meant for low-cost inference, while the latter are meant for accelerating training.

If you squint, yeah, they look the same, but so do the microcontroller on a GPU and a full-blown CPU. They have fundamentally different purposes, architectures, and scales of use.

The ANE can’t even really be used directly. Apple restricts access to the CoreML APIs, and those only cover inference. It’s only usable for smaller, lightweight models.

If you’re comparing to the tensor cores, you really need to compare against the GPU, which is what gets used by Apple’s ML frameworks such as MLX for training etc.

It will still be behind the NVIDIA GPU, but not by anywhere near the same numbers.




>The ANE and tensor cores are not comparable though

They're both built to do the most common computation in AI (both training and inference), which is matrix multiply-accumulate: A * B + C. The ANE is far more limited because they decided to spend a lot less silicon space on it, focusing on low-power inference of quantized models. It is fantastically useful for a lot of on-device things, such as many of the photo features (e.g. subject detection, text extraction, etc).
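To make the A * B + C operation concrete, here is a minimal pure-Python sketch. The function name `matmul_accumulate` and the sample matrices are made up for illustration; real hardware does this on small fixed-size tiles in parallel, which this loop doesn't try to model.

```python
def matmul_accumulate(a, b, c):
    """Return D = A @ B + C for matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "A columns must match B rows"
    assert len(c) == rows and all(len(row) == cols for row in c), "C must be rows x cols"

    d = [row[:] for row in c]  # each accumulator starts at the matching entry of C
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                d[i][j] += a[i][k] * b[k][j]  # one multiply-accumulate step
    return d


# Hypothetical 2x2 inputs, chosen only to keep the arithmetic easy to check.
a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 0], [0, 1]]
d = matmul_accumulate(a, b, c)
print(d)
```

For these 2x2 inputs the inner statement runs 2 * 2 * 2 = 8 times, i.e. eight multiply-accumulates.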

And yes, you need to use CoreML to access it because it's so limited. In the future Apple will absolutely, with 100% certainty, make an ANE that is as flexible and powerful as tensor cores. They force you through CoreML so that your code will automatically switch to using it: right now you submit a job to CoreML and for many jobs it will opt to use the CPU/GPU instead, or a combination thereof. It's an elegant, forward-thinking implementation. Their AI performance and credibility will greatly improve when they do.

>you really need to compare against the GPU

From a raw performance perspective, the ANE is capable of more matrix multiply/accumulates than the GPU is on Apple Silicon; it's just limited to data types and contexts that make it unsuitable for training, or even for many inference tasks.


So now the TOPS are not comparable because the M3 is much slower than an Nvidia GPU? That's not how comparisons work.

My numbers are correct: the M3 Ultra has around 1% of the TOPS performance of an RTX 5090.

Comparing against the GPU would look even worse for Apple. Do you think Apple added the neural engine just for fun? This is exactly what the neural engine is there for.


You’re completely missing the point. The ANE is not equivalent as a component to the tensor cores. This has nothing to do with comparing TOPS; it’s about what each is intended for.

Try to use the ANE in the same way you would use the tensor cores. Hint: you can’t, because the hardware and software will actively block you.

They’re meant for fundamentally different use cases and power loads. Even Apple’s own ML frameworks do not use the ANE for anything except inference.



