It’s unfortunate that this announcement is still unspecific about what they improved in the Neural Engine. Since all we know about the Neural Engine comes from Apple papers or reverse engineering efforts (https://github.com/hollance/neural-engine), it’s plausible that they addressed some quirks to enable better transformer performance. They have written quite interesting papers on transformers on the Neural Engine:

- https://machinelearning.apple.com/research/neural-engine-tra...

- https://machinelearning.apple.com/research/vision-transforme...

Things have definitely gotten better with MLX on the software side, though it still seems they could do more in that area (let’s see what the M5 Max brings). But even if they made big strides here, it won’t help previous generations, and the main thing limiting Apple Intelligence (in my opinion) will continue to be the 8 GB of unified memory they still insist on.
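
To make the MLX point a bit more concrete, here is a minimal sketch of what the GPU side looks like today (Python MLX; the shapes and dtype are arbitrary, purely for illustration):

    # Minimal MLX sketch: the matmul runs on the GPU via Metal, operating
    # directly on unified memory. Shapes/dtype here are arbitrary.
    import mlx.core as mx

    a = mx.random.normal((4096, 4096), dtype=mx.float16)
    b = mx.random.normal((4096, 4096), dtype=mx.float16)

    c = a @ b      # builds the computation lazily
    mx.eval(c)     # forces evaluation on the default (GPU) device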



> the main thing limiting Apple Intelligence (in my opinion) will continue to be the 8 GB of unified memory they still insist on.

As you said, it won't help previous generations, though since last year (or two?) all Macs have started with 16 GB of memory. Even entry-level MacBook Airs.


That's true! I was referring to their wider lineup, especially the iPad, where users will expect the same performance as on the Macs (they paid for an Mx chip). And they sold me an iPad Air this year that comes with a really fast M3 and still only 8 GB of RAM (you only get 16 GB on the iPad Pro, btw, if you go with at least 1 TB of storage on the M4 one).


Why would you expect the same performance on iPad and MacBook Pro?

The latter has up to 128 GB of memory.


You probably wouldn't with a Pro, but you might between an iPad Pro and a MacBook Air. With the Foundation Models API they basically said that there will be one size of model for the entire platform, making smarter models on a MacBook Pro unrealistic and only faster ones possible.


Isn't Private Cloud Compute already enabling the more powerful models to be run on the server? That way the on-device models don't have as much pressure to be The One.


That’s fair


"They sold me"? You me you bought.


I bet Cook authorized the upgrade with gritted teeth, and I was all for it.


Faster compute helps for things like vision language models that require a bigger context to be filled. My understanding is that the ANE is still optimized for convolution loads and compute efficiency, while the new neural accelerators are optimized for flexibility and performance.


The old ANE enabled arbitrary statically scheduled multiply-adds on INT8 or FP16. That's good for convolution, but not specifically geared toward it.


I am not an expert on the ANE, but I think it is related to the size of the register files, which is smaller than what we need for GEMM on modern transformers (especially the fat ones with MoE).


AIUI the ANE works on data in unified memory, not in a register file, so this wouldn't be an inherent limitation. (OTOH, that's why it wastes memory bandwidth for most newer transformer models, which use heavily quantized data: the ANE has to read padded/unquantized values, and the fraction of memory bandwidth used for that padding is pure waste.)
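
To put rough numbers on that padding waste, a back-of-envelope sketch (assuming, purely for illustration, 4-bit quantized weights that get expanded to FP16 before the ANE reads them):

    # Illustrative only: fraction of memory bandwidth wasted if 4-bit
    # quantized weights have to be streamed as padded/dequantized FP16.
    params = 3e9                      # assume a 3B-parameter model
    quantized_bytes = params * 0.5    # 4-bit storage
    fp16_bytes = params * 2           # what actually gets read if padded to FP16

    waste = 1 - quantized_bytes / fp16_bytes
    print(f"wasted bandwidth fraction: {waste:.0%}")  # -> 75%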


That would be an interesting approach if true. I hope someone gets to the bottom of it once we have hardware in our hands.


MLX still doesn't use the Neural Engine, right? I wish they would abandon that unit and just center everything around Metal and tensor units on the GPU.


MLX is a training/research framework, and the work product is usually a CoreML model. A CoreML model will use any and all resources that are available to it, at least if the resource fits the need.

The ANE is for very low power, very specific inference tasks. There is no universe where Apple abandons it, and it's super weird how much anti-ANE rhetoric there is on this site, as if there can only be one tool for an infinite selection of needs. The ANE is how your iPhone extracts every bit of text from images and subject matter information from photos with little fanfare or heat and without destroying your battery, among many other uses. It is extremely useful for what it does.

>tensor units on the GPU

The M5 / A19 Pro are the first chips with so-called tensor units, i.e. matmul on the GPU. The ANE used to be the only tensor-like thing on the system, albeit, as mentioned, designed to be super efficient and for very specific purposes. That doesn't mean Apple is going to abandon the ANE; instead they made it faster and more capable again.
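
To make the "uses whatever resources fit" point concrete, compute-unit preferences are declared when a CoreML model is converted (or loaded). A minimal coremltools sketch, with a toy Linear layer standing in for a real model; whether the ANE actually runs it still depends on the ops and shapes fitting what the ANE supports:

    # Sketch: convert a toy PyTorch model to CoreML and request CPU + ANE.
    import torch
    import coremltools as ct

    model = torch.nn.Linear(64, 64).eval()
    example = torch.randn(1, 64)
    traced = torch.jit.trace(model, example)

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=example.shape)],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # falls back to CPU for unsupported ops
    )
    mlmodel.save("Toy.mlpackage")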


> ...and it's super weird how much anti-ANE rhetoric there is on this site, as if there can only be one tool for an infinite selection of needs

That seems like a strange comment. I've remarked in this thread (and other threads on this site) about what's known re: low-level ANE capabilities, and it seems to have significant potential overall, even for some part of LLM processing. I'm not expecting it to be best-in-class at everything, though. Just like most other NPUs that are also showing up on recent laptop hardware.


> the work product is usually a CoreML model.

What work product? Who is running models on Apple hardware in prod?


An enormous number of people and products. I'm actually not sure if your comment is serious, because it seems to be of the "I don't, therefore no one does" variety.


Enormous compared to what? Do you have any numbers, or are you going off what your X/Bluesky feed is telling you?


I'm super not interested in arguing with the peanut gallery (meaning people who don't know the platform but feel that they have absolute knowledge of it), but enough people have apps with CoreML models in them, running across a billion or so devices. Some of those models were developed or migrated with MLX.

You don't have to believe this. I could not care less if you don't.

Have a great day.


I don't believe it. MLX is a proprietary model format and usually the last to get supported on Huggingface. Given that most iOS users aren't selecting their own models, I genuinely don't think your conjecture adds up. The majority of people are likely using safetensors and GGUF, not MLX.

If you had a source to cite then it would remove all doubt pretty quickly here. But your assumptions don't seem to align with how iOS users actually use their phone.


I didn't know the entire ML world is defined by what appears in HuggingFace


I never attributed the entire ML world to Huggingface. I am using it to illustrate a correlation.


Cite a source? That CoreML models are prolific on Apple platforms? That Apple devices are prolific? Search for it yourself.

You seem set on MLX and apparently on your narrow view of what models are. This discussion was about the ANE vs "tensor" units on the GPU, and someone happened to mention MLX in that context. I clarified the role of MLX and noted that from an inference perspective most deployments are CoreML, which will automatically use the ANE if the model or some subset of it fits (which is actually fairly rare, as it's a very limited, albeit speedy and power-efficient, bit of hardware). These are basic facts.

>how iOS users actually use their phone.

What does this even mean? Do you think I mean people are running Qwen3-Embedding-4B in pytorch on their device or something? Loads of apps, including mobile games, have models in them now. This is not rare, and most users are blissfully unaware.


> That CoreML models are prolific on Apple platforms? That Apple devices are prolific?

correct and non-controversial

> An enormous number of people and products [use CoreML on Apple platforms]

non-sequitur

EDIT: I see people are not aware of

https://en.wikipedia.org/wiki/Simpson%27s_paradox




Can you share an example of the apps you mean? Maybe it would clear up any confusion.


Any iPhone or iPad app that does local ML inference?


Yes please tell us which apps those are


Wand, Polycam, smart comic reader, Photos of course. Those are just the ones on my phone, probably many more.


The keyboard. Or any of the features in Photos.app that do classification on-device.


Oh, I overlooked that! You are right. Surprising… since Apple has shown that it’s possible through CoreML (https://github.com/apple/ml-ane-transformers)

I would hope that the Foundation Models (https://developer.apple.com/documentation/foundationmodels) use the neural engine.


The Neural Engine not having a native programming model makes it effectively a dead end for external model development. It seems like a legacy unit that was designed for CNNs with limited receptive fields, and it just isn't programmable enough to be useful for the full set of models and operators available today.


That's sadly true. Over in x86 land things don't look much better, in my opinion. The corresponding accelerators on modern Intel and AMD CPUs (the "Copilot PCs") are very difficult to program as well. I would love to read a blog post from someone trying, though!


I have a lot of the details there. Suffice to say it's a nightmare:

https://www.google.com/url?sa=t&source=web&rct=j&opi=8997844...

AMD is likely to back away from this IP relatively soon.


Edit: Foundation Models do use the Neural Engine. They are referring to a Neural Engine-compatible KV cache in this announcement: https://machinelearning.apple.com/research/introducing-apple...


Wrt. language models/transformers, the Neural Engine/NPU is still potentially useful for the prompt pre-processing (prefill) step, which is generally compute-limited. For token generation you need memory bandwidth, so GPU compute with neural/tensor accelerators is preferable.
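
A crude back-of-envelope of why the two phases stress different resources (every number below is a made-up assumption, just to show the shape of the argument):

    # Prefill is compute-bound, decode is bandwidth-bound (roughly).
    # All numbers here are illustrative assumptions.
    params = 7e9                  # assume a 7B-parameter model
    bytes_per_param = 2           # FP16 weights
    prompt_tokens = 2000

    # Prefill: one big batched pass over the prompt -> arithmetic dominates,
    # so matmul throughput (NPU / GPU tensor units) is what matters.
    prefill_flops = 2 * params * prompt_tokens

    # Decode: the weights are re-read from memory for every generated token,
    # so memory bandwidth sets the ceiling on tokens/second.
    bandwidth = 120e9             # assume ~120 GB/s
    decode_tps = bandwidth / (params * bytes_per_param)
    print(f"prefill work: ~{prefill_flops:.1e} FLOPs")
    print(f"decode ceiling: ~{decode_tps:.0f} tokens/s")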


I think I'd still rather have that die area put into tensor cores for the GPU instead of a unit that's only programmable with ONNX.


Of course true. Unified memory is always less than VRAM, and my 16 GB of VRAM isn't enough.

But I think it's also a huge issue that Apple makes storage so expensive. If Apple wants local AI to answer your questions, it should be able to take your calendar, emails, text messages, photos, journal entries, etc. into account. It can't do that as nicely as long as customers opt for only 256 GB or 1 TB devices due to cost.


My guess is that they moved the systolic arrays inside the GPU cores, just like it's done in modern NVIDIA chips.

That's the only way to speed up MLX 4x compared to the M4.


No matter how fast your Neural Engine is, it's not much help if you're constantly juggling memory just to run a model


I can only guess that significant changes in hardware have longer lead times than software (for example). I suppose I am not expecting anything game-changing until the M6.



