Hacker News

MLX still doesn't use the Neural Engine, right? I wish they would abandon that unit and just center everything around Metal and tensor units on the GPU.


MLX is a training/research framework, and the work product is usually a CoreML model. A CoreML model will use any and all resources available to it, at least where a given resource fits the need.

The ANE is for very low power, very specific inference tasks. There is no universe where Apple abandons it, and it's super weird how much anti-ANE rhetoric there is on this site, as if there can only be one tool for an infinite selection of needs. The ANE is how your iPhone extracts every bit of text from images and subject matter information from photos with little fanfare or heat, or without destroying your battery, among many other uses. It is extremely useful for what it does.

>tensor units on the GPU

The M5 / A19 Pro are the first Apple chips with so-called tensor units, i.e. dedicated matmul hardware on the GPU. The ANE used to be the only tensor-like thing in the system, albeit, as mentioned, designed to be super efficient and for very specific purposes. That doesn't mean Apple is going to abandon the ANE; instead they made it faster and more capable again.
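To make the matmul point concrete, here's a back-of-envelope calculation (my own illustrative numbers, not from any Apple spec): an n x n matmul does O(n^3) arithmetic over O(n^2) data, so its FLOPs-per-byte ratio grows with n, which is exactly why it rewards dedicated compute units rather than more memory bandwidth.

```python
# Arithmetic intensity of an n x n matmul: 2*n^3 FLOPs (n^2 outputs, n
# multiply-accumulates each) over roughly 3*n^2 matrix entries moved
# (read A and B, write C, assuming ideal caching). Illustrative only.
def matmul_intensity(n: int, bytes_per_elem: int = 2) -> float:
    flops = 2 * n ** 3
    bytes_moved = 3 * n ** 2 * bytes_per_elem  # fp16 elements by default
    return flops / bytes_moved

print(matmul_intensity(1024))  # ~341.3 FLOPs per byte at fp16
```

At n = 1024 that's hundreds of FLOPs per byte, far past the point where bandwidth is the bottleneck, which is the workload tensor units target.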


> ...and it's super weird how much anti-ANE rhetoric there is on this site, as if there can only be one tool for an infinite selection of needs

That seems like a strange comment. I've remarked in this thread (and other threads on this site) about what's known re: low-level ANE capabilities, and it seems to have significant potential overall, even for some part of LLM processing. I'm not expecting it to be best-in-class at everything, though. Just like most other NPUs that are also showing up on recent laptop hardware.


> the work product is usually a CoreML model.

What work product? Who is running models on Apple hardware in prod?


An enormous number of people and products. I'm actually not sure if your comment is serious, because it seems to be of the "I don't, therefore no one does" variety.


Enormous compared to what? Do you have any numbers, or are you going off what your X/Bluesky feed is telling you?


I'm super not interested in arguing with the peanut gallery (meaning people who don't know the platform but feel that they have absolute knowledge of it), but enough people have apps with CoreML models in them, running across a billion or so devices. Some of those models were developed or migrated with MLX.

You don't have to believe this. I could not care less if you don't.

Have a great day.


I don't believe it. MLX is a proprietary model format, and usually the last to get supported on Hugging Face. Given that most iOS users aren't selecting their own models, I genuinely don't think your conjecture adds up. The majority of people are likely using safetensors and GGUF, not MLX.

If you had a source to cite then it would remove all doubt pretty quickly here. But your assumptions don't seem to align with how iOS users actually use their phone.


I didn't know the entire ML world was defined by what appears on Hugging Face.


I never attributed the entire ML world to Huggingface. I am using it to illustrate a correlation.


Cite a source? That CoreML models are prolific on Apple platforms? That Apple devices are prolific? Search for it yourself.

You seem set on MLX, and apparently on a narrow view of what models are. This discussion was about the ANE vs. "tensor" units on the GPU, and someone happened to mention MLX in that context. I clarified the role of MLX, and noted that from an inference perspective most deployments are CoreML, which will automatically use the ANE if the model or some subset of it fits (which is actually fairly rare, as it's a very limited -- albeit speedy and power-efficient -- bit of hardware). These are basic facts.

>how iOS users actually use their phone.

What does this even mean? Do you think I mean people are running Qwen3-Embedding-4B in PyTorch on their device or something? Loads of apps, including mobile games, have models in them now. This is not rare, and most users are blissfully unaware.
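For what it's worth, the conversion-time knob that drives the CoreML/ANE dispatch described above lives in coremltools. A minimal sketch, assuming coremltools and PyTorch are installed (the `Tiny` model is a placeholder of my own, not anything from the thread); `ct.convert` and `ct.ComputeUnit` are the public coremltools API:

```python
# Sketch: convert a traced PyTorch model to Core ML and request CPU + ANE.
# Note there is no way to force the ANE outright; the runtime decides per
# layer whether each op fits the hardware.
status = "not run (coremltools/torch unavailable)"
try:
    import coremltools as ct
    import torch

    class Tiny(torch.nn.Module):  # placeholder model for illustration
        def forward(self, x):
            return torch.relu(x)

    traced = torch.jit.trace(Tiny().eval(), torch.zeros(1, 4))
    model = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=(1, 4))],
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # eligible layers may go to ANE
    )
    status = "converted"
except Exception:
    # Broad except keeps the sketch harmless on machines without the ML stack.
    pass
```

`ComputeUnit.ALL` (the default) additionally allows the GPU; either way the per-layer scheduling is opaque to the app.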


> That CoreML models are prolific on Apple platforms? That Apple devices are prolific?

correct and non-controversial

> An enormous number of people and products [use CoreML on Apple platforms]

non-sequitur

EDIT: I see people are not aware of

https://en.wikipedia.org/wiki/Simpson%27s_paradox




Can you share an example of the apps you mean? Maybe that would clear up the confusion.


Any iPhone or iPad app that does local ML inference?


Yes please tell us which apps those are


Wand, Polycam, smart comic reader, Photos of course. Those are just the ones on my phone, probably many more.


The keyboard. Or any of the features in Photos.app that do classification on-device.


Oh, I overlooked that! You are right. Surprising… since Apple has shown that it’s possible through CoreML (https://github.com/apple/ml-ane-transformers)

I would hope that the Foundation Models (https://developer.apple.com/documentation/foundationmodels) use the neural engine.


The Neural Engine not having a native programming model makes it effectively a dead end for external model development. It seems like a legacy unit designed for CNNs with limited receptive fields, and it just isn't programmable enough to be useful for the full set of models and operators available today.


That's sadly true. Over in x86 land, things don't look much better, in my opinion. The corresponding accelerators on modern Intel and AMD CPUs (the "Copilot+ PCs") are very difficult to program as well. I would love to read a blog post about someone trying, though!


I have a lot of the details there. Suffice it to say, it's a nightmare:

https://www.google.com/url?sa=t&source=web&rct=j&opi=8997844...

AMD is likely to back away from this IP relatively soon.


Edit: Foundation Models use the Neural Engine. They are referring to a Neural Engine compatible K/V cache in this announcement: https://machinelearning.apple.com/research/introducing-apple...


W.r.t. language models/transformers, the Neural Engine/NPU is still potentially useful for the pre-processing (prefill) step, which is generally compute-limited. For token generation you need memory bandwidth, so GPU compute with neural/tensor accelerators is preferable.
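A rough sketch of that bandwidth argument, with made-up hardware numbers of my own: during decode, every weight is read once per generated token, so tokens/sec is capped near bandwidth divided by model size, no matter how much compute you throw at it.

```python
# Memory-bound ceiling for autoregressive decode: one full pass over the
# weights per token. Prefill, by contrast, reuses each weight across the
# whole prompt, so its ceiling is set by compute throughput instead.
def decode_tokens_per_sec(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

# Illustrative: 120 GB/s of memory bandwidth, 4 GB of (quantized) weights.
print(decode_tokens_per_sec(120.0, 4.0))  # 30.0 tok/s ceiling
```

That's why an NPU's extra matmul throughput helps prefill but does little for decode on the same memory system.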


I think I'd still rather have the hardware area put into tensor cores on the GPU instead of this unit that's only programmable with ONNX.



