
It still boils down to hardware and software differences.

In terms of hardware - Apple designs their GPUs for graphics workloads, whereas Nvidia has a decades-long lead in optimizing for general-purpose compute. Nvidia has gotten really good at pipelining and keeping their raster performance competitive while also accelerating AI and ML. Meanwhile, Apple is directing most of their performance to just the raster stuff. They could pivot to an Nvidia-style design, but that would be pretty unprecedented (even if a seemingly correct decision).

And then there's CUDA. It's not really fair to compare it to Metal, in either feature scope or ease of use. CUDA has expansive support for AI/ML primitives and deeply integrated tensor/SM compute. Metal does boast some compute features, but you're expected to write most of the support yourself in the form of compute shaders. This is a pretty radical departure from the pre-rolled, almost "cargo cult" CUDA mentality.
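
To make that concrete, here's a rough sketch (illustrative names and sizes, no error handling): the hand-written kernel below is roughly the level Metal compute shaders put you at, while the single cuBLAS call is the pre-rolled, vendor-tuned path CUDA hands you.

    #include <cuda_runtime.h>
    #include <cublas_v2.h>

    // DIY path: a naive matrix-multiply kernel, comparable to what you'd
    // hand-write as a Metal compute shader.
    __global__ void naive_gemm(const float* A, const float* B, float* C,
                               int M, int N, int K) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < M && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[row * K + k] * B[k * N + col];
            C[row * N + col] = acc;
        }
    }

    // Pre-rolled path: one cuBLAS call, tuned by Nvidia per architecture.
    // Computes C = A * B for column-major matrices already on the device.
    void gemm_with_cublas(cublasHandle_t handle, const float* dA,
                          const float* dB, float* dC, int M, int N, int K) {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    M, N, K, &alpha, dA, M, dB, K, &beta, dC, M);
    }

The naive kernel works, but the library call is the one that ships with years of per-GPU tuning behind it, and CUDA has that kind of pre-baked coverage across a lot of domains.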

The Linux shtick matters a tiny bit, but it's mostly a matter of convenience. If Apple hardware started getting competitive, there would be people considering the hardware regardless of the OS it runs.




> keeping their raster performance competitive while also accelerating AI and ML. Meanwhile, Apple is directing most of their performance to just the raster stuff. They could pivot to an Nvidia-style design, but that would be pretty unprecedented (even if a seemingly correct decision).

Isn't Apple also focusing on the AI stuff? How has it not already made that decision? What would prevent Apple from making that decision?

> Metal does boast some compute features, but you're expected to write most of the support yourself in the form of compute shaders. This is a pretty radical departure from the pre-rolled, almost "cargo cult" CUDA mentality.

Can you give an example of where Metal wants you to write something yourself whereas CUDA is pre-rolled?


> Isn't Apple also focusing on the AI stuff?

Yes, but not with their GPU architecture. Apple's big bet was on low-power NPU hardware, assuming the compute cost of inference would go down as the field progressed. This was the wrong bet - LLMs and other AIs have scaled up better than they scaled down.

> How has it not already made that decision? What would prevent Apple from making that decision?

I mean, for one, Apple is famously stubborn. They're the last ones to admit they're wrong whenever they make a mistake, and presumably conceding that the NPU is wasted silicon would be a mea culpa for their whole AI stance. It's also easier to wait for a new generation of Apple Silicon to overhaul the architecture than to drive a generational split as soon as the problem is identified.

As for what's preventing them, I don't think there's anything insurmountable. But logically it might not make sense to adopt Nvidia's strategy even if it's better. Apple can't necessarily block Nvidia from buying the same nodes they get from TSMC, so they'd have to out-design Nvidia if they wanted to compete on the merits. Even then, since Apple doesn't support OpenCL, it's not guaranteed that they would replace CUDA; it would just be another proprietary runtime for vendors to choose from.

> Can you give an example of where Metal wants you to write something yourself whereas CUDA is pre-rolled?

Not exhaustively, no. Some of them are performance-optimized kernel libraries like cuSPARSE, others are primitive sets like cuDNN, and still others are graph and signal-processing libraries with built-out support for industrial applications.
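
As one concrete (simplified) example of what a primitive set like cuDNN buys you: a ReLU over a tensor that's already on the GPU is a couple of descriptor calls and one launch. On Metal you'd write and dispatch your own compute shader for the same thing. Sketch below, error handling omitted:

    #include <cudnn.h>

    // Applies ReLU to an n x c x h x w float tensor; x and y are device pointers.
    void relu_forward(cudnnHandle_t handle, const float* x, float* y,
                      int n, int c, int h, int w) {
        cudnnTensorDescriptor_t desc;
        cudnnCreateTensorDescriptor(&desc);
        cudnnSetTensor4dDescriptor(desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                                   n, c, h, w);

        cudnnActivationDescriptor_t act;
        cudnnCreateActivationDescriptor(&act);
        cudnnSetActivationDescriptor(act, CUDNN_ACTIVATION_RELU,
                                     CUDNN_NOT_PROPAGATE_NAN, 0.0);

        const float alpha = 1.0f, beta = 0.0f;
        // The actual work: one library call instead of a hand-written shader.
        cudnnActivationForward(handle, act, &alpha, desc, x, &beta, desc, y);

        cudnnDestroyActivationDescriptor(act);
        cudnnDestroyTensorDescriptor(desc);
    }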

To Apple's credit, they've definitely started hardware-accelerating the important stuff like FFT and ray tracing. But Nvidia still has a decade of lead time that Apple spent shopping around with AMD for other solutions. The head start CUDA has is so great that I don't think Apple can seriously respond unless the executives light a fire under their ass to make some changes. It will be an "unstoppable force versus immovable object" decision for Apple's board of directors.


I think betting on low-power NPU hardware wasn't necessarily wrong - if you're Apple, you're trying to optimise performance/watt across the system as a whole. So in a context where you're shipping first-party, bespoke on-device ML features, it can make sense to have a modestly sized dedicated accelerator.

I'd say the biggest problem with the NPU is that you can only use it from Core ML. Even MLX can't access it!

As you say, the big world-changing LLMs are scaling up, not down. At the same time (at least so far) LLM usage is intermittent - we want to consume thousands of tokens in seconds, but only a couple of times a minute. That's a client-server timesharing model for as long as the compute and memory demands can't fit on a laptop.



