
I did some experiments with adding Metal Performance Shaders support, but the performance I achieved was only marginally better than what I get using just the Accelerate framework (there is an unmerged PR with the tests).

Honestly, I am a bit confused by all the different types of processing units available on Apple Silicon. If I understand correctly, we have: CPU, GPU, AMX coprocessor and Neural Engine on a single chip. I don't fully understand how these interact with each other. Can we use them all at the same time, or would there be some penalties? I'm interested in finding some resources/information on the topic.



You are correct, in that those are the four.

My understanding is that the AMX is more tightly coupled to the CPU, ultimately being accessible via an instruction set (https://github.com/corsix/amx), and it is useful if you need to do matrix multiplications interleaved with other CPU tasks. A common example would be a VIO loop or something similar where you want that data in the CPU caches.
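For reference, a minimal sketch of the kind of matrix multiply being discussed. It assumes Accelerate's standard CBLAS entry point `cblas_sgemm` is how you would reach the matrix hardware in practice; Apple does not document which unit Accelerate dispatches to, so treat the AMX link as an assumption. `matmul` and the `USE_ACCELERATE` macro are made-up names for illustration. Without the macro, a plain loop is used so the snippet builds anywhere.

```cpp
#include <cstddef>
#include <vector>

#if defined(__APPLE__) && defined(USE_ACCELERATE)
#include <Accelerate/Accelerate.h>  // build with: -framework Accelerate
#endif

// Computes C = A * B for row-major float matrices.
// A is m x k, B is k x n, and the returned C is m x n.
// USE_ACCELERATE is a hypothetical opt-in macro, not something Apple defines.
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b,
                          int m, int n, int k) {
    std::vector<float> c(static_cast<std::size_t>(m) * n, 0.0f);
#if defined(__APPLE__) && defined(USE_ACCELERATE)
    // Standard CBLAS call: C = 1.0 * A * B + 0.0 * C.
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0f, a.data(), k, b.data(), n, 0.0f, c.data(), n);
#else
    // Portable fallback: plain triple loop, ordered so the inner loop
    // walks both B and C contiguously.
    for (int i = 0; i < m; ++i) {
        for (int p = 0; p < k; ++p) {
            const float aip = a[i * k + p];
            for (int j = 0; j < n; ++j) {
                c[i * n + j] += aip * b[p * n + j];
            }
        }
    }
#endif
    return c;
}
```

Timing both paths on the same inputs would be one rough way to see how much the library call buys over the naive loop on a given machine.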

The GPU and Neural Engine are not like that – they take some time to set up and initialize. They can also parallelize tasks to a much higher degree. The GPU is more general-purpose, because you can write compute shaders to do anything in parallel, but it uses a lot of resources. I'll have to check out the PR to see how exactly the MPS implementation matches up with the task at hand, because you could also consider writing Metal compute shaders by hand. Even if the performance is not much better, the CPU is free to do other things.
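The "CPU is free to do other things" point can be sketched as a submit, overlap, wait pattern. This is only an illustration in plain C++: `offloaded_sum` is a hypothetical stand-in that runs on another CPU thread via `std::async`, not a real GPU or ANE dispatch, and `run_overlapped` is likewise a made-up name.

```cpp
#include <future>
#include <numeric>
#include <vector>

// Hypothetical stand-in for a job handed to the GPU or Neural Engine.
// Here it simply sums the input on whichever thread runs it.
float offloaded_sum(const std::vector<float>& data) {
    return std::accumulate(data.begin(), data.end(), 0.0f);
}

// Submits the job, does unrelated work on the calling thread while the
// job is in flight, then blocks until the result is ready.
float run_overlapped(const std::vector<float>& data, int& cpu_steps) {
    // Submit: std::launch::async forces the job onto a separate thread.
    std::future<float> pending =
        std::async(std::launch::async, [&data] { return offloaded_sum(data); });

    // Overlap: placeholder for whatever else the CPU needs to do.
    for (int i = 0; i < 1000; ++i) {
        ++cpu_steps;
    }

    // Wait: get() blocks until the offloaded job has finished.
    return pending.get();
}
```

With a real accelerator the submit step also carries the setup cost mentioned above, so the overlap only pays off when the offloaded job is large enough to amortize it.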

I know the least about the ANE, but it has specific hardware for running ML models, and you have to process the weights ahead of time to make sure they are in the right format. It can run ML models very efficiently and is the most battery friendly.


I suggest looking into Halide as it will make trying different paths much easier (https://halide-lang.org/).

I haven't looked at your code closely, so I can't say with certainty that it would be the right fit, but it's worth a look.


OpenCL support would be much appreciated. It would open the door to using this on many more low-powered devices, but it could be very difficult, as you have already mentioned.



