
> I’d be curious how the unified memory architecture shifts the cost dynamic for GPU acceleration.

Correct me if I'm wrong, but is this actually different from the regular integrated graphics that have been in Intel and AMD chips for decades? I remember AMD proposing similar offloading under the name HSA almost a decade ago. I don't think any software ever really used it.




I don't think it is different. For example, the OpenCL specification allows for the possibility that data doesn't need to be copied between CPU and GPU.
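The API-level hook for that is mapping a buffer instead of issuing explicit read/write copies. A minimal sketch (the function name fill_and_use is just for illustration; whether the map is actually zero-copy is entirely up to the driver):

    /* Let the OpenCL runtime allocate host-visible memory and map it,
       rather than copying with clEnqueueWriteBuffer. On an integrated GPU
       the mapped pointer can alias the same physical memory; on a discrete
       GPU the driver may still copy behind the map. */
    #include <CL/cl.h>
    #include <stddef.h>

    void fill_and_use(cl_context ctx, cl_command_queue q, size_t n)
    {
        cl_int err;
        cl_mem buf = clCreateBuffer(ctx,
                                    CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE,
                                    n * sizeof(float), NULL, &err);

        /* Map for writing: no explicit host-to-device copy appears in the API. */
        float *p = (float *)clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                                               0, n * sizeof(float),
                                               0, NULL, NULL, &err);
        for (size_t i = 0; i < n; ++i)
            p[i] = (float)i;
        clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);

        /* ... enqueue kernels that read buf ... */

        clReleaseMemObject(buf);
    }

The catch, historically, is that most applications just call clEnqueueWriteBuffer and eat the copy anyway.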

In a recent interview (I think on the Changelog podcast) I heard an Apple engineer explain that the M1's advantage over previous systems is that not only does the data not need to be copied (which implies that part isn't new), but the format of the data doesn't need to change either, given Apple's end-to-end control.


Yeah. It’s a real feat that Apple got heterogeneous computing working (something AMD was touting with OpenCL). Not having to copy data from system RAM to GPU buffers, etc., is really great.


That's probably the difference: AMD and Intel implemented zero-copy years ago but no software used it, while the Metal stack on macOS probably does take advantage of it.


One difference (as I understand it) is that on Intel's integrated graphics the RAM used for the GPU is a dedicated segment reserved for the GPU's use. You still need to copy data from the CPU's segment to the GPU's segment. While that might be faster than copying over PCIe, it's still a copy operation. With the M1's GPU there's no such segmentation, so no copying.

That's how I understand it works, but I might be completely wrong.


I'm not sure this is right, e.g.:

> Shared Physical Memory: The host and the device share the same physical DRAM. This is different from shared virtual memory, when the host and device share the same virtual addresses, and is not the subject of this paper. The key hardware feature that enables zero copy is the fact that the CPU and GPU have shared physical memory. Shared physical and shared virtual memories are not mutually exclusive.

From:

https://software.intel.com/content/www/us/en/develop/article...
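If I'm reading that article right, their zero-copy recipe boils down to wrapping an already-aligned host allocation with CL_MEM_USE_HOST_PTR; a rough sketch (wrap_host_memory is my own name, and the alignment/size requirements are from memory, so treat them as approximate):

    /* Hand the driver an existing, page-aligned host allocation so it can
       use that memory in place instead of copying it into its own segment.
       The host block must outlive the cl_mem object. */
    #include <CL/cl.h>
    #include <stdlib.h>

    cl_mem wrap_host_memory(cl_context ctx, size_t bytes)
    {
        cl_int err;
        size_t padded = ((bytes + 4095) / 4096) * 4096;  /* multiple of a page */
        void *host = aligned_alloc(4096, padded);        /* page aligned */

        return clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR | CL_MEM_READ_WRITE,
                              padded, host, &err);
    }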


Good point. Some Intel chips have had an on-package (on-die?) 128MB "L4 cache" made of DRAM. That certainly sounds a lot like the M1's integrated memory.


On-package, but not on-die. The GT[34]e processors used an external die with 64MB (GT3e) or 128MB (GT4e) of eDRAM.



