
> I’d be curious how the unified memory architecture shifts the cost dynamic for GPU acceleration.

Correct me if I'm wrong, but is this actually different from the regular integrated graphics that have been in Intel and AMD chips for decades? I remember AMD proposing similar offloading under the name HSA almost a decade ago. I don't think any software ever really used it.




I don't think it is different. For example, the OpenCL specification allows for the possibility that data doesn't need to be copied between CPU and GPU.
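The API-level hook for that is mapping a buffer instead of issuing explicit read/write copies. A minimal sketch (the function name fill_and_use is just for illustration; whether the map is actually zero-copy is entirely up to the driver):

    /* Let the OpenCL runtime allocate host-visible memory and map it,
       rather than copying with clEnqueueWriteBuffer. On an integrated GPU
       the mapped pointer can alias the same physical memory; on a discrete
       GPU the driver may still copy behind the map. */
    #include <CL/cl.h>
    #include <stddef.h>

    void fill_and_use(cl_context ctx, cl_command_queue q, size_t n)
    {
        cl_int err;
        cl_mem buf = clCreateBuffer(ctx,
                                    CL_MEM_ALLOC_HOST_PTR | CL_MEM_READ_WRITE,
                                    n * sizeof(float), NULL, &err);

        /* Map for writing: no explicit host-to-device copy appears in the API. */
        float *p = (float *)clEnqueueMapBuffer(q, buf, CL_TRUE, CL_MAP_WRITE,
                                               0, n * sizeof(float),
                                               0, NULL, NULL, &err);
        for (size_t i = 0; i < n; ++i)
            p[i] = (float)i;
        clEnqueueUnmapMemObject(q, buf, p, 0, NULL, NULL);

        /* ... enqueue kernels that read buf ... */

        clReleaseMemObject(buf);
    }

The catch, historically, is that most applications just call clEnqueueWriteBuffer and eat the copy anyway.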

In a recent interview (I think on the Changelog podcast) I heard an Apple engineer explain that the M1's advantage over previous systems is that not only does the data not need to be copied (which implies that part isn't new), but the format of the data doesn't need to change either, given Apple's end-to-end control.


Yeah. It’s a real feat that Apple got heterogeneous computing working (something AMD was touting with OpenCL). Not having to copy data from system RAM to GPU buffers, etc., is really great.


That's probably the difference: AMD and Intel implemented zero-copy years ago but no software used it, while the Metal stack on macOS probably does take advantage of it.


One difference (as I understand it) is that on Intel's integrated graphics the RAM used for the GPU is a dedicated segment reserved for the GPU's use. You still need to copy data from the CPU's segment to the GPU's segment. While that might be faster than copying over PCIe, it's still a copy operation. With the M1's GPU there's no such segmentation, so no copying.

That's how I understand it works, but I might be completely wrong.


I'm not sure this is right, e.g.:

> Shared Physical Memory: The host and the device share the same physical DRAM. This is different from shared virtual memory, when the host and device share the same virtual addresses, and is not the subject of this paper. The key hardware feature that enables zero copy is the fact that the CPU and GPU have shared physical memory. Shared physical and shared virtual memories are not mutually exclusive.

From:

https://software.intel.com/content/www/us/en/develop/article...
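If I'm reading that article right, their zero-copy recipe boils down to wrapping an already-aligned host allocation with CL_MEM_USE_HOST_PTR; a rough sketch (wrap_host_memory is my own name, and the alignment/size requirements are from memory, so treat them as approximate):

    /* Hand the driver an existing, page-aligned host allocation so it can
       use that memory in place instead of copying it into its own segment.
       The host block must outlive the cl_mem object. */
    #include <CL/cl.h>
    #include <stdlib.h>

    cl_mem wrap_host_memory(cl_context ctx, size_t bytes)
    {
        cl_int err;
        size_t padded = ((bytes + 4095) / 4096) * 4096;  /* multiple of a page */
        void *host = aligned_alloc(4096, padded);        /* page aligned */

        return clCreateBuffer(ctx, CL_MEM_USE_HOST_PTR | CL_MEM_READ_WRITE,
                              padded, host, &err);
    }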


Good point. Some Intel chips have had an on-package (on-die?) 128MB "L4 cache" made of DRAM. That certainly sounds a lot like the M1's integrated memory.


On-package, but not on-die. The GT[34]e processors used an external die with 64MB (GT3e) or 128MB (GT4e) of eDRAM.



