
I thought “unified memory” was just a marketing term for the memory being extremely close to the processor?



No, unified memory usually means the CPU and GPU (and miscellaneous things like the NPU) all use the same physical pool of RAM, so moving data between them is essentially zero-cost. That's in contrast to the usual PC setup, where the CPU has its own pool of RAM (shared with the iGPU if there is one), but a discrete GPU has its own independent pool of VRAM, and moving data between the two pools is a relatively slow operation.

An RTX 4090 or H100 has memory extremely close to the processor, but I don't think you would call it unified memory.
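
To make the contrast concrete, here's a minimal CUDA sketch. The kernel, names, and sizes are all made up for illustration, and note the caveat that on a discrete card cudaMallocManaged still migrates pages over PCIe behind the scenes - the pointer is shared, but the physical pools aren't, unlike a genuinely unified-memory system.

    #include <cuda_runtime.h>
    #include <cstdlib>

    // Hypothetical kernel, just so both paths have something to run.
    __global__ void scale(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    // Unified/managed style: one allocation, one pointer, touched by both CPU and GPU.
    void unified_style(int n) {
        float *data;
        cudaMallocManaged(&data, n * sizeof(float));
        for (int i = 0; i < n; i++) data[i] = (float)i;  // CPU writes it directly
        scale<<<(n + 255) / 256, 256>>>(data, n);         // GPU uses the same pointer
        cudaDeviceSynchronize();
        cudaFree(data);
    }

    // Discrete-GPU style: separate pools, explicit copies across PCIe in both directions.
    void discrete_style(int n) {
        float *host = (float *)malloc(n * sizeof(float));
        float *dev;
        cudaMalloc(&dev, n * sizeof(float));
        for (int i = 0; i < n; i++) host[i] = (float)i;
        cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);  // over PCIe
        scale<<<(n + 255) / 256, 256>>>(dev, n);
        cudaMemcpy(host, dev, n * sizeof(float), cudaMemcpyDeviceToHost);  // back over PCIe
        cudaFree(dev);
        free(host);
    }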


I don't quite understand one of the finer points of this, under-caffeinated :) - if GPU memory is extremely close to the CPU memory, what sort of memory would not be extremely close to the CPU?


I think you misunderstood what I meant by "processor": the memory on a discrete GPU is very close to the GPU's processor die, but very far away from the CPU. The GPU may be able to read and write its own memory at 1 TB/sec, but the CPU trying to read or write that same memory will be limited by the PCIe bus, which is glacially slow by comparison, usually somewhere around 16-32 GB/sec.

A huge part of optimizing code for discrete GPUs is making sure that data is streamed into GPU memory before the GPU actually needs it, because pushing or pulling data over PCIe on demand decimates performance.
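
A rough sketch of that streaming pattern, with a hypothetical process() kernel and made-up chunk count and sizes: split the input into chunks, pin the host buffers, and queue each copy plus the kernel that consumes it on its own stream, so the transfer for the next chunk overlaps with compute on the current one instead of stalling the GPU.

    #include <cuda_runtime.h>

    // Hypothetical per-chunk kernel.
    __global__ void process(float *chunk, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) chunk[i] += 1.0f;
    }

    void stream_chunks(int chunk_elems) {
        const int CHUNKS = 4;
        size_t bytes = chunk_elems * sizeof(float);
        float *host[CHUNKS], *dev[CHUNKS];
        cudaStream_t streams[CHUNKS];

        for (int i = 0; i < CHUNKS; i++) {
            cudaHostAlloc(&host[i], bytes, cudaHostAllocDefault); // pinned, so copies can be truly async
            cudaMalloc(&dev[i], bytes);
            cudaStreamCreate(&streams[i]);
            // (filling host[i] with real data omitted in this sketch)
        }

        for (int i = 0; i < CHUNKS; i++) {
            // Queue the copy and the kernel on the same stream; work on different
            // streams overlaps, so chunk i+1 is in flight over PCIe while the GPU
            // is still computing on chunk i.
            cudaMemcpyAsync(dev[i], host[i], bytes, cudaMemcpyHostToDevice, streams[i]);
            process<<<(chunk_elems + 255) / 256, 256, 0, streams[i]>>>(dev[i], chunk_elems);
        }
        cudaDeviceSynchronize();

        for (int i = 0; i < CHUNKS; i++) {
            cudaFreeHost(host[i]);
            cudaFree(dev[i]);
            cudaStreamDestroy(streams[i]);
        }
    }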


> CPU trying to read or write that same memory will be limited by the PCIe bus, which is glacially slow by comparison, usually somewhere around 16-32GB/sec.

If you're forking out for H100s, you'll usually be putting them on a bus with much higher throughput, 200 GB/s or more.


I see - TL;DR: none; and "processor" switches from {CPU, GPU} to {GPU} in the 2nd paragraph. Thanks!


I thought it meant that both the GPU and the CPU can access it. In most systems, GPU memory cannot be accessed by the CPU (without going through the GPU), and vice versa.


CPUs can access GPU memory via MMIO (though usually only a small portion of it), and GPUs can in principle access main memory via DMA. That means both can share an address space and access each other's memory. However, that wouldn't be called unified memory, because the access is still mediated by an external bus (PCIe) and is therefore relatively slow.
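
For example, CUDA exposes that DMA path as pinned, "mapped" host memory; a sketch is below, with a hypothetical kernel and names. The GPU can dereference a pointer into host RAM without any explicit copy, but every access is a PCIe transaction, which is exactly why this isn't what people mean by unified memory. (Older GPUs also need cudaSetDeviceFlags(cudaDeviceMapHost) before any allocation.)

    #include <cuda_runtime.h>

    // Hypothetical kernel that reads/writes the mapped host buffer;
    // each access here is a DMA transaction over PCIe.
    __global__ void touch(float *data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;
    }

    void zero_copy_sketch(int n) {
        float *host_ptr, *dev_view;
        cudaHostAlloc(&host_ptr, n * sizeof(float), cudaHostAllocMapped); // pinned + mapped
        cudaHostGetDevicePointer(&dev_view, host_ptr, 0);                 // GPU-visible alias

        for (int i = 0; i < n; i++) host_ptr[i] = (float)i;  // CPU fills it in place, no memcpy

        touch<<<(n + 255) / 256, 256>>>(dev_view, n);        // GPU reads host RAM via DMA
        cudaDeviceSynchronize();
        cudaFreeHost(host_ptr);
    }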


Are they cache coherent these days? I feel like any unified memory should be.



