A part that seems interesting to me is the "unified memory" architecture
At first it seems like just a cost/power/size saving measure, like not having dedicated RAM in a discrete GPU
But the CPU-vs-GPU RAM distinction means there's a cost to moving work to the GPU, since the data has to be copied across. In practice that limits the GPU to cases where you can either do the whole job purely on the GPU, or queue up batches of work so the copying cost is amortised over the faster parallel execution.
They sort of hinted at it in the presentation, but the unified memory architecture potentially allows a more flexible distribution of tasks between the CPU, GPU, and "Neural" cores, because the data doesn't have to be copied.
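To make that concrete, here's a minimal Swift/Metal sketch (the buffer size and setup are just illustrative, not anything from the presentation): with unified memory, a single .storageModeShared allocation is visible to both the CPU and the GPU, so there's no separate staging copy.

    import Metal

    // One allocation, visible to both CPU and GPU on Apple Silicon.
    guard let device = MTLCreateSystemDefaultDevice(),
          let buffer = device.makeBuffer(length: 1024 * MemoryLayout<Float>.stride,
                                         options: .storageModeShared)
    else { fatalError("no Metal device") }

    // The CPU writes straight into the allocation; no separate host copy exists.
    let floats = buffer.contents().bindMemory(to: Float.self, capacity: 1024)
    for i in 0..<1024 { floats[i] = Float(i) }

    // A compute pass can then bind the same buffer directly:
    //   encoder.setBuffer(buffer, offset: 0, index: 0)
    // On a discrete GPU you'd need an explicit transfer into VRAM first,
    // which is exactly the cost that makes small GPU offloads not worth it.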
I wonder if this is then potentially something compilers will be able to take more advantage of, automatically.
It will be interesting to see how useful the "Neural" cores are for non-ML workloads.