See this slide deck about it: https://developer.download.nvidia.com/CUDA/training/StreamsA...
(That covers the implementation in Fermi; it has become even more flexible since then.)