Nvidia Next-Gen GPU Teamed with ARM Hercules CPU

jojo9978 · on Dec 19, 2019

"Nvidia Tesla T4’s web page lists its inferencing capacity as 130 TOPS. On ResNet-50 it benchmarks at batch size = 28 as processing 3,920 images/second (image size = 224×224 pixels). We know that ResNet-50 requires 3.5 Billion MACs/image = 7 Billion Operations. Tesla T4 actually performs 3920 images/second x 7 Billion Operations/image = 27,440 Billion Operations/second = 27.4 Trillion Operations/Second = 27.4 TOPS. As a result, 130 TOPS is actually 27.4 TOPS of real throughput = <25% hardware utilization. What about batch =1? It’s likely much less."