> By contrast for a more typical CPU, there's a compiler whose assembly output you can examine, and there's a processor manual that gives the cycle timing of each instruction

1. You can dump the SASS (NVIDIA's native GPU assembly) that the PTX is compiled to: `cuobjdump --dump-sass <input_file>`

2. The cycle count of a single instruction is completely meaningless on an out-of-order (OOO) architecture, because you have no idea when the instruction will actually be issued. This is true for both AMDGPU and NV.
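As a sketch of point 1, the steps below go from CUDA source to PTX to SASS. It assumes the CUDA toolkit is installed with `nvcc` and `cuobjdump` on `PATH`; the file name `saxpy.cu`, the `saxpy` kernel, and the target `sm_80` are illustrative choices, not anything from the comment above. Substitute the architecture of your own GPU.

```shell
# Hypothetical example kernel; any CUDA source file works the same way.
cat > saxpy.cu <<'EOF'
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}
EOF

# PTX: the virtual ISA emitted by nvcc (assumed target: sm_80).
nvcc -ptx -arch=sm_80 saxpy.cu -o saxpy.ptx

# Compile for a concrete architecture, then dump the SASS that actually runs.
nvcc -cubin -arch=sm_80 saxpy.cu -o saxpy.cubin
cuobjdump --dump-sass saxpy.cubin
```

Comparing `saxpy.ptx` with the `cuobjdump` output shows how the PTX was lowered. `cuobjdump` also accepts executables and object files that embed device code.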