
If you have a lot of "data plane" code or other code that loops over data, you can see a big gain from -O3 because of more aggressive unrolling and vectorization (HPC people use -O3 quite a lot). CRUD-like applications and other code that is branchy and heavy on control flow will often see a mild performance regression from -O3 compared to -O2, because of larger binary size and more frequent clock-frequency drops triggered by AVX instructions.
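To make "data plane" concrete, here is a minimal sketch of the kind of loop the unroller and vectorizer go after: a tight, branch-free pass over a buffer. The function name `sum_bytes` is made up for illustration, and whether a given compiler actually vectorizes it depends on the compiler version, target, and flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: a branch-free reduction over a byte buffer.
   Loops shaped like this are typical candidates for unrolling and
   vectorization at higher optimization levels. */
uint32_t sum_bytes(const uint8_t *buf, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += buf[i];   /* no branches or calls in the loop body */
    }
    return total;
}
```

Comparing the assembly at -O2 and -O3 (for example with `-S`) is the easiest way to see what your compiler really did with it.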


I once wrote a program with some inline assembly and tried -O3 with clang. Because the assembly was in a loop, the compiler probably didn't have enough information about the actual code and decided to fully unroll all 16 iterations, making performance drop by 25% because cache locality was completely destroyed. What I'm trying to say is that loop unrolling is definitely not a guaranteed trade of binary size for faster code.
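If you hit something like this, one option is to tell the compiler not to unroll that particular loop. A rough sketch, not the original program: `NO_UNROLL`, `opaque_step`, and `accumulate16` are names invented here, and the empty asm statement is only a stand-in for a real hand-written block. It relies on `#pragma clang loop unroll(disable)` for clang and `#pragma GCC unroll 1` for GCC (GCC 8 or later, as far as I know).

```c
#include <stdint.h>

/* Pick the "do not unroll" pragma for the compiler in use.
   NO_UNROLL is a hypothetical helper macro, not a standard one. */
#if defined(__clang__)
#define NO_UNROLL _Pragma("clang loop unroll(disable)")
#elif defined(__GNUC__)
#define NO_UNROLL _Pragma("GCC unroll 1")
#else
#define NO_UNROLL
#endif

/* Stand-in for a larger asm block: an empty asm statement with x as an
   in/out register operand, so the optimizer cannot see through it. */
static inline uint32_t opaque_step(uint32_t x)
{
    __asm__ volatile("" : "+r"(x));
    return x;
}

/* 16 iterations, each passing through the asm statement.
   The pragma asks the compiler to keep this as a real loop. */
uint32_t accumulate16(uint32_t seed)
{
    uint32_t x = seed;
    NO_UNROLL
    for (int i = 0; i < 16; i++) {
        x = opaque_step(x + (uint32_t)i);
    }
    return x;
}
```

Whether this helps is something to measure; the pragma only stops the unrolling, it doesn't tell the compiler anything more about what the asm costs.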


Large blocks of inline assembly also defeat -O3. The compiler treats the asm statement as being essentially empty and makes its decisions around it. Most inline asm is a single instruction, so this is usually safe.



