
It does use Horner's rule, but splits the expression into two halves in order to exploit instruction-level parallelism.


Since both halves have the same form, are compilers smart enough to vectorize this code?


I might be wrong, but I don't think vectorizing would save time for something like this, since you would have to move data around before and afterwards. The real benefit is that it lets you run two fma operations in parallel.



