Notice that your program could use a fold [1]. Not sure it would be much more efficient, but that would give it a more functional-programming style, which is arguably more elegant.
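For instance, something like this (untested sketch; coefficients listed lowest degree first):

    # Horner's rule as a fold: start from the highest-degree coefficient,
    # then repeatedly multiply by $x and add the next one down.
    sub poly-eval(@coeffs, $x) {
        @coeffs.reverse.reduce: -> $acc, $c { $acc * $x + $c }
    }

    say poly-eval([0, 1, 0, -1/6, 0, 1/120], 0.5);  # truncated sin series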
This is silly. You should (almost) never optimize low-level numerical building-block routines for code style at the expense of performance.
The proper thing to do is make a generic library function for evaluating Chebyshev polynomials, and then pass in the polynomial coefficients and the values to evaluate in arrays. That’s a much more “functional” approach than hard-coding the coefficients could ever be.
Then if you want it to go fast, you can vectorize the code, use FMA instructions, and run it on multiple CPU cores, as in https://github.com/alkashani/numcrunch/blob/master/src/clens... (code my wife wrote for me; I think there's still a factor of 2 or 4 left on the table in getting data to flow efficiently through the CPU caches, but we never profiled it deeply enough to find the issue). Of course you can do even better by running this type of highly parallelizable code on a GPU.
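For reference, the underlying Clenshaw recurrence looks roughly like this as a scalar, untested Perl 6 sketch (the linked code additionally runs it across SIMD lanes and cores):

    # Clenshaw recurrence: evaluate sum_k c[k] * T_k(x), a Chebyshev
    # series with coefficients @c, at every point in @xs.
    sub cheb-eval(@c, @xs) {
        @xs.map: -> $x {
            my ($b1, $b2) = 0, 0;
            for @c[1..*].reverse -> $ck {                 # k = n .. 1
                ($b1, $b2) = 2 * $x * $b1 - $b2 + $ck, $b1;
            }
            @c[0] + $x * $b1 - $b2;
        }
    }

    say cheb-eval([1, 0, -1], [0, 0.5, 1]);  # (2 1.5 0): 1 - T2(x) = 2 - 2x^2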
>> This is silly. You should (almost) never optimize low-level numerical building-block routines for code style at the expense of performance.
>> The proper thing to do is make a generic library function for evaluating Chebyshev polynomials, and then pass in the polynomial coefficients and the values to evaluate in arrays.
It's funny that you express concern for performance and then advocate creating a generic function and passing everything as parameters. For things like the function in question, straight-line code is usually the best way. Optimization should consist of what the author did: find the simplest mathematical expressions that calculate the answer and just write them down.
On the other hand, if for some reason a program is really doing this calculation over and over, your approach (worrying about data flow and parallelism) may become preferable. I just haven't seen a case where sin(x) is used that much ;-)
I thought the OP’s code was fine. My main point was that the top-level poster’s argument about what counted as “functional” seemed silly, and that if he really wanted to go all the way he could write a more completely generic function which could also be made faster by getting data flowing through the CPU more efficiently or using a GPU. (I shouldn’t have phrased this as “the proper thing to do” without clarification, but I can’t edit it now.)
> if for some reason a program is really doing this calculation over and over
Most of the time when you want to evaluate sin(x) for one number x, you want to also evaluate it for a million other numbers. For example, you might be updating the velocity vectors of all the rockets in a space game, finding the map coordinates of a polyline representing the coastline of England, converting color space coordinates for every pixel in an image from CIELAB to CIELCh, or whatever.
In cases where you are just calling the sine function as a one-off, performance is pretty well irrelevant.
* * *
Since sin(x) has a few steps before and after the polynomial evaluation, you are right that it usually makes sense to hard-code the whole function with domain normalization etc. (ideally still vectorized), though those other steps could also be genericized if necessary, since you could also use them for various other special functions.
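Something along those lines (untested sketch with naive range reduction and truncated Taylor polynomials; a production version would use minimax coefficients and handle huge or non-finite arguments carefully):

    # Reduce x to r in [-pi/4, pi/4] with x = k*(pi/2) + r, then pick
    # the polynomial and sign from the quadrant k mod 4.
    sub my-sin($x) {
        my $k  = round($x / (pi/2));
        my $r  = $x - $k * (pi/2);
        my $r2 = $r * $r;
        my $s  = $r * (1 + $r2 * (-1/6 + $r2 * (1/120
                     + $r2 * (-1/5040 + $r2/362880))));
        my $c  = 1 + $r2 * (-1/2 + $r2 * (1/24
                     + $r2 * (-1/720 + $r2/40320)));
        given $k % 4 {
            when 0 {  $s }
            when 1 {  $c }
            when 2 { -$s }
            when 3 { -$c }
        }
    }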
One last thing: I recommend trying to re-frame trigonometry problems as vector problems whenever possible, and thereby not needing the sine function at all. Code which avoids transcendental function evaluations altogether is usually simpler to reason about and faster.
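A trivial example of what I mean (untested; assumes 2-D unit vectors): to ask whether direction @u is within 10 degrees of direction @v, don't recover angles and compare them; precompute a single cosine and compare dot products instead.

    my $cos-thresh = cos(10 * pi / 180);   # hoisted out of any hot loop
    sub within-angle(@u, @v) {             # @u, @v are unit 2-vectors
        @u[0] * @v[0] + @u[1] * @v[1] >= $cos-thresh
    }
    say within-angle([1, 0], [0.995, 0.0998]);  # True: ~5.7 degrees apart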
>> One last thing: I recommend trying to re-frame trigonometry problems as vector problems whenever possible, and thereby not needing the sine function at all. Code which avoids transcendental function evaluations altogether is usually simpler to reason about and faster.
Your comment is beyond silly. The whole point of numerical code is to perform predictably and within the performance constraints of the specific problem. Style purism is the worst kind of elitism: it's braggadocio for its own sake, without real-world merit. Every solution should be evaluated in relation to the problem domain and the constraints given. Sometimes the more understandable version is better; sometimes the functional version is better. But there is no stencil-based development methodology that yields the best answer every time. And yes, I too like code that has no side effects and reads more like math. That's not always the best code for solving a problem, though (unlike in problem specification, where striving for a TLA+-like presentation usually has its merits).
* OP uses standard optimization techniques. Their use has been justified over decades of actual research.
* OP uses a compiled language. OP can look at disassembly and see what's happening on the CPU.
* Your use of +reduce+ does not make sense at all: it's just syntactic sugar here. (You can of course use HOFs to handle constants, but that completely misses the idea of function composition, which is arguably the only reason +reduce+ exists.)
I hear your basic points, and this is a tangent on a tangent, but:
Rakudo, the existing implementation of standard Perl 6, is a compiler.
(I wouldn't be surprised if the code currently generated by Rakudo for this algorithm is far slower than the code generated by rustc but that isn't because it isn't a compiler.)
Loop unrolling, as shown by the author, is quite common in math functions. It keeps the code quite readable and ensures good performance. OpenBLAS used similar tricks in the old days, when there were no intrinsics; I'm not sure whether it still does.
Doesn't Perl 6 have macros? If you have macros, you can do it both elegantly and efficiently by expanding Horner's rule at compile time. You can still use reduce; just have it return the form unevaluated.
In fact, because the coefficients are constant at compile time, you could precondition the polynomial and do even better than Horner's rule, with only three multiplications.
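Concretely, the expansion you'd want the macro to emit is just straight-line Horner, e.g. (untested; plain Taylor coefficients for illustration, tuned constants would slot in the same way):

    # What a compile-time Horner expansion would produce for the
    # six-coefficient case: no loop, no array indexing at run time.
    sub sin-approx($x) {
        my $u = $x * $x;
        $x * (1 + $u * (-1/6 + $u * (1/120 + $u * (-1/5040
            + $u * (1/362880 - $u/39916800)))))
    }

(Preconditioning, as described above, would restructure this further to save multiplications.)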
Perl 6 does have macros, but they are not quite mature yet, and that would probably be overkill for a six-coefficient polynomial. Reduce should be pretty fast anyway.
For instance, I wrote it in Perl 6:
1. https://en.wikipedia.org/wiki/Fold_%28higher-order_function%...