Notice that your program could use a fold [1]. Not sure it would be much more efficient, but that would give it a more functional-programming style, which is arguably more elegant.
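For instance, something like this (untested sketch; coefficients listed lowest degree first):

    # Horner's rule as a fold: start from the highest-degree coefficient,
    # then repeatedly multiply by $x and add the next one down.
    sub poly-eval(@coeffs, $x) {
        @coeffs.reverse.reduce: -> $acc, $c { $acc * $x + $c }
    }

    say poly-eval([0, 1, 0, -1/6, 0, 1/120], 0.5);  # truncated sin series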
This is silly. You should (almost) never optimize low-level numerical building-block routines for code style at the expense of performance.
The proper thing to do is make a generic library function for evaluating Chebyshev polynomials, and then pass in the polynomial coefficients and the values to evaluate in arrays. That’s a much more “functional” approach than hard-coding the coefficients could ever be.
Then if you want it to go fast, you can vectorize the code, use FMA instructions, and run it on multiple CPU cores, as in https://github.com/alkashani/numcrunch/blob/master/src/clens... (code my wife wrote for me; I think there's still a factor of 2 or 4 left on the table in getting data to flow efficiently through the CPU caches, but we never profiled it deeply enough to find the issue). Of course you can do even better by running this type of highly parallelizable code on a GPU.
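For reference, the underlying Clenshaw recurrence looks roughly like this as a scalar, untested Perl 6 sketch (the linked code additionally runs it across SIMD lanes and cores):

    # Clenshaw recurrence: evaluate sum_k c[k] * T_k(x), a Chebyshev
    # series with coefficients @c, at every point in @xs.
    sub cheb-eval(@c, @xs) {
        @xs.map: -> $x {
            my ($b1, $b2) = 0, 0;
            for @c[1..*].reverse -> $ck {                 # k = n .. 1
                ($b1, $b2) = 2 * $x * $b1 - $b2 + $ck, $b1;
            }
            @c[0] + $x * $b1 - $b2;
        }
    }

    say cheb-eval([1, 0, -1], [0, 0.5, 1]);  # (2 1.5 0): 1 - T2(x) = 2 - 2x^2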
>> This is silly. You should (almost) never optimize low-level numerical building-block routines for code style at the expense of performance.
>> The proper thing to do is make a generic library function for evaluating Chebyshev polynomials, and then pass in the polynomial coefficients and the values to evaluate in arrays.
It's funny that you express concern for performance and then advocate creating a generic function and passing everything as parameters. For things like the function in question, straight-line code is usually the best way. Optimization should consist of what the author did: find the simplest mathematical expressions that calculate the answer and just write them down.
On the other hand, if for some reason a program is really doing this calculation over and over, your approach (worrying about data flow and parallelism) may become preferable. I just haven't seen a case where sin(x) is used that much ;-)
I thought the OP’s code was fine. My main point was that the top-level poster’s argument about what counted as “functional” seemed silly, and that if he really wanted to go all the way he could write a more completely generic function which could also be made faster by getting data flowing through the CPU more efficiently or using a GPU. (I shouldn’t have phrased this as “the proper thing to do” without clarification, but I can’t edit it now.)
> if for some reason a program is really doing this calculation over and over
Most of the time when you want to evaluate sin(x) for one number x, you want to also evaluate it for a million other numbers. For example, you might be updating the velocity vectors of all the rockets in a space game, finding the map coordinates of a polyline representing the coastline of England, converting color space coordinates for every pixel in an image from CIELAB to CIELCh, or whatever.
In cases where you are just calling the sine function as a one-off, performance is pretty well irrelevant.
* * *
Since sin(x) has a few steps before and after the polynomial evaluation, you are right that it usually makes sense to hard-code the whole function with domain normalization etc. (ideally still vectorized), though those other steps could also be genericized if necessary, since you could also use them for various other special functions.
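Something along those lines (untested sketch with naive range reduction and truncated Taylor polynomials; a production version would use minimax coefficients and handle huge or non-finite arguments carefully):

    # Reduce x to r in [-pi/4, pi/4] with x = k*(pi/2) + r, then pick
    # the polynomial and sign from the quadrant k mod 4.
    sub my-sin($x) {
        my $k  = round($x / (pi/2));
        my $r  = $x - $k * (pi/2);
        my $r2 = $r * $r;
        my $s  = $r * (1 + $r2 * (-1/6 + $r2 * (1/120
                     + $r2 * (-1/5040 + $r2/362880))));
        my $c  = 1 + $r2 * (-1/2 + $r2 * (1/24
                     + $r2 * (-1/720 + $r2/40320)));
        given $k % 4 {
            when 0 {  $s }
            when 1 {  $c }
            when 2 { -$s }
            when 3 { -$c }
        }
    }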
One last thing: I recommend trying to re-frame trigonometry problems as vector problems whenever possible, and thereby not needing the sine function at all. Code which avoids transcendental function evaluations altogether is usually simpler to reason about and faster.
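A trivial example of what I mean (untested; assumes 2-D unit vectors): to ask whether direction @u is within 10 degrees of direction @v, don't recover angles and compare them; precompute a single cosine and compare dot products instead.

    my $cos-thresh = cos(10 * pi / 180);   # hoisted out of any hot loop
    sub within-angle(@u, @v) {             # @u, @v are unit 2-vectors
        @u[0] * @v[0] + @u[1] * @v[1] >= $cos-thresh
    }
    say within-angle([1, 0], [0.995, 0.0998]);  # True: ~5.7 degrees apart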
>> One last thing: I recommend trying to re-frame trigonometry problems as vector problems whenever possible, and thereby not needing the sine function at all. Code which avoids transcendental function evaluations altogether is usually simpler to reason about and faster.
Your comment is beyond silly. The whole point of numerical code is to perform predictably and within the performance constraints of the specific problem. Style purism is the worst kind of elitism: it's braggadocio for its own sake, without real-world merit. Every solution should be evaluated in relation to the problem domain and the constraints given. Sometimes the more understandable version is better; sometimes the functional version is better. But there is no stencil-based development methodology that yields the best answer every time. And yes, I too like code that has no side effects and reads more like math. That's not always the best code for solving a problem, though (unlike in problem specification, where striving for a TLA+-like presentation usually has its merits).
* OP uses standard optimization techniques. Their use has been justified over decades of actual research.
* OP uses a compiled language. OP can look at disassembly and see what's happening on the CPU.
* Your use of +reduce+ does not make sense at all: it's just syntactic sugar here. (You can of course use HOFs to handle constants, but that completely misses the idea of function composition, which is arguably the only reason +reduce+ exists.)
I hear your basic points, and this is a tangent on a tangent, but:
Rakudo, the existing implementation of standard Perl 6, is a compiler.
(I wouldn't be surprised if the code currently generated by Rakudo for this algorithm is far slower than the code generated by rustc but that isn't because it isn't a compiler.)
Loop unrolling, as shown by the author, is quite common in math functions. It keeps the code quite readable and ensures good performance. OpenBLAS used similar tricks in the old days, when there were no intrinsics; I'm not sure whether it still does.
Doesn't Perl 6 have macros? If you have macros, you can do it both elegantly and efficiently by expanding Horner's rule at compile time. You can still use reduce; just have it return the form unevaluated.
In fact, because the coefficients are constant at compile time, you could precondition the polynomial and do even better than Horner's rule, with only three multiplications.
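Concretely, the expansion you'd want the macro to emit is just straight-line Horner, e.g. (untested; plain Taylor coefficients for illustration, tuned constants would slot in the same way):

    # What a compile-time Horner expansion would produce for the
    # six-coefficient case: no loop, no array indexing at run time.
    sub sin-approx($x) {
        my $u = $x * $x;
        $x * (1 + $u * (-1/6 + $u * (1/120 + $u * (-1/5040
            + $u * (1/362880 - $u/39916800)))))
    }

(Preconditioning, as described above, would restructure this further to save multiplications.)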
Perl 6 does have macros, but they are not quite mature yet, and that would probably be overkill for a six-coefficient polynomial. Reduce should be pretty fast anyway.
For instance, I wrote it in Perl 6:
1. https://en.wikipedia.org/wiki/Fold_%28higher-order_function%...