This has been the standard algorithm used by every libm for decades. Its not spe...

enriquto · on March 29, 2022

But isn't this code rarely called in practice? I guess on intel architectures the compiler just calls the fsin instruction of the cpu.

stephencanon · on March 29, 2022

No. FSIN has accuracy issues as sibling mentions, but is also much slower than a good software implementation (it varies with uArch, but 1 result every ~100 cycles is common; even mediocre scalar software implementations can produce a result every twenty cycles).

adgjlsfhk1 · on March 29, 2022

No. The fsin instruction is inaccurate enough to be useless. It gives 0 correct digits when the output is close to 0.

enriquto · on March 29, 2022

> 0 correct digits when the output is close to 0

this is an amusing way to describe the precision of sub-normal floating point numbers

adgjlsfhk1 · on March 29, 2022

It's not just sub-normal numbers. As https://randomascii.wordpress.com/2014/10/09/intel-underesti... shows, fsin only uses 66 bits of pi, which means you have roughly no precision whenever abs(sin(x))<10^-16 which is way bigger than the biggest subnormal (10^-307 or so)

ramshorns · on March 30, 2022

In that range, just returning x would be way better. Maybe even perfect actually - if x is less than 10^-16, then the error of x^3/6 is less than the machine precision for x.

adgjlsfhk1 · on March 30, 2022

the error isn't when x is small. it's when sin(x) is small. the problem happens for x near multiples of pi

lifthrasiir · on March 29, 2022

It is much more amusing if you describe it in ulps; for some inputs the error can reach > 2^90 ulps, more than the mantissa size itself.

jhgb · on March 29, 2022

FSIN only works on x87 registers which you will rarely use on AMD64 systems -- you really want to use at least scalar SSE2 today (since that is whence you receive your inputs as per typical AMD64 calling conventions anyway). Moving data from SSE registers to the FP stack just to calculate FSIN and then moving it back to SSE will probably kill your performance even if your FSIN implementation is good. If you're vectorizing your computation over 4 double floats or 8 single floats in an AVX register, it gets even worse for FSIN.

stephencanon · on March 29, 2022

Moving between x87 and xmm registers is actually fairly cheap (it's through memory, so it's not free, but it's also not _that_ bad). FSIN itself is catastrophically slow.

jhgb · on March 29, 2022

Fair enough, and I imagine there may even be some forwarding going on? There often is when a load follows a store, if I remember correctly. (Of course this will be microarchitecture-dependent.)

chrisseaton · on March 29, 2022

> I guess on intel architectures the compiler just calls the fsin instruction of the cpu.

Do people do that in practice? It's on the FPU, which is basically legacy emulated these days, and it's inaccurate.

enriquto · on March 29, 2022

> the FPU, which is basically legacy

you'll pry my long doubles from my cold, dead hands!

jhgb · on March 29, 2022

The question is if you wouldn't be better served with double-doubles today. You get ~100 bits of mantissa AND you can still vectorize your computations.

enriquto · on March 29, 2022

Sure. There should be a gcc flag to make "long double" become quadruple precision.

The thing is, my first programming language was x86 assembler and the fpu was the funniest part. Spent weeks as a teenager writing almost pure 8087 code. I have a lot of emotional investment in that tiny rolling stack of extended precision floats.

colejohnson66 · on March 29, 2022

How is "quad floating point" implemented on x86? Is it software emulation?

chrisseaton · on March 29, 2022

You're selectively quoting me - I said it's 'legacy emulated'. It's emulated using very long-form microcode, so basically emulated in software. I didn't say it was simply 'legacy'.

eyelidlessness · on March 30, 2022

I’m completely out of my depth reading through these comments so I don’t have much of value to contribute, but I do think I can gently say I think the selective quoting was harmless, but deliberate to fit the shape of a harmless joke’s setup. I don’t think there was any intent to misrepresent your more salient information.

lifthrasiir · on March 29, 2022

x87 trig functions can be very inaccurate due to the Intel's original sloppy implementation and subsequent compatibility requirements [1].

[1] http://nighthacks.com/jag/blog/134/index.html

jcelerier · on March 29, 2022

Wasn't there some blog article a few years ago which showed how glibc's implementation was faster than fsin ?