No. FSIN has accuracy issues as sibling mentions, but is also much slower than a good software implementation (it varies with uArch, but 1 result every ~100 cycles is common; even mediocre scalar software implementations can produce a result every twenty cycles).
It's not just sub-normal numbers. As https://randomascii.wordpress.com/2014/10/09/intel-underesti... shows, fsin only uses 66 bits of pi, which means you have roughly no precision whenever abs(sin(x))<10^-16 which is way bigger than the biggest subnormal (10^-307 or so)
In that range, just returning x would be way better. Maybe even perfect actually - if x is less than 10^-16, then the error of x^3/6 is less than the machine precision for x.
FSIN only works on x87 registers which you will rarely use on AMD64 systems -- you really want to use at least scalar SSE2 today (since that is whence you receive your inputs as per typical AMD64 calling conventions anyway). Moving data from SSE registers to the FP stack just to calculate FSIN and then moving it back to SSE will probably kill your performance even if your FSIN implementation is good. If you're vectorizing your computation over 4 double floats or 8 single floats in an AVX register, it gets even worse for FSIN.
Moving between x87 and xmm registers is actually fairly cheap (it's through memory, so it's not free, but it's also not _that_ bad). FSIN itself is catastrophically slow.
Fair enough, and I imagine there may even be some forwarding going on? There often is when a load follows a store, if I remember correctly. (Of course this will be microarchitecture-dependent.)
The question is if you wouldn't be better served with double-doubles today. You get ~100 bits of mantissa AND you can still vectorize your computations.
Sure. There should be a gcc flag to make "long double" become quadruple precision.
The thing is, my first programming language was x86 assembler and the fpu was the funniest part. Spent weeks as a teenager writing almost pure 8087 code. I have a lot of emotional investment in that tiny rolling stack of extended precision floats.
You're selectively quoting me - I said it's 'legacy emulated'. It's emulated using very long-form microcode, so basically emulated in software. I didn't say it was simply 'legacy'.
I’m completely out of my depth reading through these comments so I don’t have much of value to contribute, but I do think I can gently say I think the selective quoting was harmless, but deliberate to fit the shape of a harmless joke’s setup. I don’t think there was any intent to misrepresent your more salient information.