I had to debug a nasty x87-related issue in a piece of scientific computing software. The output of each version of the software was deterministic, but different versions usually (not always) gave very, very slightly different output, even when the numerical code was unchanged. This bothered the original author, but he was a hardware engineer, and his attitude toward it was basically "software sucks, you can only depend on computers if you lay out the circuits yourself."

So the mystery remained until I was brought on and was able to figure out that the default settings of gcc at the time did not use any of the SSE2 instructions supported on the workstations we were targeting, and instead emitted x87 floating-point instructions. The state of the x87 stack was affected by code interleaved with the numerical code (there was a bit more going on than just number-crunching), causing the 80-bit representations to be spilled to and reloaded from 64-bit memory slots (and hence rounded) at slightly different points in the computation. This resulted in tiny changes in the output. I added a flag to enable SSE2 instructions, and thereafter the output only changed when the numerical code was changed.
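A contrived sketch of the kind of divergence involved (not our actual code; the file name demo.c is hypothetical, and the exact output depends on compiler version, optimization level, and excess-precision settings): with x87 code generation an intermediate can survive in the 80-bit registers, while with SSE2 every step is rounded to a 64-bit double.

    /* Hypothetical demo.c -- build both ways and compare:
     *   gcc -m32 -O2 -mfpmath=387 demo.c && ./a.out          # x87 code generation
     *   gcc -m32 -O2 -mfpmath=sse -msse2 demo.c && ./a.out   # SSE2 code generation
     */
    #include <stdio.h>

    int main(void) {
        volatile double x = 1e308;     /* volatile blocks constant folding */
        double y = x * 10.0 / 10.0;    /* x87: x*10 fits the 80-bit exponent range,
                                          so y typically comes back as ~1e308;
                                          SSE2: x*10 overflows a double, so y is inf */
        printf("%g\n", y);
        return 0;
    }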


The state of the x87 stack was affected by code interleaved with the numerical code (there was a bit more going on than just number-crunching), causing the 80-bit representations to be spilled to and reloaded from 64-bit memory slots (and hence rounded) at slightly different points in the computation.

The x87 can load/store 80-bit floats from/to memory, so it can definitely save intermediate results in full precision if asked to; I'd call that more of a compiler flaw than anything else.


I had this experience on several occasions in C# on 32-bit. The whole idea of 80-bit operations is flawed in an environment where you can’t control the generated native code, register allocation, and so on (i.e. in most high-level languages). We became so used to these bugs that we immediately recognized them when some calculation differed on just some machines.

As C# is JIT-compiled, you could never be sure what code would actually run on the end user's machine, or where the truncation from 80 to 64 bits would occur.

In the end, the best cure is to ensure you never, ever use x87, which happened automatically when we dropped 32-bit support.

Determinism is too important to give up for 16 extra bits.


I feel like I once read that, when writing numerical/scientific code, there were traditionally so many weird high-performance computers in use (Crays or whatever) that you had to be robust to all sorts of different kinds of FP anyway.

Nowadays maybe that sort of diversity is less of an issue? Expecting determinism in the sense you mean it just seems weird to me.


Having absolute determinism is probably still difficult, but using SSE on x64 on Windows, where all users have compatible compilers (i.e. determinism without diversity), is at least “good enough” nowadays. I haven’t seen any issues with that scenario so far, even though it’s certainly possible for problems to arise.


I think it’s in Goldberg ’91, “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.


The rounding to 64 bits (it's a rounding, not a truncation) never occurs if you use a language type that is 80 bits.
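For example, on x86 with gcc, long double maps to the x87 80-bit extended format (that's ABI-specific; other targets map long double to 64 or 128 bits), so a value can be stored to memory and reloaded without ever being rounded to double's 53-bit significand. A minimal sketch:

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        printf("double significand bits:      %d\n", DBL_MANT_DIG);  /* 53 */
        printf("long double significand bits: %d\n", LDBL_MANT_DIG); /* 64 on x86/gcc */

        long double x = 1.0L / 3.0L;          /* full 80-bit result                 */
        double d = (double)x;                 /* the round to 64 bits happens here  */
        printf("%d\n", (long double)d == x);  /* prints 0: the rounding lost bits   */
        return 0;
    }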


The question, though, is which gives you more accurate results? :-)


The one that let us build a library of bit-for-bit regression tests ;-) but for that I have to explain a bit more. There were computational "features" that could be turned on and off to tweak the computation for different problems and for different speed/accuracy trade-offs, and sometimes these features had very slight effects, so regressions could result in very small errors.

An experienced eye could easily see the difference between the tiny x87-related indeterminacy and other kinds of changes, but we were uncomfortable automating this comparison, and it took a while for someone without strong domain knowledge (such as myself or any other newly hired software engineer) to become comfortable eyeballing it. With deterministic output, we could use automated tests to verify that, for example, the work we did to add a new computational feature did not change the output when that feature was not enabled, or that small changes intended as performance optimizations did not inject tiny numerical errors.

Our customers were also a lot more comfortable when they could use "diff" to validate that our latest X-times-faster release was really doing the same computation as the last one :-)
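One trick that makes the diff workflow painless (a generic sketch, not our actual output format; the dump_results helper is hypothetical): write the doubles in C99 hexadecimal floating-point notation, so the text files encode the exact bits and any numerical change shows up in a plain diff.

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: dump results one per line in hex-float notation,
       so "diff old.txt new.txt" is a bit-for-bit comparison of the doubles. */
    static void dump_results(FILE *out, const double *r, size_t n) {
        for (size_t i = 0; i < n; i++)
            fprintf(out, "%a\n", r[i]);   /* e.g. 0x1.921fb54442d18p+1 */
    }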

EDIT: We also got a noticeable speed-up by enabling the SSE2 instructions. The bulk of the numeric work was done in hardware, so it wasn't dramatic, but it was measurable.


Yes, that makes sense.

As for speed, Intel has neglected to keep the x87 up to date.


This has happened to thousands of people throughout the ages. The damn -ffloat-store option! Ugh!!!
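For anyone who hasn't run into it: -ffloat-store tells gcc to store each floating-point variable to memory on assignment (rounding it to its declared width) instead of leaving it in an 80-bit x87 register, so the same source can produce different results with and without the flag. A contrived sketch (demo.c is a placeholder name, and the exact behavior depends on compiler version and optimization level):

    /* gcc -m32 -O2 demo.c                  # 'a' may stay in an 80-bit register
       gcc -m32 -O2 -ffloat-store demo.c    # 'a' is rounded to 64 bits at the '=' */
    #include <stdio.h>

    int main(void) {
        volatile double x = 1.0, y = 3.0;   /* volatile blocks constant folding */
        double a = x / y;                   /* extra precision may survive here  */
        double r = a * y - x;               /* residual depends on a's precision */
        printf("%.17g\n", r);
        return 0;
    }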



