I had to debug a nasty x87-related issue in a piece of scientific computing software. The output of each version of the software was deterministic, but different versions usually (not always) gave very, very slightly different output, even when the numerical code was unchanged. This bothered the original author, but he was a hardware engineer, and his attitude toward it was basically "software sucks, you can only depend on computers if you lay out the circuits yourself."

So the mystery remained until I was brought on and was able to figure out that the default settings of gcc at the time did not use any of the SSE2 instructions supported on the workstations we were targeting, and instead emitted x87 floating-point instructions. The state of the x87 stack was affected by code interleaved with the numerical code (there was a bit more going on than just number-crunching), causing the 80-bit representations to be spilled to and reloaded from 64-bit memory slots (and hence rounded) at slightly different points in the computation. This resulted in tiny changes in the output. I added a flag to enable SSE2 instructions, and thereafter the output only changed when the numerical code was changed.
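A contrived sketch of the kind of divergence involved (not our actual code; the file name demo.c is hypothetical, and the exact output depends on compiler version, optimization level, and excess-precision settings): with x87 code generation an intermediate can survive in the 80-bit registers, while with SSE2 every step is rounded to a 64-bit double.

    /* Hypothetical demo.c -- build both ways and compare:
     *   gcc -m32 -O2 -mfpmath=387 demo.c && ./a.out          # x87 code generation
     *   gcc -m32 -O2 -mfpmath=sse -msse2 demo.c && ./a.out   # SSE2 code generation
     */
    #include <stdio.h>

    int main(void) {
        volatile double x = 1e308;     /* volatile blocks constant folding */
        double y = x * 10.0 / 10.0;    /* x87: x*10 fits the 80-bit exponent range,
                                          so y typically comes back as ~1e308;
                                          SSE2: x*10 overflows a double, so y is inf */
        printf("%g\n", y);
        return 0;
    }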


The state of the x87 stack was affected by code interleaved with the numerical code (there was a bit more going on than just number-crunching), causing the 80-bit representations to be spilled to and reloaded from 64-bit memory slots (and hence rounded) at slightly different points in the computation.

The x87 can load/store 80-bit floats from/to memory, so it can definitely save intermediate results in full precision if asked to; I'd call that more of a compiler flaw than anything else.


I had this experience on several occasions in C# on 32-bit. The whole idea of 80-bit operations is flawed in an environment where you can’t control the generated native code, register allocation, and so on (i.e. in most high-level languages). We became so used to these bugs that we immediately recognized them when some calculation differed on just some machines.

As C# is JIT-compiled, you could never be sure what code would actually run on the end user's machine, or where the truncation from 80 to 64 bits would occur.

In the end, the best cure is to ensure you never, ever use x87, which happened automatically when we dropped 32-bit support.

Determinism is too important to give up for 16 extra bits.


I feel like I once read that, when writing numerical/scientific code, there were traditionally so many weird high-performance computers in use (Crays or whatever) that you had to be robust to all sorts of different kinds of FP anyway.

Nowadays maybe that sort of diversity is less of an issue? Expecting determinism in the sense you mean it just seems weird to me.


Having absolute determinism is probably still difficult, but using SSE on x64 on Windows, where all users have compatible compilers (i.e. determinism without diversity), is at least “good enough” nowadays. I haven’t seen any issues with that scenario so far, even though it’s certainly possible for problems to arise.


I think it’s in Goldberg ’91, “What Every Computer Scientist Should Know About Floating-Point Arithmetic”.


The rounding to 64 bits (it's a rounding, not a truncation) never occurs if you use a language type that is 80 bits.
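For example, on x86 with gcc, long double maps to the x87 80-bit extended format (that's ABI-specific; other targets map long double to 64 or 128 bits), so a value can be stored to memory and reloaded without ever being rounded to double's 53-bit significand. A minimal sketch:

    #include <float.h>
    #include <stdio.h>

    int main(void) {
        printf("double significand bits:      %d\n", DBL_MANT_DIG);  /* 53 */
        printf("long double significand bits: %d\n", LDBL_MANT_DIG); /* 64 on x86/gcc */

        long double x = 1.0L / 3.0L;          /* full 80-bit result                 */
        double d = (double)x;                 /* the round to 64 bits happens here  */
        printf("%d\n", (long double)d == x);  /* prints 0: the rounding lost bits   */
        return 0;
    }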


The question, though, is which gives you more accurate results? :-)


The one that let us build a library of bit-for-bit regression tests ;-) but for that I have to explain a bit more. There were computational "features" that could be turned on and off to tweak the computation for different problems and for different speed/accuracy trade-offs, and sometimes these features had very slight effects, so regressions could result in very small errors.

An experienced eye could easily see the difference between the tiny x87-related indeterminacy and other kinds of changes, but we were uncomfortable automating this comparison, and it took a while for someone without strong domain knowledge (such as myself or any other newly hired software engineer) to become comfortable eyeballing it. With deterministic output, we could use automated tests to verify that, for example, the work we did to add a new computational feature did not change the output when that feature was not enabled, or that small changes intended as performance optimizations did not inject tiny numerical errors.

Our customers were also a lot more comfortable when they could use "diff" to validate that our latest X-times-faster release was really doing the same computation as the last one :-)
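One trick that makes the diff workflow painless (a generic sketch, not our actual output format; the dump_results helper is hypothetical): write the doubles in C99 hexadecimal floating-point notation, so the text files encode the exact bits and any numerical change shows up in a plain diff.

    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: dump results one per line in hex-float notation,
       so "diff old.txt new.txt" is a bit-for-bit comparison of the doubles. */
    static void dump_results(FILE *out, const double *r, size_t n) {
        for (size_t i = 0; i < n; i++)
            fprintf(out, "%a\n", r[i]);   /* e.g. 0x1.921fb54442d18p+1 */
    }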

EDIT: We also got a noticeable speed-up by enabling the SSE2 instructions. The bulk of the numeric work was done in hardware, so it wasn't dramatic, but it was measurable.


Yes, that makes sense.

As for speed, Intel has neglected to keep the x87 up to date.


This has happened to thousands of people throughout the ages. The damn -ffloat-store option! Ugh!!!
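For anyone who hasn't run into it: -ffloat-store tells gcc to store each floating-point variable to memory on assignment (rounding it to its declared width) instead of leaving it in an 80-bit x87 register, so the same source can produce different results with and without the flag. A contrived sketch (demo.c is a placeholder name, and the exact behavior depends on compiler version and optimization level):

    /* gcc -m32 -O2 demo.c                  # 'a' may stay in an 80-bit register
       gcc -m32 -O2 -ffloat-store demo.c    # 'a' is rounded to 64 bits at the '=' */
    #include <stdio.h>

    int main(void) {
        volatile double x = 1.0, y = 3.0;   /* volatile blocks constant folding */
        double a = x / y;                   /* extra precision may survive here  */
        double r = a * y - x;               /* residual depends on a's precision */
        printf("%.17g\n", r);
        return 0;
    }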



