I also suspect the 0.9999... was an actual value in the commenter's inputs to the fragment shader. Unless he checked the binary representation of the numbers in question (in the input), he could have missed what was actually going on: formatting and printing a number in base 10 hides the only thing that matters here, which is the actual bits that represent the number.
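To illustrate (a minimal Python sketch using 64-bit doubles; GPU shaders typically use 32-bit floats, but the point is the same): two values can print identically in base 10 while having different bit patterns.

```python
import struct

def bits(x: float) -> str:
    """Return the IEEE 754 double-precision bit pattern of x as hex."""
    return struct.pack(">d", x).hex()

a = 1.0
b = 0.9999999999999999  # rounds to the closest double below 1.0 (1 - 2^-53)

# Both print as 1.0000000000, but the underlying bits differ:
print(f"{a:.10f} -> {bits(a)}")  # 1.0000000000 -> 3ff0000000000000
print(f"{b:.10f} -> {bits(b)}")  # 1.0000000000 -> 3fefffffffffffff
```

Only by dumping the bits do you see that `b` is not 1.0 at all.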
It's not rare for GPUs to be pushed so close to the boundaries of what the underlying hardware can do that you'll see little bits of noise like this.
For 'consumer' stuff this usually doesn't matter, but I think it's one of the reasons (besides revenue) that Nvidia has a 'compute' line and a 'display' line. Even if they're based on the exact same tech, it wouldn't surprise me one bit if the safety margins were eroded considerably on the consumer parts and if they weren't tested as thoroughly before shipping (at the chip level).
ATI would probably be a better example. Nvidia has historically been more precise and has implemented specs pretty faithfully. ATI has of course become AMD, and the reputation of modern AMD graphics drivers among developers still doesn't seem to be great.
It would surprise me if sending the value 1.0 from a vertex program to a fragment program ever resulted in anything except 1.0 on a desktop GPU. It's not unusual to truncate a float in a fragment program and then use the resulting integer as an index. I believe Nvidia has a fairly rigorous battery of tests which they use to ensure compliance with specs. The tests check that values are being interpolated correctly, look for off-by-one errors at the edges of triangles, and cover other such corner cases. The tests are automated, because once you implement a graphics spec correctly, you can generate an output image which should be identical every time it's rendered. That way, if an error creeps into the driver, it should be caught quickly, since it will cause a deviation in the rendered output examined by the test suite.
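To make that indexing hazard concrete, here's a minimal Python sketch (the values and the float-to-index mapping are hypothetical, not from the original shader): if interpolation hands you a value just below 1.0 instead of exactly 1.0, truncation lands you one index off.

```python
import math

# A texel/array lookup via truncation, as a fragment shader might do
# with something like int(v * N).
N = 4
exact = 1.0
noisy = 0.99999994  # roughly 1 - 2^-24, the closest 32-bit float below 1.0

print(math.trunc(exact * N))  # 4
print(math.trunc(noisy * N))  # 3 -- one element off, from a tiny bit of noise
```

A tiny deviation that's invisible when printed in base 10 turns into a whole-element error after truncation.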
I'm not sure how often a problem occurs in an Nvidia consumer card but not their 'compute' line. Nvidia tries to use the same driver codebase across as many of their products as possible.
Here's an interesting thread where someone points out that certain OpenCL code was giving the correct result on x86 and Nvidia cards, but not AMD cards: http://devgurus.amd.com/thread/145582
In that thread, it's suggested that Nvidia gave the correct result because their CUDA compute specs demand tighter error bounds than the OpenCL spec requires. So in that case, Nvidia's compute line actually improved their accuracy in other contexts such as OpenCL. Rather than maintaining separate drivers for OpenCL, CUDA, and consumer graphics, Nvidia seems to use the same driver to power all three, which appears to enhance accuracy rather than hinder it.
While writing up this comment and researching this, I stumbled across this interesting paper from 2011 that details exactly what programmers can expect when programming to an Nvidia GPU (in this case using CUDA): https://developer.nvidia.com/sites/default/files/akamai/cuda... ... It's a pretty fun read which helps to solidify the impression that Nvidia cares deeply about providing accurate floating point results.
I also came across this article which was too awesome not to mention: http://randomascii.wordpress.com/2013/07/16/floating-point-d... ... It's not really related to GPU floating point, but it's an excellent exploration of floating point determinism and the effect of various floating point settings.
For mobile graphics stacks like those in cellphones / iPads etc, I share your skepticism. I'm not sure how rigorously tested mobile GPUs are, or how faithfully those stacks adhere to specs. I was hoping to get more info from the original commenter about how to reproduce the bug in the A7 to get a better idea of what to watch out for on mobile.