The article overlooks my pet peeve when it comes to comparing floating-point numbers: in C++ the "standard associative containers" (std::set and std::map) order their elements using a less-than relationship that must be a "strict weak ordering". Many of the methods suggested for comparing floating-point types do not satisfy the requirements of a strict weak ordering, and in that case the C++ standard says you've entered the realm of undefined behavior. In the code at $DAY_JOB, "undefined behavior" turned out to include such pleasant side effects as double frees(!).
Specifically: when you have a less-than relationship "<", then !(a<b) && !(b<a) implies that a and b are equal (a==b). And if a==b and b==c then it must be the case that a==c, or the requirements of the ordering predicate are not met. Unfortunately, under most of these FP comparison schemes, for numbers a and b that are "close but not too close", it's the case that a<b, but for x=(a+b)/2, a==x and x==b!
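To make that concrete, here's a minimal sketch (the epsilon and the sample values are made up) of a fuzzy less-than whose induced "equivalence" is not transitive:

    #include <cstdio>

    // Hypothetical fuzzy comparator of the kind described above.
    const double EPS = 0.001;
    bool fuzzy_less(double a, double b) { return a < b - EPS; }

    // "Equivalent" in the container's eyes: neither compares less than the other.
    bool equiv(double a, double b) { return !fuzzy_less(a, b) && !fuzzy_less(b, a); }

    int main() {
        double a = 0.0, b = 0.0015, m = (a + b) / 2;   // "close but not too close"
        std::printf("a<b: %d\n", fuzzy_less(a, b));    // 1: a is less than b
        std::printf("a~m: %d\n", equiv(a, m));         // 1: a equivalent to the midpoint
        std::printf("m~b: %d\n", equiv(m, b));         // 1: midpoint equivalent to b
        std::printf("a~b: %d\n", equiv(a, b));         // 0: equivalence is not transitive
    }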
This is actually caused by the use of the x87 80-bit floating-point registers (the infamous GCC bug #323).
When the float is first compared on insertion into the set/map it still carries 80-bit register precision, but it gets truncated to float or double precision during the store. This breaks the ordering as you say, but it's not an inherent flaw of floats as such.
The problem goes away if you compile with -mfpmath=sse because then the math will be performed in the same precision as the storage format.
Bug #323 is responsible for a huge amount of mistrust of floats that they don't deserve. Other compilers don't have this problem because they truncate the floats before any comparison.
Yes, though I'm specifically talking about when you decide to define a 'bool less_than(double, double)' that uses some kind of fuzzy comparison approach internally. This can affect any platform, not just one with the "bug #323" behavior in it.
You sure might imagine you need to. For instance, suppose you want to average pieces of data that are timestamped "almost the same" but can arrive for processing with varying delays, including out of order (so you can't just ask "is this datum at about the same time as the datum received just prior?"). There are better approaches, but the one I inherited used a std::map with a fuzzy less-than as the ordering predicate, and my main task was to diagnose why, once in a blue moon, a segmentation fault occurred during some operation on the map (insertion, I think).
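Sketched roughly (the names, tolerance, and value type here are made up, not the actual code), the dangerous pattern looks like this:

    #include <map>
    #include <vector>

    // Sketch only: a fuzzy ordering predicate used as a std::map comparator.
    // It is NOT a strict weak ordering, so the container's behavior is undefined.
    struct FuzzyLess {
        static constexpr double tol = 1e-3;   // "about the same time" (made-up tolerance)
        bool operator()(double a, double b) const { return a < b - tol; }
    };

    std::map<double, std::vector<double>, FuzzyLess> samples_by_time;

    int main() {
        // Data stamped "almost the same" ends up under one key...
        samples_by_time[0.1000].push_back(42.0);
        samples_by_time[0.1004].push_back(43.0);
        // ...which looks convenient, until an unlucky sequence of keys breaks the
        // tree's invariants and an insert or erase walks off into corrupted memory.
    }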
As bug #323 points out, truncating floats is an incomplete solution and brings its own problems. The GNU people were not simply being lazy or ignorant; their approach was valid.
The "bug" is x87 design, period. And x87 is the past. It's old, and bad. SSE and IEEE 754 is the present.
> Specifically: when you have a less-than relationship "<", then !(a<b) && !(b<a) implies that a and b are equal (a==b). And if a==b and b==c then it must be the case that a==c, or the requirements of the ordering predicate are not met.
¬(a<b) ∧ ¬(b<a) → a=b is not, in fact, a requirement on <. Rather, the point is that behind the scenes, any two elements satisfying ¬(a<b) ∧ ¬(b<a) are treated as equivalent by these containers.
To see the difference, consider the (somewhat counter-intuitive) behavior of NaNs: for any two NaNs m, n, we have ¬(m<n) and ¬(n<m), yet also m≠n. If that implication were an actual requirement, then the usual ordering < on floats would not be a suitable ordering predicate. What happens, though, is that the containers will implicitly treat NaNs as equivalent, i.e., the notion of equivalence the container uses for the elements depends only on < and might not coincide with the usual ==.
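A concrete illustration, keeping only NaNs in the container so the ordering stays well-behaved:

    #include <cmath>
    #include <cstdio>
    #include <set>

    int main() {
        std::set<double> s;
        s.insert(std::nan(""));
        s.insert(std::nan(""));                 // treated as equivalent: !(m<n) && !(n<m)
        std::printf("size: %zu\n", s.size());   // 1, even though the two NaNs compare != with ==
        // Note: mixing NaNs with ordinary numbers in one set would break the
        // strict weak ordering, because equivalence would no longer be transitive.
    }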
Ah! Fond memories; this relates to one of the most vexing and entertaining bugs I ever discovered in my own code.
I was using C++'s std::sort, and soon enough things would go wrong. It would crash at unpredictable times, and I suspected I was corrupting memory somewhere. Parts of my data structures would get overwritten by parts of some other data structure. I checked and checked and checked my code; nothing seemed wrong.
It was only after opening the covers and peering into the sort itself that I realized my mistake: I was passing a comparison operator that was a "less than or equal" relation.
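In miniature, the mistake looks like this (a sketch, not the original code):

    #include <algorithm>
    #include <functional>
    #include <vector>

    int main() {
        std::vector<double> v(100, 1.0);   // many equal elements make the bug more likely to bite

        // Undefined behavior: "<=" is not a strict weak ordering, because
        // comp(x, x) must be false and less_equal(x, x) is true.
        // std::sort(v.begin(), v.end(), std::less_equal<double>());

        std::sort(v.begin(), v.end(), std::less<double>());   // the fix: a strict "<"
    }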
>When comparing to some known value—especially zero or values near it—use a fixed ϵ that makes sense for your calculations.
If you're ever doing mathematical calculations of any sort, it is good practice to have a handle on the scale your numbers will lie within. Beyond just being a better professional, it helps you choose an ϵ that matches.
I have seen bugs that, essentially, were caused by a floating-point less-than comparison. You can still get bitten if you're not careful. (A calculation was being passed to acos(x), which is undefined for |x| > 1. In our case the inputs mathematically evaluated to exactly 1, but in the land of floating point the result was slightly off.)
(The above bug is a restatement of bubblethink's sibling comment; it's an inequality where the two values are extremely (exactly!) close.)
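The usual guard for that case is to clamp the argument into acos's domain before the call; a minimal sketch, assuming C++17's std::clamp:

    #include <algorithm>   // std::clamp (C++17)
    #include <cmath>
    #include <cstdio>

    // Clamp into acos's domain so rounding error in the upstream arithmetic
    // can't push the argument just past 1 and produce NaN.
    double safe_acos(double x) {
        return std::acos(std::clamp(x, -1.0, 1.0));
    }

    int main() {
        std::printf("%g\n", safe_acos(1.0000000000000002));   // 0, instead of NaN
    }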
The Go bug on the lack of a round function had several broken implementations, none of which had a direct equality[1].
The problem is that programmers expect the float to behave like a decimal (with less precision than the float actually carries) or like an integer.
For example, if you want to check that two vectors are parallel, you test whether their cross product is zero, but comparing for lesser or greater won't give you the expected result when the components come from earlier computation: a value that should be mathematically zero is usually only nearly zero. Checking !(x > 0 || x < 0) won't help you; you still end up using a tiny threshold, something like (abs(x - 0) < 0.0000000000000000000001).
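One way to write the parallel check, sketched below with a made-up name: scale the tolerance by the vectors' magnitudes instead of comparing against an absolute constant.

    #include <cmath>
    #include <cstdio>

    // Sketch: 2D vectors are "parallel" when their cross product is near zero.
    // The tolerance is relative to the vectors' magnitudes, not an absolute cutoff.
    bool nearly_parallel(double ax, double ay, double bx, double by,
                         double rel_tol = 1e-9) {
        double cross = ax * by - ay * bx;
        double scale = std::sqrt((ax * ax + ay * ay) * (bx * bx + by * by));
        return std::fabs(cross) <= rel_tol * scale;
    }

    int main() {
        std::printf("%d\n", nearly_parallel(0.1, 0.2, 0.3, 0.6));   // 1: parallel up to rounding
    }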
Comparing numbers is easy. The operators are there in the manual.
The hard part is understanding when it is appropriate to compare floating point numbers and how to produce them.
I regularly use floating point numbers as keys in dictionaries and of course all the code quality tools whine about comparisons being inexact. But in my case there is no fuzziness because the keys are all produced by the same method and hence do not suffer from any different rounding errors.
Pretty much every new member of the team sees floats being compared and has a heart attack, yet the code in question is in the oldest and most reliable component of the whole million-line program.
Just don't expect two numbers produced by different expressions that would be mathematically equivalent but have operators in a different order to produce identical results.
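A sketch of why that's safe under those constraints (the key-producing function here is invented for illustration):

    #include <cstdio>
    #include <map>

    // Hypothetical: all keys are produced by this one function.
    // The same inputs run through the same expression always yield the same
    // bit pattern, so exact == / < comparison on the keys is reliable.
    double bucket_key(int channel, int step) {
        return channel * 100.0 + step * 0.25;
    }

    int main() {
        std::map<double, int> counts;
        counts[bucket_key(3, 7)] += 1;
        counts[bucket_key(3, 7)] += 1;                   // hits the same key exactly
        std::printf("%d\n", counts[bucket_key(3, 7)]);   // 2
    }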
I got confused reading this: first it shows a picture of a 64-bit floating-point value, then it starts comparing float to int32_t. A 64-bit floating-point value is a "double", not a "float", yet the article pairs the 64-bit picture with program code using float and int32_t.
What I would say is that when you're considering comparison of floating-point numbers, it's important to understand what the operation means in terms of the data you're representing with them: what does it mean, in terms of the data, for two values to be equal or not? Usually there is a precision inherent in the data itself that will guide you to how to formulate equality, if necessary.
Here's the less-than-scientific floating-point near-equality test I use:

    #include <cfloat>   // FLT_EPSILON

    bool zero(float x) { return x*x < FLT_EPSILON; }

    bool equal_float(float a, float b) {
        return (zero(a) && zero(b)) ||            // both are zero
               zero((a-b)*(a-b) / (a*a + b*b));   // or relative error squared is zero
    }
This checks equality to about four decimal digits for 32 bit single precision and seven digits for 64 bit floats. Inf/NaN special values are not considered.
`FLT_EPSILON` represents the difference between 1.0 and the next representable float above it; it should be scaled according to the input arguments. E.g., your `equal_float` returns `true` for 2e-6 and 4e-6, which are clearly not the same number.
A better comparison would scale the tolerance with the magnitude of the inputs and check for zero separately, something like this:
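(A sketch; the helper name and the constants are arbitrary.)

    #include <algorithm>   // std::max
    #include <cfloat>
    #include <cmath>

    // Treat a and b as equal if they differ by at most a few epsilon-scale steps
    // relative to their magnitude, with a small absolute floor near zero.
    bool nearly_equal(float a, float b,
                      float rel_tol = 4 * FLT_EPSILON,
                      float abs_tol = 4 * FLT_MIN) {
        float diff  = std::fabs(a - b);
        float scale = std::max(std::fabs(a), std::fabs(b));
        return diff <= std::max(rel_tol * scale, abs_tol);
    }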
I typically use this with doubles and DBL_EPSILON, which is much much smaller than FLT_EPSILON.
With FLT_EPSILON this roughly amounts to "zero" meaning "less than about 0.001". If the zero check is omitted, there's going to be a division by near-zero, which will make the results nonsense (and you have to draw the line somewhere). With DBL_EPSILON, "zero" is roughly "less than 0.00000001".
If this is too loose, then `zero(x) = abs(x) < FLT_EPSILON` makes it much stricter (about 1e-7).
This is good enough for my purposes, I don't deal with very small numbers in float and doubles give more than enough precision.
NOTE: I usually use this kind of comparison in testing by comparing known "gold" figures against the results of the code being tested. I don't test accuracy, I test for "in the ballpark" because the stuff I deal with has built-in inaccuracy in the algorithm and numerics.
The version you posted will always return false if I read it correctly.
There was a very good article explaining this using MATLAB, but I can't find it right now. This one is pretty close and explains the concepts of overflow, underflow, etc. The diagrams about "eps" are pretty good, even if your language of choice is Python, C/C++, etc.
Formatting note for the author: on Safari Mac, something is causing ff and fl ligatures to be applied even to the monospaced code, which makes it look kind of weird.
I still see it in a couple of places where code-formatted text is inline with regular text, for example "relative_difference." The code blocks themselves look good.
Hm, this is interesting: in over a decade, the only times I can think of that I've ever needed to compare floats are 1) deduping duplicate data, where naive comparison is exactly what I want, and 2) disambiguating messy user data, where I'd take floats over strings any day.
It looks like numeric is a decimal type. 1/7 can't be represented exactly in either binary or decimal, so it's going to come down to rounding. It just so happens that 1/7 in binary, rounded to float precision, then multiplied by 7, is equal to 1. Do the same in decimal with whatever precision numeric gives you, and the result is not 1. I don't think there's any deep reason for it, it's just how it happens to work out. You can probably find a value where the opposite is true.
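The binary-float half of that is easy to check directly rather than guess (this says nothing about the decimal side, which depends on numeric's precision):

    #include <cstdio>

    int main() {
        float  f = 1.0f / 7.0f;
        double d = 1.0 / 7.0;
        // Whether the rounding in the division happens to cancel in the
        // multiplication is easy to test rather than assume.
        std::printf("float:  (1/7)*7 == 1 ? %d\n", f * 7.0f == 1.0f);
        std::printf("double: (1/7)*7 == 1 ? %d\n", d * 7.0 == 1.0);
    }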
Floating-point math is carefully-defined. There are multiple independently-developed but interoperable implementations and an IEEE standard that talks in detail about how floating-point math is supposed to work. It's not "a bit fuzzy."
For something more concrete, consider Section 8, "Variations Allowed by the IEEE Floating-Point Standard", of the TestFloat tool for testing floating point implementations for IEEE compliance:
And of course, many arithmetic operations (e.g., trig functions) aren't even covered by the standard, which occasionally provokes consternation like this...
The problem with the "floating point is fuzzy" comment is that people start treating it as some sort of black box, or as if the results are random somehow. Sure there are a few weird things with floating point status flags, but mostly "fuzziness" is perfectly understandable when you grasp what it is doing (including the usual 0.1 + 0.2 != 0.3).
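For instance, a couple of lines make the usual example concrete: printed with enough digits, the "fuzziness" is just visible rounding.

    #include <cstdio>

    int main() {
        std::printf("%.17g\n", 0.1 + 0.2);   // 0.30000000000000004
        std::printf("%.17g\n", 0.3);         // 0.29999999999999999
    }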
Also the standard does specify trig functions (§9.2 Recommended correctly rounded functions), but that's one of the optional parts, and as far as I know no one has actually implemented them fully (CRlibm came close, but I don't think their pow function has been fully proven to be correctly rounded, and in any case it isn't widely used).
This is actually a big problem with most standards: a lot of them contain finicky details about which only a very small subset of people care. As far as I know, there still isn't a C compiler that implements all the pragmas specified in the C99/C11 specs.
I wasn't aware of the ambiguity surrounding the underflow flag noted in your second link. However, the other complaints I'm reading from your links (things not specified by the standard may vary in behaviour; the standard is written in English and could have been written in different or better English) do not impact the semantics of floating-point arithmetic in a material way.
> so many languages will round to the nearest integer if the float is within a certain margin
What language does this? It is more likely that it is printing a truncated form while the number still contains that small difference; e.g., it will print "1.1" even though 1.1 isn't exactly representable in binary. This comes from the widely used Grisu family of printing algorithms, which (in Grisu3) produce the shortest string that maps back to the same floating-point bit pattern.
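To see the difference between the stored value and its printed rendering (printf uses fixed precision rather than a shortest-round-trip algorithm like Grisu, but the effect is the same):

    #include <cstdio>

    int main() {
        double x = 1.1;
        std::printf("%.17g\n", x);       // 1.1000000000000001  (the stored value)
        std::printf("%g\n", x);          // 1.1                 (a short, convenient rendering)
        std::printf("%d\n", x == 1.1);   // 1: the value never changed, only the printout
    }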
That behavior was what I was referring to. I only have limited understanding on the topic; it seems my explanation was not quite right. Thanks for correcting me.
My basic rule of thumb is to only use floats at the presentation layer, for storage, or for measurements. If you're doing any serious calculations you need to normalise to a more reliable format first.