numeric: false positive equality with f32 and f64 #5180
Comments
I would vote to just use standard equality checks. As you say, this is a fairly well-known problem, with far too many possible "solutions" - the programmer has to decide which is "right" in their situation. |
@UweKrueger I agree. |
@UweKrueger I agree too 👍 |
@UweKrueger thanks for pointing me to this issue. I actually must disagree with most of the argumentation points. Let's walk through them one by one 😉. First of all I think the new epsilon-comparison is way closer to what is described in the golden article about comparing floats (a must-read for everyone who uses floating point numbers for serious stuff - i.e. 0.01% of programmers). Thanks a lot @UweKrueger for implementing that!
That sounds like an argument for actually making the tolerance equality check the default.
I'd argue this is absolutely not relevant, because we're just deciding defaults, and by default 99.9% of good programmers do know that floats get less precise with increasing magnitude. Very few might even have a rough idea what the precision function approximately looks like in a given range. But none (yes, I mean it) would know how to implement e.g. ULP (Units in the Last Place) based checking for their floats - in other words, they'd also be unable to choose from the different options provided. So they'll just use the method they themselves can explain & understand (i.e. the dumbest constant absolute value as difference) - and only if you tell them that a tolerance is needed at all.
Floating point arithmetic is always expensive. This is no surprise, so no argument on this side. If you want to sacrifice correctness & precision in favor of speed, use the intrinsic comparison explicitly.
Yes, perfect. It's the exact opposite of the default use case. Algorithms and code designed & tuned for bitwise equality checks have nothing to do with the programmer and her programming. She doesn't program it, but just copies it over.
This I don't understand. For me those are two very different use cases which should not be mixed under any circumstances. Could you name a use case where you work with the same number in both ways (once you prefer it to be equal to something in an equality check, but on the next line in the same "mental context" you prefer it to differ from something in an equality check)? If there are any such use cases, then V should imho do as much as it can to avoid or warn or just make it difficult to write them, because that sounds like a perfect programming anti-pattern.
One could argue that it's not a big issue in practice, and I'd agree. What V could do is disallow such checks by default.
I totally do agree with that. But I think this is irrelevant for the defaults which we discuss here. With my proposal above, V would become one of the few float-safe-by-default languages in the world and would significantly increase the precision and quality of float arithmetic in the world. I think those are the goals of V, not vice versa. Thus I think having the bitwise equality check of floats as the default is a premature and very unsafe decision. @spytheman your thoughts? |
@dumblob Thanks for your detailed reply. I think it goes without saying that I disagree... ;-) I do completely agree that in most cases floats should be compared with a tolerance. But as your linked article says: "There is no silver bullet. You have to choose wisely."
Can you give me an example of any other language that does checks with tolerance by default? (Lua and JavaScript don't, I've checked.) I would like to investigate how it is implemented. BTW: I've figured out another point: equality is supposed to be a transitive relation. From a=b and a=c follows b=c. Now imagine:

```v
a := 12.234567890123456
b := 12.234567890123464
c := 12.234567890123447
println('${(a==b)} ${(a==c)} ${(b==c)}')
println('${a.eq_epsilon(b)} ${a.eq_epsilon(c)} ${b.eq_epsilon(c)}')
```

The result as of the current V version:
The first result is totally understandable: we do have three different numbers. This is what I meant by "well understood and conformant to IEEE 754". The second result can also be understood when taking into account that we are checking with a tolerance. If the first line yielded the second result, it would be a big surprise and not understandable in the mathematical sense (it's one of the problems that "become worse"). It's not "float-safe" and does not "significantly increase the precision". Actually the tolerance decreases the precision and leads to unexpected results - possibly at deferred points in time, when they become harder to debug. This example also shows: the real problem is not the comparison. The three numbers are different, and the default comparison reports exactly that. |
Just to make a note: |
😉
And that's where our experience wildly differs. I argue that in all cases where the programmer doesn't explicitly tell the computer to do bitwise equality checks, the programmer does not care about the "different purposes". You argue that the programmer in all cases needs to distinguish whether she cares about the "different purposes" or not, regardless of whether there is an additional construct offering a tolerance equality check.
I think this is quite wrong, because what that example shows is IMHO two different cases:
See my comment above. First, floating point in hardware has nothing to do with mathematics (as I said above, IEEE 754 deliberately says it's not mathematically correct; it's just a convenient approximation of chosen mathematical constructs and operations). Second, tolerance equality checking does NOT decrease precision - it actually (at least in the case of ULPs) precisely follows the IEEE 754 standard. So it absolutely does NOT get worse.
Again, no cure is needed - it's the defined and expected behavior of HW-implemented approximating floating point arithmetic. Bitwise comparison won't help at all in these situations - it'll just significantly cut down the space of meaningful use cases which would be possible if the tolerance equality check were the default. |
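For reference, the ULP-based checking mentioned above can be sketched in a few lines of C. This is a hedged illustration of the technique (reinterpreting the IEEE 754 bit pattern as an integer whose ordering matches the float ordering), not production code - e.g. the subtraction can overflow for operands of opposite sign and huge magnitude, and `max_ulps` is a caller-chosen threshold:

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Map a double's bit pattern to an integer whose ordering matches the
 * float ordering (the classic trick: negative floats are mirrored). */
static int64_t ordered_bits(double x) {
    int64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* well-defined type pun */
    return bits >= 0 ? bits : INT64_MIN - bits;
}

/* Two doubles are "equal" if at most max_ulps representable values
 * lie between them. NaN never compares equal to anything. */
static int eq_ulps(double a, double b, int64_t max_ulps) {
    if (isnan(a) || isnan(b)) return 0;
    return llabs(ordered_bits(a) - ordered_bits(b)) <= max_ulps;
}
```

With this, `0.0` and `-0.0` compare equal at 0 ULPs, and the tolerance automatically scales with magnitude - exactly the adaptive behavior a constant absolute `DBL_EPSILON` cannot provide.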
If we're talking about "how to implement the tolerance check", then yes. But keep in mind, there is no general "compromise between speed and precision" with regard to HW-backed floating point. Precision is fixed (by HW FPU capabilities); it won't get worse, but it also won't get any better. And speed is not a question - intrinsics are as fast as an assembly instruction. The speed of tolerance equality checking is a second-level concern - the first level is correctness (i.e. as I noted above, following the IEEE 754 standard precisely, e.g. by implementing ULP checking). |
I forgot to mention, for those trying to see V as a mathematical language, that equality of real numbers is generally undecidable (see https://en.wikipedia.org/wiki/Real_number#In_computation , https://math.stackexchange.com/a/143964 , etc.). So mathematically speaking, by that logic V shouldn't even allow any comparison between any floating point numbers 😉. But that's obvious nonsense, so let's stick to pragmatism & simplicity and treat operator comparisons as tolerance equality checks. |
Before V freezes its semantics in one of the upcoming releases, please read the following and act accordingly. Thank you. I did a lot more thorough research and found out that floating point is far worse than you would ever think. This sharply contradicts the general belief of most commenters here that "it's well understood". No, it's NOT - read further.

First, there are several standards in use. The most prominent one - IEEE 754 - is significantly revised every few years. The second most widespread is the one found in the POWER architecture (btw. POWER10 implements at least 3 inherently different floating point standards!). And there are others - for us, especially all those supported by the language C. This all means V must not assume anything about the underlying floating point standard (so please no IEEE 754 arguments any more 😉).

Second, there is nothing like bitwise equality in IEEE 754, simply because e.g. IEEE 754 mandates that NaN compares unequal even to a bit-identical NaN.

Third, floating point implementations (be it in hardware FPU or software microcode or programming language compiler or operating system or an arbitrary mixture of these) are basically never correct nor similar in behavior across computing machines (incl. virtual ones). Some notable examples (by far not comprehensive):
Fourth, here are some languages that have built-in floating point equality well-defined (unlike the many which ignore all the above issues).
[1]: I hope V won't join this non-productive club. Btw. Lua indeed does nothing other than the C comparison, though by default Lua compares against 32-bit integers, which can be losslessly represented by Lua's number type (a 64-bit float).

Fifth, I wrote some testing programs (in C, in Julia, in Mathematica) to see different aspects of floating point implementations, and I can confirm what is written above. All in all, floating point implementations (not only those in hardware!) are notoriously full of mistakes and incomplete features - and that'll be true for future chips and software platforms as well (this follows also from the fact that e.g. IEEE 754 is being developed, revised and amended every now and then). So the bottom line is that no assumptions can be made about any non-approximating comparison (such as plain `==`).

Ask yourself whether you knew all of the above (I didn't 😢). IMHO the easiest (and uncontroversial) option would be to disallow the plain operators for floats
(a good "test case" is a comparison of the size of an atom to the Planck length, or a "linked comparison", i.e. a chained expression such as a < b < c).

An open question regarding this 5-case API is whether it shall be extended by means for specifying the behavior in the "linked comparison" case.

Having just an API also has the advantage that V can later be extended by allowing operator syntax on top of it.

Btw. comparison operators have a strong influence on e.g. floats as keys in maps - currently it's undefined how to withdraw a value from such a map (#8556).

Related notable implementations for study purposes include SYCL-BLAS (a combination of relative and absolute tolerance) and Googletest (plain ULPs difference).

Btw. note that mathematically the real numbers do not include any infinity, so again, as in the above post, we can't take any inspiration from them (there are the extended reals having +-infinity, but that doesn't apply here either).

@medvednikov, @UweKrueger, @spytheman, @ulises-jeremias, @helto4real |
I think by "well understood" most people understand very well that it's useless to try comparing regular computer floating point numbers for exact equality. Regardless of what novices expect. It is also "well understood" that the more options you give, the more inventive ways people will get things wrong, and then complain that it didn't work exactly as they expected. I applaud your diligence in researching the options, but I still wonder if this level of effort is required for core V. Perhaps instead we could have a good, high-precision external module for those who understand what they're doing, and need what you've proposed. The fact that this has only been implemented in a few highly specialized languages, and only in external modules for others, hints at a lack of great need. |
It's not (just) about equality. It's about all the comparison operators.
Yes. That's why I "fight" for disabling all ordinary operators for floats. Whether V will have any other API is a separate issue. But I'd like to reach consensus on disabling the ordinary undefined operators. Then we can discuss API/modules/whatever.
This is a strong exaggeration. Note also that most of these languages are older than most other languages which do not have built-in floating point equality well-defined (in other words, "newer languages seem more crappy in that they promote undefined behavior"). |
Only a dumb proposal: |
Agreed, and I would love to see it solved. I'm certainly not against the idea of having 100% precise floating point operations, but what cost would that have on compile AND run times? One of the biggest selling points of V is that it is so fast, and if adding this extra precision slowed it by a significant amount, that point goes away. |
My point is to get rid of undefined behavior in defaults. And letting float operators be as they are now wouldn't meet this criterion 😢.
This is a very cool and simple idea, thanks! This basically makes the language itself instrumentable (which is awesome on its own!). Don't know though whether V supports that. I'd be all for that (of course under the condition, that the default behavior without this module loaded would disallow all these float operators completely and first loading this module would allow their usage).
No worries about the 5th case. And basically all the other cases can be implemented very efficiently (even the 1st case, as demonstrated by the popularity of Excel). |
As a follow-up on #5180 (comment), it seems that with equality operators now being overloadable, we got much closer to the proposal of disabling all equality operators for floats by default while allowing an explicit opt-in. |
Related: #8840 (comment) |
Motto:
(Digging deeper in the rabbit hole of the wrong path did not bring any fruit anyway.) Now I'm pretty certain the best approach would be to abandon supporting the plain operators. This scheme doesn't have the flaws noted above, is intuitive, is fully multiplatform, and is performant even on 6-year-old HW. |
Apart from the default behavior, we could also look at the broader scope of "pluggability of real-like numbers" in V (maybe like https://github.com/stillwater-sc/universal for C++). This might give us an important insight into what the ecosystem could then look like. And yeah, it looks much better than now. |
V version: V 0.1.27 076089d.b0f66a4
OS: Manjaro Linux 20.02 x86_64
What did you do?
What did you expect to see?
-1.60218e-19 is smaller than 9.10938e-31
What did you see instead?
-1.60218e-19 and 9.10938e-31 are equal
-1.60218e-19 is smaller than 9.10938e-31
Discussion
The issue is of course caused by the comparison function `f64_eq()` that checks `f64_abs(a - b) <= DBL_EPSILON`. `DBL_EPSILON` has the constant absolute value `2.22e-16` - the distance between two consecutive `f64` numbers in the range `1.0..2.0`. So this function has no tolerance effect for numbers `> 2.0` and has too much tolerance for small numbers. A better approach might be checking with a relative tolerance like in `f64_abs(a - b) <= f64_abs(a)*(2.5*DBL_EPSILON)`.

However, I actually think implementing such checks in the `v` core language is not really a good idea, and I'd like to discuss this issue. Here are the points I see:

- The factor `2.5` above is just a guess; there might be cases where a bigger relative tolerance is needed and other cases where an absolute tolerance is appropriate. So there is no canonical way for the `v` core language.
- `f64` multiplications are somewhat expensive.
- While some problems with `f64` can be reduced with tolerant checks, others become worse (numbers that should differ seem to be equal).
- At most one of `<`, `==` and `>` should evaluate to `true` for a given pair of numbers - with tolerant checks, as the example above shows, `==` and `<` can hold at the same time.

For these reasons I'd like to propose using standard equality checks in the v core language. Any thoughts?