[libc-commits] [PATCH] D104615: [libc] Calculate ulp error after rounding MPFR result to the result type.

Tue Ly via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed Jun 23 11:18:16 PDT 2021


lntue accepted this revision.
lntue added inline comments.
This revision is now accepted and ready to land.


================
Comment at: libc/utils/MPFRWrapper/MPFRUtils.cpp:301
     // we multiply by its inverse 2^{-e}.
     mpfr_mul_2si(mpfrInput.value, mpfrInput.value, -epsExponent, MPFR_RNDN);
 
----------------
sivachandra wrote:
> lntue wrote:
> > sivachandra wrote:
> > > lntue wrote:
> > > > sivachandra wrote:
> > > > > sivachandra wrote:
> > > > > > lntue wrote:
> > > > > > > lntue wrote:
> > > > > > > > sivachandra wrote:
> > > > > > > > > sivachandra wrote:
> > > > > > > > > > lntue wrote:
> > > > > > > > > > > sivachandra wrote:
> > > > > > > > > > > > lntue wrote:
> > > > > > > > > > > > > If we change to | input - float(mpfrValue) | / eps(input), we are more or less calculating | input_bitfield - float(mpfrValue)_bitfield |, so in my opinion it's better to use
> > > > > > > > > > > > > min(eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2), which is representable:
> > > > > > > > > > > > > - if we use eps(input), the calculated ulp will be 2,
> > > > > > > > > > > > > - if we use min(eps(input), eps(float(mpfrValue))), the calculated ulp will be 1.
> > > > > > > > > > > > > A concrete example is ulp(input = float(1 - 2^(-24)), float(mpfrValue) = float(1)).
> > > > > > > > > > > > > Moreover, another advantage of using min(eps, eps) is that the ulp function will then be symmetric: ulp(a, b) = ulp(b, a).
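A minimal C++17 sketch of the min-eps definition proposed above, assuming binary32 `float` values. `eps` and `ulp_min` are hypothetical names for illustration, not functions in MPFRUtils.cpp; the difference is taken in `double` so it is exact for nearby floats.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: spacing of binary32 values at x, i.e. 2^(exponent(x) - 23),
// clamped to the denormal spacing 2^-149 (which also covers x == 0).
double eps(float x) {
  assert(std::isfinite(x));
  if (x == 0.0f) return std::ldexp(1.0, -149);
  return std::ldexp(1.0, std::max(std::ilogb(x) - 23, -149));
}

// ulp error measured in the smaller of the two spacings. Both the numerator and
// the denominator are symmetric in (input, mpfr_result), so the result is too.
double ulp_min(float input, float mpfr_result) {
  double diff = std::fabs(static_cast<double>(input) - mpfr_result);
  return diff / std::min(eps(input), eps(mpfr_result));
}
```

For the concrete example, `ulp_min(1.0f - 0x1p-24f, 1.0f)` and `ulp_min(1.0f, 1.0f - 0x1p-24f)` both evaluate to 1.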
> > > > > > > > > > > > It seems like the problem you describe will occur irrespective of this change, no?
> > > > > > > > > > > > Also, why should we want `ulp(b, a) = ulp(a, b)`? Thinking of it now, it seems like we should define `ulp` error to be:
> > > > > > > > > > > > 
> > > > > > > > > > > > ```
> > > > > > > > > > > > |float(mpfrValue) - input|/eps(float(mpfrValue))
> > > > > > > > > > > > ```
> > > > > > > > > > > > 
> > > > > > > > > > > > My reasoning is, the error should be relative to what we think is the correct/more accurate answer. Since we treat `float(mpfrValue)` as the more accurate one, we should be calculating the error wrt its `eps`. Anything wrong with this reasoning? The point you raise about ulp error of 2 vs 1 is very valid. We already special-case it one way. Should we special-case the other way around as well? And maybe that is why you are saying we should generalize these special cases with a symmetric algorithm to calculate the ulp error?
> > > > > > > > > > > Yes, it is the same problem with / eps(float(mpfrValue)), when input is 2^n and float(mpfrValue) is 2^n - (eps(x)/2). So the only reasonable way to return 1 ulp for both cases is to divide by min(eps(input), eps(float(mpfrValue))). And actually these edge cases are the only time / min(eps, eps) gives a different answer than / eps(input) or / eps(float(mpfrValue)).
> > > > > > > > > > 
> > > > > > > > > > I want to discuss this example more. Assuming we are dealing with single precision floating point numbers and that `mpfrResult = float(actualMpfrResult)`:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > input = float(1 - 2 ^ (-24))
> > > > > > > > > > mpfrResult = float(1);
> > > > > > > > > > eps(mpfrResult) = 2 ^ (-23)
> > > > > > > > > > eps(input) = 2 ^ (-24)
> > > > > > > > > > 
> > > > > > > > > > |mpfrResult - input| = 2 ^ (-24)
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > So, if we calculate ulp error wrt the input `eps`, then:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > ulp = (2 ^ (-24)) / (2 ^ (-24)) = 1
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > If we calculate ulp error wrt `eps` of `mpfrResult`, then:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > ulp = (2 ^ (-24)) / (2 ^ (-23)) = 1/2
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > Let's consider the example the other way around:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > input = float(1 + 2 ^ (-23))
> > > > > > > > > > mpfrResult = float(1);
> > > > > > > > > > eps(mpfrResult) = 2 ^ (-23)
> > > > > > > > > > eps(input) = 2 ^ (-23)
> > > > > > > > > > 
> > > > > > > > > > |mpfrResult - input| = 2 ^ (-23)
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > And so the ulp error will be 1 in whichever way we calculate.
> > > > > > > > > > 
> > > > > > > > > > Now, consider another example:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > input = float(1 - 2 ^ (-24))
> > > > > > > > > > mpfrResult = float(1 + 2 ^ (-23));
> > > > > > > > > > eps(mpfrResult) = 2 ^ (-23)
> > > > > > > > > > eps(input) = 2 ^ (-24)
> > > > > > > > > > 
> > > > > > > > > > |mpfrResult - input| = 2 ^ (-23) + 2 ^ (-24)
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > So, if we calculate ulp error wrt the input `eps`, then:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-24)) = 3
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > If we calculate ulp error wrt `eps` of `mpfrResult`, then:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-23)) = 1.5
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > I think our goal should be to treat bit distances on either side of `2^N` uniformly, where `N = max(exp(input), exp(mpfrResult))`. Then:
> > > > > > > > > > 
> > > > > > > > > > ```
> > > > > > > > > > N = max(exp(input), exp(mpfrResult))
> > > > > > > > > > eps_input = 2^(exp(input) - 23)
> > > > > > > > > > eps_mpfr = 2^(exp(mpfrResult) - 23)
> > > > > > > > > > ulp = |2^N - input|/eps_input + |2^N - mpfrResult|/eps_mpfr
> > > > > > > > > > ```
> > > > > > > > > > 
> > > > > > > > > > I think this formulation not only has the symmetry property, but also corresponds to the bit distances for close enough results (which do not differ in the exponent by more than 1). For results farther apart, I don't think it matters. WDYT?
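A minimal sketch of this formula in C++17, assuming binary32 inputs that are positive and normal (the |2^N - x| terms only make sense when both values lie on the same side of 0). `ulp_pow2` and `eps` are hypothetical names; `ulp_pow2` is the formula the thread later labels `ulp_5`, and `std::ilogb` plays the role of `exp`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: binary32 spacing at a normal x, 2^(exponent(x) - 23).
double eps(float x) {
  assert(std::isnormal(x));
  return std::ldexp(1.0, std::ilogb(x) - 23);
}

// Measure each value's distance to 2^N in its own spacing and add the two,
// where N = max(exponent(input), exponent(mpfr_result)).
double ulp_pow2(float input, float mpfr_result) {
  assert(input > 0.0f && mpfr_result > 0.0f);
  int n = std::max(std::ilogb(input), std::ilogb(mpfr_result));
  double p = std::ldexp(1.0, n);  // 2^N, exact in double
  return std::fabs(p - input) / eps(input) +
         std::fabs(p - mpfr_result) / eps(mpfr_result);
}
```

On the examples above it gives 1 for (1 - 2^(-24), 1), 1 for (1 + 2^(-23), 1) and 2 for (1 - 2^(-24), 1 + 2^(-23)), which equal the distances between the bit patterns 0x3F7FFFFF, 0x3F800000 and 0x3F800001.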
> > > > > > > > > When `exp(input) == exp(mpfrResult)`, the formula simplifies to:
> > > > > > > > > 
> > > > > > > > > ```
> > > > > > > > > ulp = |input - mpfrResult|/eps(input)
> > > > > > > > > ```
> > > > > > > > > 
> > > > > > > > > This should also take care of numbers on either side of 0.
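One way to see why the simple formula suffices when the exponents match is to compare it with the distance between IEEE 754 bit patterns. A hypothetical C++17 check, assuming binary32: `ordered_bits` maps each float to an integer so that adjacent floats map to adjacent integers, with negative values mirrored through 0. All helper names here are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: binary32 spacing at x, clamped to 2^-149 for denormals and 0.
double eps(float x) {
  assert(std::isfinite(x));
  if (x == 0.0f) return std::ldexp(1.0, -149);
  return std::ldexp(1.0, std::max(std::ilogb(x) - 23, -149));
}

// Adjacent floats map to adjacent integers; +0 and -0 both map to 0.
std::int64_t ordered_bits(float x) {
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  std::int64_t magnitude = u & 0x7fffffffu;
  return (u >> 31) ? -magnitude : magnitude;
}

// Number of representable steps between a and b.
std::int64_t bit_distance(float a, float b) {
  std::int64_t d = ordered_bits(a) - ordered_bits(b);
  return d < 0 ? -d : d;
}

// The simple formula, intended for inputs with the same exponent.
double ulp_same_exponent(float input, float mpfr_result) {
  return std::fabs(static_cast<double>(input) - mpfr_result) / eps(input);
}
```

For example, `1.5f` and `1.5f + 0x1p-22f` are 2 apart by both measures, and so are `0x1p-149f` and `-0x1p-149f`, a pair on either side of 0.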
> > > > > > > > In either case, we won't have to worry about 0, because eps(0) == eps(smallest non-zero denormal number).
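This claim can be checked by measuring the gap from a value to the next larger `float` with `std::nextafter` from `<cmath>`. `spacing_above` is a hypothetical helper; the checks assume binary32 with denormals enabled (no flush-to-zero).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Gap between x and the next larger binary32 value, computed exactly in double.
// Requires x to be finite and below the largest float so the neighbor is finite.
double spacing_above(float x) {
  assert(std::isfinite(x) && x < std::numeric_limits<float>::max());
  float next = std::nextafter(x, std::numeric_limits<float>::infinity());
  return static_cast<double>(next) - x;
}
```

Both `spacing_above(0.0f)` and `spacing_above(std::numeric_limits<float>::denorm_min())` come out as 2^-149, and the same spacing continues up to the smallest normal number.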
> > > > > > > One main problem with using max(eps(input), eps(float(mpfrResult))) is that it will give us a false positive when:
> > > > > > > input = float(1) and float(mpfrResult) = float(1 - 2^(-23))
> > > > > > > Their bit patterns differ by 2 in the last place, but the ulp calculation will return 1 if we use max eps.
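The false positive can be reproduced with a small C++17 sketch, assuming positive, normal binary32 inputs. `ulp_max` (the max-eps variant) and `bit_distance` are hypothetical helpers; `bit_distance` subtracts the raw IEEE 754 bit patterns, which is only meaningful for two values of the same sign.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: binary32 spacing at a normal x, 2^(exponent(x) - 23).
double eps(float x) {
  assert(std::isnormal(x));
  return std::ldexp(1.0, std::ilogb(x) - 23);
}

// ulp error measured in the larger of the two spacings.
double ulp_max(float input, float mpfr_result) {
  double diff = std::fabs(static_cast<double>(input) - mpfr_result);
  return diff / std::max(eps(input), eps(mpfr_result));
}

// Distance between the raw bit patterns of two floats of the same sign.
std::int64_t bit_distance(float a, float b) {
  std::uint32_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  std::int64_t d = static_cast<std::int64_t>(ua) - static_cast<std::int64_t>(ub);
  return d < 0 ? -d : d;
}
```

`ulp_max(1.0f, 1.0f - 0x1p-23f)` is 1 even though `bit_distance` reports 2 (0x3F800000 vs 0x3F7FFFFE).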
> > > > > > 
> > > > > > So, let's do the ulp error calculation for this example using the scheme I proposed:
> > > > > > 
> > > > > > ```
> > > > > > eps_input = 2 ^ (-23)
> > > > > > eps_mpfr = 2 ^ (-24)
> > > > > > N = max(exp(input), exp(mpfrResult)) = 0
> > > > > > ulp = |2^N - input|/eps_input + |2^N - mpfrResult|/eps_mpfr = 0 + (2 ^ (-23))/(2 ^ (-24)) = 2
> > > > > > ```
> > > > > > 
> > > > > > So, the ulp error calculated is as expected. The solution I am proposing is NOT to use `max(eps(input), eps(mpfrResult))`. Maybe you are misreading the step which calculates `N`?
> > > > > 
> > > > > What I meant to say is that, for all cases in which `input` and `mpfrResult` differ in sign, we can use the simple formula:
> > > > > 
> > > > > ```
> > > > > ulp = |input - mpfrResult|/eps(input)
> > > > > ```
> > > > > 
> > > > > This will be correct for close enough numbers (numbers which have the same exponent).
> > > > Sorry for the confusion. Let me summarize by considering 4 options:
> > > > 
> > > > ```
> > > > ulp_1 = | input - float(mpfrResult) | / eps_input
> > > > ulp_2 = | input - float(mpfrResult) | / eps_mpfrResult
> > > > ulp_3 = | input - float(mpfrResult) | / min( eps_input, eps_mpfrResult )
> > > > ulp_4 = | input - float(mpfrResult) | / max( eps_input, eps_mpfrResult )
> > > > ```
> > > > 
> > > > When eps_input == eps_mpfrResult, all 4 ulp functions return the same answer, so it doesn't matter which one we use in this case.
> > > > 
> > > > On the other hand, for the edge cases:
> > > > 
> > > > | input | float(mpfrResult) | eps_input | eps_mpfrResult | ulp_1 | ulp_2 | ulp_3 | ulp_4 |
> > > > | 1 | 1 - 2^(-23) | 2^(-23) | 2^(-24) | 1 | 2 | 2 | 1 |
> > > > | 1 - 2^(-23) | 1 | 2^(-24) | 2^(-23) | 2 | 1 | 2 | 1 |
> > > > 
> > > > So if we use eps_input (ulp_1), we risk accepting 1 as an approximation of 1 - 2^(-23) that is within 1 ulp, and with eps_mpfrResult (ulp_2), we risk accepting 1 - 2^(-23) as an approximation of 1 that is within 1 ulp, even though in both cases the bit patterns are 2 apart.
> > > > 
> > > > So I think ulp_3 is the correct one to use overall if the goal is to have at most a 1-bit difference compared to MPFR results.
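The four options differ only in the denominator, so they can be computed side by side. A minimal C++17 sketch, assuming positive, normal binary32 inputs; `UlpOptions`, `ulp_options` and `eps` are hypothetical names chosen to mirror the labels in this comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: binary32 spacing at a normal x, 2^(exponent(x) - 23).
double eps(float x) {
  assert(std::isnormal(x));
  return std::ldexp(1.0, std::ilogb(x) - 23);
}

// The four candidate definitions, all sharing the same numerator.
struct UlpOptions {
  double ulp_1, ulp_2, ulp_3, ulp_4;
};

UlpOptions ulp_options(float input, float mpfr_result) {
  double diff = std::fabs(static_cast<double>(input) - mpfr_result);
  double eps_input = eps(input);
  double eps_mpfr = eps(mpfr_result);
  return {diff / eps_input, diff / eps_mpfr,
          diff / std::min(eps_input, eps_mpfr),
          diff / std::max(eps_input, eps_mpfr)};
}
```

The two rows of the table above come out as {1, 2, 2, 1} for `ulp_options(1.0f, 1.0f - 0x1p-23f)` and {2, 1, 2, 1} for `ulp_options(1.0f - 0x1p-23f, 1.0f)`.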
> > > Let me define a `ulp_5` as follows:
> > > 
> > > ```
> > > ulp_5 = |2^N - input| / eps(input) + |2^N - mpfrResult| / eps(mpfrResult)
> > > ```
> > > 
> > > where:
> > > 
> > > ```
> > > N = max(exponent(input), exponent(mpfrResult)) 
> > > ```
> > > 
> > > And now let's add a couple more rows and a column to the table you have:
> > > 
> > > 
> > > | input          | float(mpfrResult) | eps_input | eps_mpfrResult | ulp_1 | ulp_2 | ulp_3 | ulp_4 | ulp_5 |
> > > | 1                 | 1 - 2^(-23)           | 2^(-23)     | 2^(-24)               | 1        | 2        | 2        | 1        | 2        |
> > > | 1 - 2^(-23) | 1                           | 2^(-24)     | 2^(-23)               | 2        | 1        | 2        | 1        | 2        |
> > > | 1 - 2^(-24) | 1 + 2 ^ (-23)        | 2^(-24)     | 2^(-23)               | 3        | 1.5     | 3        | 1.5     | 2        |
> > > | 1 + 2^(-23) | 1 - 2 ^ (-24)        | 2^(-23)     | 2^(-24)               | 1.5     | 3        | 3        | 1.5     | 2        |
> > > 
> > > 
> > > So, the point I am trying to make is that `ulp_5` captures the bit distance better than the other definitions of ulp error. For example, `ulp_3` overestimates the bit distance.
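To compare `ulp_3` and `ulp_5` against the actual bit distance on the rows of this table, a hypothetical C++17 sketch, assuming positive, normal binary32 inputs (the raw bit-pattern difference is only meaningful for values of the same sign). The function names reuse the thread's labels for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: binary32 spacing at a normal x, 2^(exponent(x) - 23).
double eps(float x) {
  assert(std::isnormal(x));
  return std::ldexp(1.0, std::ilogb(x) - 23);
}

// ulp_3: difference measured in the smaller spacing.
double ulp_3(float input, float mpfr_result) {
  double diff = std::fabs(static_cast<double>(input) - mpfr_result);
  return diff / std::min(eps(input), eps(mpfr_result));
}

// ulp_5: distances to 2^N (N = larger exponent), each in its own spacing.
double ulp_5(float input, float mpfr_result) {
  assert(input > 0.0f && mpfr_result > 0.0f);
  double p = std::ldexp(1.0, std::max(std::ilogb(input), std::ilogb(mpfr_result)));
  return std::fabs(p - input) / eps(input) +
         std::fabs(p - mpfr_result) / eps(mpfr_result);
}

// Distance between the raw bit patterns of two positive floats.
std::int64_t bit_distance(float a, float b) {
  std::uint32_t ua, ub;
  std::memcpy(&ua, &a, sizeof ua);
  std::memcpy(&ub, &b, sizeof ub);
  std::int64_t d = static_cast<std::int64_t>(ua) - static_cast<std::int64_t>(ub);
  return d < 0 ? -d : d;
}
```

For rows 3 and 4, `ulp_3` gives 3 while `ulp_5` and `bit_distance` both give 2; for rows 1 and 2 all three agree on 2.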
> > ulp_5 will work well when 2^N is between input and mpfrResult, but it will return the wrong answer when 2^N < min(input, mpfrResult).
> > For example, if input == 1 + 2^(-23) == mpfrResult, ulp_5 will return 2. ulp_3 does overestimate the bit distance, but as I mentioned, it does correctly capture whether input and float(mpfrResult) are within 1 bit of each other.
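The failure mode is easy to reproduce. A hypothetical C++17 sketch of `ulp_5`, assuming positive, normal binary32 inputs: for identical inputs 1 + 2^(-23), 2^N = 1 lies below both values, so each term contributes 1 instead of the two distances cancelling.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: binary32 spacing at a normal x, 2^(exponent(x) - 23).
double eps(float x) {
  assert(std::isnormal(x));
  return std::ldexp(1.0, std::ilogb(x) - 23);
}

// ulp_5: distances to 2^N (N = larger exponent), each in its own spacing.
double ulp_5(float input, float mpfr_result) {
  assert(input > 0.0f && mpfr_result > 0.0f);
  double p = std::ldexp(1.0, std::max(std::ilogb(input), std::ilogb(mpfr_result)));
  return std::fabs(p - input) / eps(input) +
         std::fabs(p - mpfr_result) / eps(mpfr_result);
}
```

`ulp_5(1.0f + 0x1p-23f, 1.0f + 0x1p-23f)` returns 2, whereas `ulp_5(1.0f, 1.0f)` returns 0 because there 2^N coincides with both values.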
> Yes. That is why, in a separate comment above, I said that if `exponent(input) == exponent(mpfrResult)`, then ulp error should be calculated as:
> 
> ```
> ulp = |input - mpfrResult|/eps(input)
> ```
> 
> I have also said that, to keep it simple, we can apply the same formula when `input` and `mpfrResult` differ in sign.
> 
> 
> That said, I think we are both talking about the same thing now. What I want to understand next is how this discussion relates to the change being attempted in this patch. As in, can the change to the ULP error formula be done in a separate patch? IIUC, what you are trying to point out is that, with `mpfrResult` rounded to the target floating point format as done in this change, we can now make the ULP error match the bit distance better?
Yes, updating the ULP error to match the bit distance better with explanation comments in a followup patch SGTM.
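The rule described in this thread (the simple formula when the exponents match or the signs differ, the 2^N formula otherwise) could be combined roughly as follows. This is a hypothetical C++17 illustration assuming binary32, not the code of the followup patch; treating 0 with the simple formula and applying the 2^N formula to magnitudes for two negative values are added assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: binary32 spacing at x, clamped to 2^-149 for denormals and 0.
double eps(float x) {
  assert(std::isfinite(x));
  if (x == 0.0f) return std::ldexp(1.0, -149);
  return std::ldexp(1.0, std::max(std::ilogb(x) - 23, -149));
}

double ulp_error(float input, float mpfr_result) {
  double a = input;
  double b = mpfr_result;
  // Simple formula: either value is 0 (assumption), opposite signs, or same exponent.
  if (a == 0.0 || b == 0.0 ||
      std::signbit(input) != std::signbit(mpfr_result) ||
      std::ilogb(input) == std::ilogb(mpfr_result)) {
    return std::fabs(a - b) / eps(input);
  }
  // Same sign, different exponents: 2^N lies between |a| and |b|,
  // so measure each side of it in that value's own spacing.
  a = std::fabs(a);
  b = std::fabs(b);
  double p = std::ldexp(1.0, std::max(std::ilogb(a), std::ilogb(b)));
  return std::fabs(p - a) / eps(input) + std::fabs(p - b) / eps(mpfr_result);
}
```

With this split, identical inputs such as 1 + 2^(-23) give 0, while values straddling a power of two, such as (1 - 2^(-24), 1 + 2^(-23)), still give the bit distance 2.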


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D104615


