[libc-commits] [PATCH] D104615: [libc] Calculate ulp error after rounding MPFR result to the result type.

Tue Jun 22 20:28:02 PDT 2021

sivachandra added inline comments.

================
Comment at: libc/utils/MPFRWrapper/MPFRUtils.cpp:301
     // we multiply by its inverse 2^{-e}.
     mpfr_mul_2si(mpfrInput.value, mpfrInput.value, -epsExponent, MPFR_RNDN);

----------------
lntue wrote:
> sivachandra wrote:
> > lntue wrote:
> > > If we change to | input  - float(mpfrValue) | / eps(input), we more or less calculating | input_bitfield - float(mpfrValue)_bitfield |, so in my opinion it's better to use
> > > min (eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2) which is representable:
> > > - if we use the eps(input), the calculated ulp will be 2,
> > > - if we use min ( eps(input), eps(float(mpfrValue)) ), the calculated ulp will be 1.
> > > A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) )
> > > Moreover, another advantage of using min (eps, eps) is that the ulp function will then be symmetric: ulp(a, b) = ulp(b, a).
> > It seems like the problem you describe will occur irrespective of this change, no?
> > Also, why should we want `ulp(b, a) = ulp(a, b)`? Thinking of it now, it seems like we should define `ulp` error to be:
> > 
> > ```
> > |float(mpfrValue) - input|/eps(float(mpfrValue))
> > ```
> > 
> > My reasoning is, the error should be relative to what we think is the correct/more accurate answer. Since we treat `float(mpfrValue)` to be the more accurate one, we should be calculating the error wrt its `eps`. Anything wrong with this reasoning? The point you raise about ulp error of 2 vs 1 is very valid. We already it special case it one way. Should we special case the other way around as well? And may be that is why you are saying we should generalize these special cases with a symmetric algorithm to calculate the ulp error?
> Yes, it is the same problem with / eps(float(mpfrValue)), with input is 2^n and float(mpfrValue) is 2^n - (eps(x)/2).  So the only reasonable way that will return 1 ulp for both cases is to / min (eps(input), eps(float(mpfrValue)).  And actually these edge cases are the only time / min(eps, eps) gives different answers than / eps(input) or eps(float(mpfrValue).
> If we change to | input  - float(mpfrValue) | / eps(input), we more or less calculating | input_bitfield - float(mpfrValue)_bitfield |, so in my opinion it's better to use
> min (eps(input), eps(float(mpfrValue))), since that would avoid the case where float(mpfrValue) is 2^n and input is 2^n - (eps(x)/2) which is representable:
> - if we use the eps(input), the calculated ulp will be 2,
> - if we use min ( eps(input), eps(float(mpfrValue)) ), the calculated ulp will be 1.
> A concrete example is that ulp( input = float(1 - 2^(-24)), float(mpfrValue) = float(1) )

I want to discuss this example more. Assuming we are dealing with single precision floating point numbers and that `mpfrResult = float(actualMpfrResult)`:

```
input = float(1 - 2 ^ (-24))
mpfrResult = float(1);
eps(mpfrResult) = 2 ^ (-23)
eps(input) = 2 ^ (-24)

|mpfrResult - input| = 2 ^ (-24)
```

So, if we calculate ulp error wrt the input `eps`, then:

```
ulp = (2 ^ (-24)) / (2 ^ (-24)) = 1
```

If we calculate ulp error wrt `eps` of `mpfrResult`, then:

```
ulp = (2 ^ (-24)) / (2 ^ (-23)) = 1/2
```

Lets consider the other way around example:

```
input = float(1 + 2 ^ (-23))
mpfrResult = float(1);
eps(mpfrResult) = 2 ^ (-23)
eps(input) = 2 ^ (-23)

|mpfrResult - input| = 2 ^ (-23)
```

And so the ulp error will be 1 in whichever way we calculate.

Now, consider another example:

```
input = float(1 - 2 ^ (-24))
mpfrResult = float(1 + 2 ^ (-23));
eps(mpfrResult) = 2 ^ (-23)
eps(input) = 2 ^ (-24)

|mpfrResult - input| = 2 ^ (-23) + 2 ^ (-24)
```

So, if we calculate ulp error wrt the input `eps`, then:

```
ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-24)) = 3
```

If we calculate ulp error wrt `eps` of `mpfrResult`, then:

```
ulp = (2 ^ (-24) + 2 ^ (-23)) / (2 ^ (-23)) = 1.5
```

I think our goal should be to treat bit distances on either sides of `2^N` uniformly where `N = max(exp(input), exp(mpfrResult))`. Then:

```
N = max(exp(input), exp(mpfrResult))
eps_input = 2^(exp(input) - 23)
eps_mpfr = 2^(exp(mpfrResult - 23)
ulp = |2^N - input|/eps_input+ |2^N - mpfrResult|/eps_mpfr
```

I think this formulation not only has the symmetry property, but also corresponds to the bit distances for close enough results (which do not differ in the exponent by more than 1). For results farther apart, I don't think it matters. WDYT?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104615/new/

https://reviews.llvm.org/D104615