[libc-commits] [PATCH] D127046: [libc][math] fmod/fmodf implementation.

Thu Jun 23 10:08:34 PDT 2022

lntue added a comment.

Thanks for your patient and sorry it took a bit too long for reviewing!  Just a few nits and it's good to go.

Also feel free to sync and add `fmod` to the math status page at `libc/docs/math.rst`.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:25
+//    Some common  cases, like  abs(x) < abs(y)  or  abs(x) < 1000 *  abs(y) are
+//    treated specially to increase  performance.  The part of checking  special
+//    cases, numbers NaN, INF etc. is inspired by GCC libm implementation.
----------------
orex wrote:
> lntue wrote:
> > nit: treated *separately*
> Sure.
Nit: you don't need to wrap the word separately in *'s, I just use it to highlight my suggestion.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:31
+//    Input x,y in the  algorithm is represented (mathematically) like hx * 2^ix
+//    and hy * 2^iy. This is an unambiguous number representation. For example:
+//      h * 2^i = (2 * h) * 2^(i-1)
----------------
Nit: actually without extra conditions on `hx, hy, ix, iy`, the representation `x = hx * 2^ix` is ambiguous (i.e., not unique), just from your example.  It is only unambiguous if `hx, hy, ix, iy` are integers and `hx, hy` are odd, i.e., not divisible by 2.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:32-68
+//      h * 2^i = (2 * h) * 2^(i-1)
+//    The algorithm uses the fact that
+//      r = a % b = (a % (N * b)) % b,
+//    where N is positive  integer number. a and b - positive. Let's  adopt the
+//    formula for representation above.
+//      a = hx * 2^ix, b = hy * 2^iy, N = 2^k
+//      r(k) = a % b = (hx * 2^ix) % (2 ^k * hy * 2^iy)
----------------
So here is my understanding of the algorithm and the math behind it, please correct me if I'm wrong:

The two main properties about the (integer) modulus operation, denoted by `mod` or `%`, that we will use are:
For any positive integers `a, b, c`:
```
  1) a mod b = (a mod (b * c)) mod b
  2) (a * c) mod (b * c) = (a mod b) * c
```

First, let write `x = hx * 2^ix` and `y = hy * 2^iy` with `hx, hy, ix, iy` are integers (assumed to be positive for simplification).
Then the naive implementation of the `fmod` function with a simple `for/while` loop:
```
while (ix > iy) {
  hx = (hx * 2) % hy;
  --ix;
}
```
is mathematically equivalent to:
```
x mod y = (hx * 2^ix) mod (hy * 2^iy)
        = ((hx * 2^(ix - iy)) mod hy) * 2^iy     (apply property 2)
        = (( ... (((hx * 2^(ix - iy)) mod (hy * 2^(ix - iy - 1))) mod (hy * 2^(ix - iy - 2)) ... ) mod hy) * 2^iy     (apply property 1 repeatedly)
        = (( ... (( (hx * 2) mod hy) * 2^(ix - iy - 1)) mod (hy * 2^(ix - iy - 2)) ... ) mod hy) * 2^iy     (apply property 2)
        = (( ... (( (hx * 2) mod hy) * 2 ) mod hy ) ... ) * 2) mod hy) * 2^iy     (apply property 2 repeatedly)
```
And the total number of iterations is `ix - iy`.

On the other hand, your algorithm exploits the fact that hx, hy are the mantissas of floating point numbers, which use less bits than the storage integers: 24 / 32 for floats and 53 / 64 for doubles, so if in each step of the iteration, we can left shift `hx` as many bits as the storage integer type can hold, the exponent reduction per step will be at least 32 - 24 = 8 for floats and 64 - 53 = 9 for doubles:
```
x mod y = (hx * 2^ix) mod (hy * 2^iy)
        = ((hx * 2^(ix - iy) mod hy) * 2^iy
        = (( ... (( (( hx * 2^r1) mod hy ) * 2^r2) mod hy ) ... ) * 2^rk) mod hy) * 2^iy
```
where `r1 + r2 + ... + rk = ix - iy` and `ri >= sizeof(UInt) - hy length` for `i = 1..k`.

And so the number of iterations is at most by: `(ix - iy) / (sizeof(UInt) - mantissa_length)`.

Feel free to use this to update the comments if it makes sense.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:116
+public:
+  inline constexpr static intU_t execute(int n, int hyltzeroes, intU_t hx,
+                                         intU_t hy) {
----------------
`exp_diff` for `n` and `max_shift` for `hyltzeroes` or other more descriptive names.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:178
+
+    int ix = sx.get_unbiased_exponent();
+    int iy = sy.get_unbiased_exponent();
----------------
Maybe using `mx, ex` or `m_x, e_x` which are closer to mantissa and exponent of `x`?

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:209
+      hy = sy.get_mantissa();
+      lzhy = clz(hy);
+    }
----------------
Use more descriptive names such as `lead_zeros_hy`, `tail_zeros_hy`?

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:214
+    int tzhy = ctz(hy);
+    int hyltzeroes = lzhy + tzhy;
+    // n > 0 byt conditions above
----------------
`max_shift` / `max_scale_factor` maybe more descriptive?

================
Comment at: libc/test/src/math/exhaustive/CMakeLists.txt:189
+add_fp_unittest(
+  FMod_test
+  NO_RUN_POSTBUILD
----------------
Use lower case for the target name `fmod_test`.

================
Comment at: libc/test/src/math/exhaustive/CMakeLists.txt:195
+  SRCS
+    FMod_test.cpp
+  DEPENDS
----------------
Use lower case for the test file.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127046/new/

https://reviews.llvm.org/D127046