[libc-commits] [PATCH] D127046: [libc][math] fmod/fmodf implementation.

Fri Jun 24 01:47:09 PDT 2022

orex added a comment.

Thank you Tue, for your comments. They was really useful. I'll really appreciate if you go through all my replies. It was a lot of them, so I'm afraid that I can miss something or my changes can be improved even more. I'll also appreciate if you check the performance for fmod in the same way, as other functions. I've attached the file F23566437: fmod-core-math.tar.gz <https://reviews.llvm.org/F23566437>. You should simply unpack it inside core-math folder.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:31
+//    Input x,y in the  algorithm is represented (mathematically) like hx * 2^ix
+//    and hy * 2^iy. This is an unambiguous number representation. For example:
+//      h * 2^i = (2 * h) * 2^(i-1)
----------------
lntue wrote:
> Nit: actually without extra conditions on `hx, hy, ix, iy`, the representation `x = hx * 2^ix` is ambiguous (i.e., not unique), just from your example.  It is only unambiguous if `hx, hy, ix, iy` are integers and `hx, hy` are odd, i.e., not divisible by 2.
Thank you! Just a mistake "unambiguous" -> "ambiguous"

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:32-68
+//      h * 2^i = (2 * h) * 2^(i-1)
+//    The algorithm uses the fact that
+//      r = a % b = (a % (N * b)) % b,
+//    where N is positive  integer number. a and b - positive. Let's  adopt the
+//    formula for representation above.
+//      a = hx * 2^ix, b = hy * 2^iy, N = 2^k
+//      r(k) = a % b = (hx * 2^ix) % (2 ^k * hy * 2^iy)
----------------
lntue wrote:
> So here is my understanding of the algorithm and the math behind it, please correct me if I'm wrong:
> 
> The two main properties about the (integer) modulus operation, denoted by `mod` or `%`, that we will use are:
> For any positive integers `a, b, c`:
> ```
>   1) a mod b = (a mod (b * c)) mod b
>   2) (a * c) mod (b * c) = (a mod b) * c
> ```
> 
> First, let write `x = hx * 2^ix` and `y = hy * 2^iy` with `hx, hy, ix, iy` are integers (assumed to be positive for simplification).
> Then the naive implementation of the `fmod` function with a simple `for/while` loop:
> ```
> while (ix > iy) {
>   hx = (hx * 2) % hy;
>   --ix;
> }
> ```
> is mathematically equivalent to:
> ```
> x mod y = (hx * 2^ix) mod (hy * 2^iy)
>         = ((hx * 2^(ix - iy)) mod hy) * 2^iy     (apply property 2)
>         = (( ... (((hx * 2^(ix - iy)) mod (hy * 2^(ix - iy - 1))) mod (hy * 2^(ix - iy - 2)) ... ) mod hy) * 2^iy     (apply property 1 repeatedly)
>         = (( ... (( (hx * 2) mod hy) * 2^(ix - iy - 1)) mod (hy * 2^(ix - iy - 2)) ... ) mod hy) * 2^iy     (apply property 2)
>         = (( ... (( (hx * 2) mod hy) * 2 ) mod hy ) ... ) * 2) mod hy) * 2^iy     (apply property 2 repeatedly)
> ```
> And the total number of iterations is `ix - iy`.
> 
> On the other hand, your algorithm exploits the fact that hx, hy are the mantissas of floating point numbers, which use less bits than the storage integers: 24 / 32 for floats and 53 / 64 for doubles, so if in each step of the iteration, we can left shift `hx` as many bits as the storage integer type can hold, the exponent reduction per step will be at least 32 - 24 = 8 for floats and 64 - 53 = 9 for doubles:
> ```
> x mod y = (hx * 2^ix) mod (hy * 2^iy)
>         = ((hx * 2^(ix - iy) mod hy) * 2^iy
>         = (( ... (( (( hx * 2^r1) mod hy ) * 2^r2) mod hy ) ... ) * 2^rk) mod hy) * 2^iy
> ```
> where `r1 + r2 + ... + rk = ix - iy` and `ri >= sizeof(UInt) - hy length` for `i = 1..k`.
> 
> And so the number of iterations is at most by: `(ix - iy) / (sizeof(UInt) - mantissa_length)`.
> 
> Feel free to use this to update the comments if it makes sense.
Thank you! Your explanation is very good. I included some sentences to the description. Huge math inserts looks absolutely unreadable in text format, so I skip them. 
Nit: Some preparations step requires for this loop to work. Also this loop can be improved, changing division by subtraction.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:116
+public:
+  inline constexpr static intU_t execute(int n, int hyltzeroes, intU_t hx,
+                                         intU_t hy) {
----------------
lntue wrote:
> `exp_diff` for `n` and `max_shift` for `hyltzeroes` or other more descriptive names.
Done. Thank you!

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:214
+    int tzhy = ctz(hy);
+    int hyltzeroes = lzhy + tzhy;
+    // n > 0 byt conditions above
----------------
lntue wrote:
> `max_shift` / `max_scale_factor` maybe more descriptive?
Changed to `sides_zeroes_count`.

================
Comment at: libc/src/__support/FPUtil/generic/FMod.h:236-239
+    if (hx == 0)
+      return FPB::zero();
+
+    if (n == 0)
----------------
lntue wrote:
> Does marking these 2 conditions `unlikely` improve throughput?
n == 0 is quite likely condition, I think. But for hx, yes. You are right.

================
Comment at: libc/test/src/math/FModTest.h:36
+  void testSpecialNumbers(FModFunc f) {
+    using nl = std::numeric_limits<T>;
+
----------------
lntue wrote:
> Use `NumericLimits` template from `libc/src/__support/CPP/Limits.h`.
Unfortunately the library is very primitive. It does not have things, I need.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D127046/new/

https://reviews.llvm.org/D127046