[libc-commits] [PATCH] D131095: [libc] Prevent overflow from intermediate results when adding UInt<N> values.

Thu Aug 4 07:37:07 PDT 2022

orex added inline comments.

================
Comment at: libc/src/__support/CPP/UInt.h:74
   constexpr uint64_t add(const UInt<Bits> &x) {
-    uint64_t carry = 0;
+    bool carry = false;
     for (size_t i = 0; i < WordCount; ++i) {
----------------
lntue wrote:
> orex wrote:
> > Hi, Tue!
> > 
> > I was thinking about the implementation. I little bit worries about the performance. To many `if` here. I would like to propose you another solution. Of course it is up to you to accept it or not. The strongest point of the solution is slightly reduced number of arithmetic operations and only one "main" if. The second one triggered very rare.
> > 
> > ```
> >     uint64_t carry = 0;
> >     for (size_t i = 0; i < WordCount; ++i) {
> >       // Will be wrapped if sum more than 2^(sizeof(x)) - 1
> >       val[i] += x.val[i];
> >       // If an overflow appears, the result is less than both of the initial
> >       // variables
> >       if (val[i] < x.val[i]) {
> >         // Add previous carry. Overflow is not possible.
> >         val[i] += carry;
> >         // Put 1 to the next digits.
> >         carry = 1;
> >       } else {
> >         val[i] += carry;
> >         // Likely no overflow.
> >         if (likely(val[i]) != 0)
> >           carry = 0;
> >         // else carry keeps value in case of carry = 0 it is simply 0 with
> >         // no overflow in case of 1 this made overflow and propagates next.
> >       }
> >     }
> >     return carry;
> >   }
> > ```
> > 
> Hi Kirill, thanks for thinking about improving the performance!  I should have added some more background to this patch.
> 
> The main reason I had to make it a bit complicated is that when using `UInt<128>` to replace `__uint128_t` (which we will need to do for targets without `__uint128_t` builtin supports), it failed some tests that check overflow flags, which are set by the intermediate computation such as `val[i] += x.val[i]` in your improvement or the previous implementation, while the overall `__uint128_t` addition is not overflowed.
> 
> Of course this change will make another issue pop up, that is now we don't set overflow flag/trap when the real sum in `__uint128_t` is overflowed.  That's why I left some todo for later patches.
Thank you for the reply. it was mistake, I should read the title more carefully. Your solution looks good and beautiful to solve the problem. 
I don't know the full context, but for x86 I can offer an alternative, just clear CF the flag with test command (if you are not interesting in other flags):
https://en.wikipedia.org/wiki/TEST_(x86_instruction)
Another proposed solution can be, something like this: 
1) calculate all sums, but last with the loop I proposed or with your previous solution, which is also very elegant.
2) clear the flag for some architectures. As I know, for example, RISC V do not have flags, so do not needed.
3) Simply add vals in the end.

So you can get "true overflow" flag in the end or you can clear the flag in the end, so the behavior will be the same.
But anyhow your solution is very elegant.
P.S. To be honest I don't see why you previous solution overflows. Sorry for bothering you, but It is very interesting for me? 

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131095/new/

https://reviews.llvm.org/D131095