[libc-commits] [PATCH] D131095: [libc] Prevent overflow from intermediate results when adding UInt<N> values.

Tue Ly via Phabricator via libc-commits libc-commits at lists.llvm.org
Thu Aug 4 12:00:55 PDT 2022


lntue added inline comments.


================
Comment at: libc/src/__support/CPP/UInt.h:74
   constexpr uint64_t add(const UInt<Bits> &x) {
-    uint64_t carry = 0;
+    bool carry = false;
     for (size_t i = 0; i < WordCount; ++i) {
----------------
orex wrote:
> lntue wrote:
> > orex wrote:
> > > Hi, Tue!
> > > 
> > > I was thinking about the implementation. I'm a little bit worried about the performance: too many `if`s here. I would like to propose another solution; of course, it is up to you to accept it or not. The strongest points of this solution are a slightly reduced number of arithmetic operations and only one "main" `if`; the second one is triggered very rarely.
> > > 
> > > ```
> > >     uint64_t carry = 0;
> > >     for (size_t i = 0; i < WordCount; ++i) {
> > >       // The sum wraps around if it exceeds 2^64 - 1.
> > >       val[i] += x.val[i];
> > >       // If an overflow occurred, the wrapped result is less than both of
> > >       // the original operands.
> > >       if (val[i] < x.val[i]) {
> > >         // Add previous carry. Overflow is not possible.
> > >         val[i] += carry;
> > >         // Propagate a 1 to the next digit.
> > >         carry = 1;
> > >       } else {
> > >         val[i] += carry;
> > >         // Likely no overflow.
> > >         if (likely(val[i] != 0))
> > >           carry = 0;
> > >         // Otherwise carry keeps its value: if it was 0, there was no
> > >         // overflow; if it was 1, adding it overflowed and the carry
> > >         // propagates to the next digit.
> > >       }
> > >     }
> > >     return carry;
> > >   }
> > > ```
> > > 
> > Hi Kirill, thanks for thinking about improving the performance!  I should have added some more background to this patch.
> > 
> > The main reason I had to make it a bit complicated is that when using `UInt<128>` to replace `__uint128_t` (which we will need to do for targets without builtin `__uint128_t` support), it failed some tests that check overflow flags.  Those flags are set by intermediate computations such as `val[i] += x.val[i]` in your improvement or in the previous implementation, even though the overall `__uint128_t` addition does not overflow.
> > 
> > Of course, this change makes another issue pop up: now we don't set the overflow flag / trap when the real `__uint128_t` sum does overflow.  That's why I left some TODOs for later patches.
> Thank you for the reply. It was my mistake; I should have read the title more carefully. Your solution is a good and elegant way to solve the problem.
> I don't know the full context, but for x86 I can offer an alternative: just clear the CF flag with the TEST instruction (if you are not interested in the other flags):
> https://en.wikipedia.org/wiki/TEST_(x86_instruction)
> Another possible solution would be something like this:
> 1) Calculate all sums except the last one with the loop I proposed or with your previous solution, which is also very elegant.
> 2) Clear the flag on some architectures. As far as I know, RISC-V, for example, has no flags, so this step is not needed there.
> 3) Simply add the last vals at the end.
> 
> So you either get a "true overflow" flag at the end, or you clear the flag at the end, so the behavior will be the same.
> But anyhow, your solution is very elegant.
> P.S. To be honest, I don't see why your previous solution overflows. Sorry for bothering you, but it is very interesting to me.
> 
Thanks!  We can definitely have follow-ups with per-architecture performance enhancements once this class is used in more places, together with more comprehensive testing.

Re P.S.:  Sorry, I didn't dig in too much to see why it overflowed.  It's possible that some compiler optimizations were too aggressive, recognizing that the operations could be done in 32 bits and doing so; or they recognized the pattern and reduced it to `uint64_t` adds with carry-bit checks.  Anyhow, I plan to use this safe implementation as our baseline and extend it to subtraction next.  We should definitely have specialized versions with builtins / flag clearing for x86_64 and aarch64 to improve performance, as we are going to use more-than-128-bit integers very soon.
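As a rough sketch (not part of this patch), a builtin-based specialization could look something like the following, assuming `__builtin_add_overflow` (provided by both GCC and Clang); the free function and the `add_with_carry` name are illustrative only and not part of the actual `UInt<N>` class:

```
#include <stddef.h>
#include <stdint.h>

// Illustrative add-with-carry over an array of 64-bit limbs, least
// significant limb first.  Returns the final carry-out.
bool add_with_carry(uint64_t *val, const uint64_t *x, size_t word_count) {
  bool carry = false;
  for (size_t i = 0; i < word_count; ++i) {
    uint64_t sum;
    // Carry-out of adding the two limbs.
    bool c1 = __builtin_add_overflow(val[i], x[i], &sum);
    // Carry-out of folding in the incoming carry.  At most one of the two
    // additions can overflow, so `c1 || c2` is the outgoing carry.
    bool c2 = __builtin_add_overflow(sum, static_cast<uint64_t>(carry), &sum);
    val[i] = sum;
    carry = c1 || c2;
  }
  return carry;
}
```

Compilers typically lower this pattern to add-with-carry instruction chains on x86_64 and aarch64, so it avoids the extra comparisons of the portable version; whether the builtin is usable in `constexpr` contexts depends on the compiler version, which is one more reason to keep the portable implementation as the baseline.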


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131095/new/

https://reviews.llvm.org/D131095


