[PATCH] D60669: [APInt] Optimize umul_ov

Sun Apr 14 20:11:21 PDT 2019

MaskRay marked 2 inline comments as done.
MaskRay added a comment.

> Is this trying to address some performance problem? Any numbers?

I'm using a `-DBUILD_SHARED_LIBS=on` build so there is some indirection cost. Anyway, The performance improvement is obvious:

  #include <cstdlib>
  #include <llvm/ADT/APInt.h>
  using namespace std;

  #define FOR(i, a, b) for (remove_cv<remove_reference<decltype(b)>::type>::type i = (a); i < (b); i++)
  #define REP(i, n) FOR(i, 0, n)

  int main() {
    const unsigned N = 64;
    REP(i, 10000) {
      unsigned a = rand(), b = rand();
      REP(i, 40000) {
        bool overflow;
        llvm::APInt A(N, a), B(N, b);
        auto C = A.umul_ov(B, overflow);
        asm volatile("" : : "r"(C[0]), "r"(overflow) : "memory");
      }
    }
  }

Before (two `udiv` calls; actually one suffices): `12.8129 +- 0.0952 seconds time elapsed  ( +-  0.74% )`
After (leading zeros + mul + shift + plus): `3.6257 +- 0.0296 seconds time elapsed  ( +-  0.82% )`

================
Comment at: lib/Support/APInt.cpp:1912
+    return APInt(BitWidth, (uint64_t)V);
+  }
+#endif
----------------
nikic wrote:
> In this form, this will only be correct if the bit-width is *exactly* 64-bit. For smaller bit-widths you'll have to consider a combination of the intrinsic overflow flag, and whether any bits beyond the bit-width are set. (It might make sense to also special-case <= 32 bits, in which case checking bits in the result of a normal multiply would be sufficient.)
Good catch. I forgot about that. Changed the purpose of this patch.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60669/new/

https://reviews.llvm.org/D60669