[PATCH] D60669: [APInt] Optimize umul_ov

Fangrui Song via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sun Apr 14 20:11:21 PDT 2019


MaskRay marked 2 inline comments as done.
MaskRay added a comment.

> Is this trying to address some performance problem? Any numbers?

I'm using a `-DBUILD_SHARED_LIBS=on` build so there is some indirection cost. Anyway, The performance improvement is obvious:

  #include <cstdlib>
  #include <llvm/ADT/APInt.h>
  using namespace std;
  
  #define FOR(i, a, b) for (remove_cv<remove_reference<decltype(b)>::type>::type i = (a); i < (b); i++)
  #define REP(i, n) FOR(i, 0, n)
  
  int main() {
    const unsigned N = 64;
    REP(i, 10000) {
      unsigned a = rand(), b = rand();
      REP(i, 40000) {
        bool overflow;
        llvm::APInt A(N, a), B(N, b);
        auto C = A.umul_ov(B, overflow);
        asm volatile("" : : "r"(C[0]), "r"(overflow) : "memory");
      }
    }
  }

Before (two `udiv` calls; actually one suffices): `12.8129 +- 0.0952 seconds time elapsed  ( +-  0.74% )`
After (leading zeros + mul + shift + plus): `3.6257 +- 0.0296 seconds time elapsed  ( +-  0.82% )`



================
Comment at: lib/Support/APInt.cpp:1912
+    return APInt(BitWidth, (uint64_t)V);
+  }
+#endif
----------------
nikic wrote:
> In this form, this will only be correct if the bit-width is *exactly* 64-bit. For smaller bit-widths you'll have to consider a combination of the intrinsic overflow flag, and whether any bits beyond the bit-width are set. (It might make sense to also special-case <= 32 bits, in which case checking bits in the result of a normal multiply would be sufficient.)
Good catch. I forgot about that. Changed the purpose of this patch.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D60669/new/

https://reviews.llvm.org/D60669





More information about the llvm-commits mailing list