[PATCH] D60669: [APInt] Optimize umul_ov
Fangrui Song via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Apr 14 20:11:21 PDT 2019
MaskRay marked 2 inline comments as done.
MaskRay added a comment.
> Is this trying to address some performance problem? Any numbers?
I'm using a `-DBUILD_SHARED_LIBS=on` build so there is some indirection cost. Anyway, The performance improvement is obvious:
#include <cstdlib>
#include <llvm/ADT/APInt.h>
using namespace std;
#define FOR(i, a, b) for (remove_cv<remove_reference<decltype(b)>::type>::type i = (a); i < (b); i++)
#define REP(i, n) FOR(i, 0, n)
int main() {
const unsigned N = 64;
REP(i, 10000) {
unsigned a = rand(), b = rand();
REP(i, 40000) {
bool overflow;
llvm::APInt A(N, a), B(N, b);
auto C = A.umul_ov(B, overflow);
asm volatile("" : : "r"(C[0]), "r"(overflow) : "memory");
}
}
}
Before (two `udiv` calls; actually one suffices): `12.8129 +- 0.0952 seconds time elapsed ( +- 0.74% )`
After (leading zeros + mul + shift + plus): `3.6257 +- 0.0296 seconds time elapsed ( +- 0.82% )`
================
Comment at: lib/Support/APInt.cpp:1912
+ return APInt(BitWidth, (uint64_t)V);
+ }
+#endif
----------------
nikic wrote:
> In this form, this will only be correct if the bit-width is *exactly* 64-bit. For smaller bit-widths you'll have to consider a combination of the intrinsic overflow flag, and whether any bits beyond the bit-width are set. (It might make sense to also special-case <= 32 bits, in which case checking bits in the result of a normal multiply would be sufficient.)
Good catch. I forgot about that. Changed the purpose of this patch.
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D60669/new/
https://reviews.llvm.org/D60669
More information about the llvm-commits
mailing list