[PATCH] D13994: [SimplifyLibCalls] Optimization for pow(x, n) where n is some constant
Joerg Sonnenberger via llvm-commits
llvm-commits at lists.llvm.org
Wed Nov 18 07:28:01 PST 2015
joerg added inline comments.
================
Comment at: lib/Transforms/Utils/SimplifyLibCalls.cpp:1088-1101
@@ +1087,16 @@
+ AddChain[19] = B.CreateFMul(AddChain[1], AddChain[18]);
+ AddChain[20] = B.CreateFMul(AddChain[10], AddChain[10]);
+ AddChain[21] = B.CreateFMul(AddChain[6], AddChain[15]);
+ AddChain[22] = B.CreateFMul(AddChain[11], AddChain[11]);
+ AddChain[23] = B.CreateFMul(AddChain[3], AddChain[20]);
+ AddChain[24] = B.CreateFMul(AddChain[12], AddChain[12]);
+ AddChain[25] = B.CreateFMul(AddChain[8], AddChain[17]);
+ AddChain[26] = B.CreateFMul(AddChain[13], AddChain[13]);
+ AddChain[27] = B.CreateFMul(AddChain[3], AddChain[24]);
+ AddChain[28] = B.CreateFMul(AddChain[14], AddChain[14]);
+ AddChain[29] = B.CreateFMul(AddChain[4], AddChain[25]);
+ AddChain[30] = B.CreateFMul(AddChain[15], AddChain[15]);
+ AddChain[31] = B.CreateFMul(AddChain[3], AddChain[28]);
+ AddChain[32] = B.CreateFMul(AddChain[16], AddChain[16]);
+ return AddChain[Exp];
+}
----------------
mgrang wrote:
> majnemer wrote:
> > You will end up creating unnecessary instructions if `Exp < 32`, please do not do this.
> Thanks David.
>
> My previous implementation used Binary Exponentiation but that results in more fmuls getting generated.
>
> Addition-Chain Exponentiation generates the optimal (least) number of fmuls.
>
> Other reviewers were of the opinion that we should minimize fmuls (by precomputing the multiplication chains).
>
> I think it will always be a trade-off between runtime and code-size; and here we opted for optimal runtime.
>
Almost. There are two different ways to store the pre-computed table. The first is to store the summands, e.g. `15 = 12 + 3`. The advantage is that it needs exactly one entry per exponent; the downside is that you need to deduplicate, since 12 and 3 share sub-expressions. A map plus recursion is the easiest approach for that.
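A minimal sketch of the summand-table idea, written against plain `double` values so it compiles without LLVM. The names `kFirstSummand` and `PowBuilder` and the particular splits in the table are assumptions made for illustration, not part of the patch; in the real transform the multiplication would be a `B.CreateFMul` call and the map would hold `Value *` instead of `double`.

```cpp
#include <array>
#include <cassert>
#include <map>

// Hypothetical table: kFirstSummand[n] = a means x^n = x^a * x^(n - a).
// Entries 0 and 1 are unused placeholders (x^0 and x^1 need no multiply).
static constexpr std::array<unsigned, 17> kFirstSummand = {
    0, 0, 1, 2, 2, 3, 3, 4, 4, 8, 5, 10, 6, 12, 7, 12, 8};

struct PowBuilder {
  std::map<unsigned, double> cache; // exponent -> already computed power
  unsigned numMuls = 0;             // multiplies emitted so far

  explicit PowBuilder(double x) {
    cache[0] = 1.0;
    cache[1] = x;
  }

  // Returns x^exp for exp <= 16, creating each sub-power at most once.
  double get(unsigned exp) {
    assert(exp < kFirstSummand.size());
    auto it = cache.find(exp);
    if (it != cache.end())
      return it->second; // deduplication: reuse the shared sub-expression
    unsigned a = kFirstSummand[exp];
    double lhs = get(a);
    double rhs = get(exp - a);
    ++numMuls;
    return cache[exp] = lhs * rhs;
  }
};
```

With this table, `PowBuilder(2.0).get(15)` walks 15 -> 12 -> 6 -> 3 -> 2 and multiplies only for the powers it actually needs, so nothing is created for unrelated exponents.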
The other approach is to spell out the chain explicitly as a list of index pairs, like `(0,0), (1,0), (2,2), (3,3), (4,2)`. Entry 0 is the input; every other entry is created in the loop as the product of the two earlier entries its pair names. This version requires slightly more space in the binary, but allows using a simple for loop.
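The explicit-list variant could look roughly like this, again on plain `double` values rather than IR. It assumes each pair holds the indices of two earlier entries to multiply, which makes the list quoted above a chain for x^15 (x^2, x^3, x^6, x^12, x^15). `Step`, `kChain15` and `evalChain` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One chain step: indices of the two earlier entries to multiply.
using Step = std::pair<std::size_t, std::size_t>;

// The list from the comment; under the reading above it yields x^15.
static const std::vector<Step> kChain15 = {
    {0, 0}, {1, 0}, {2, 2}, {3, 3}, {4, 2}};

// Entry 0 is the input; each step appends one product. Returns the last entry.
double evalChain(double x, const std::vector<Step> &steps) {
  std::vector<double> vals;
  vals.reserve(steps.size() + 1);
  vals.push_back(x);
  for (const Step &s : steps) {
    // A step may only refer to entries that already exist.
    assert(s.first < vals.size() && s.second < vals.size());
    vals.push_back(vals[s.first] * vals[s.second]);
  }
  return vals.back();
}
```

Because every step is used by the final result, the loop emits exactly `steps.size()` multiplies and needs no map or recursion; the cost is one stored list per supported exponent.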
http://reviews.llvm.org/D13994