[PATCH] D13994: [SimplifyLibCalls] Optimization for pow(x, n) where n is some constant

Wed Nov 18 07:28:01 PST 2015

joerg added inline comments.

================
Comment at: lib/Transforms/Utils/SimplifyLibCalls.cpp:1088-1101
@@ +1087,16 @@
+  AddChain[19] = B.CreateFMul(AddChain[1],  AddChain[18]);
+  AddChain[20] = B.CreateFMul(AddChain[10], AddChain[10]);
+  AddChain[21] = B.CreateFMul(AddChain[6],  AddChain[15]);
+  AddChain[22] = B.CreateFMul(AddChain[11], AddChain[11]);
+  AddChain[23] = B.CreateFMul(AddChain[3],  AddChain[20]);
+  AddChain[24] = B.CreateFMul(AddChain[12], AddChain[12]);
+  AddChain[25] = B.CreateFMul(AddChain[8],  AddChain[17]);
+  AddChain[26] = B.CreateFMul(AddChain[13], AddChain[13]);
+  AddChain[27] = B.CreateFMul(AddChain[3],  AddChain[24]);
+  AddChain[28] = B.CreateFMul(AddChain[14], AddChain[14]);
+  AddChain[29] = B.CreateFMul(AddChain[4],  AddChain[25]);
+  AddChain[30] = B.CreateFMul(AddChain[15], AddChain[15]);
+  AddChain[31] = B.CreateFMul(AddChain[3],  AddChain[28]);
+  AddChain[32] = B.CreateFMul(AddChain[16], AddChain[16]);
+  return AddChain[Exp];
+}
----------------
mgrang wrote:
> majnemer wrote:
> > You will end up creating unnecessary instructions if `Exp < 32`, please do not do this.
> Thanks David.
> 
> My previous implementation used binary exponentiation, but that generates more fmuls.
> 
> Addition-chain exponentiation generates the optimal (minimum) number of fmuls.
> 
> Other reviewers were of the opinion that we should minimize fmuls (by precomputing the multiplication chains).
> 
> I think it will always be a trade-off between runtime and code size; here we opted for optimal runtime.
> 
Almost. There are two different ways to store the precomputed table. The first is to store the summands, e.g. `15 = 12 + 3`. The advantage is that it needs exactly one entry per exponent; the downside is that you need to deduplicate, since 12 and 3 will use the same sub-expressions. A map plus recursion is the easiest approach for that.
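A minimal sketch of the summand approach, written in plain C++ on `double` values rather than against `IRBuilder` so it runs standalone. The `Summands` table is hypothetical and hand-picked for exponents 2 to 16 (it is not the table from the patch); each `L * R` stands in for one `B.CreateFMul`, `NumMuls` counts them, and a `std::map` acts as the deduplication map so each power is built only once.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical summand table: Summands[N] = {A, B} with A + B == N.
// Hand-picked for illustration (exponents 2..16); not the patch's table.
const std::pair<unsigned, unsigned> Summands[17] = {
    {0, 0}, {0, 0}, // Exponents 0 and 1 are never looked up.
    {1, 1}, {1, 2}, {2, 2}, {2, 3},  {3, 3}, {3, 4},  {4, 4}, {3, 6},
    {5, 5}, {3, 8}, {6, 6}, {3, 10}, {7, 7}, {3, 12}, {8, 8}};

// Returns X^N for 1 <= N <= 16. Cache maps an exponent to its already
// computed power, so a sub-expression shared by both summands (or by
// deeper levels of the recursion) is multiplied out only once.
double powBySummands(double X, unsigned N, std::map<unsigned, double> &Cache,
                     unsigned &NumMuls) {
  assert(N >= 1 && N <= 16 && "exponent outside the illustrative table");
  if (N == 1)
    return X; // Base case: the input itself, no multiplication.
  auto It = Cache.find(N);
  if (It != Cache.end())
    return It->second; // Deduplicated: reuse the earlier product.
  double L = powBySummands(X, Summands[N].first, Cache, NumMuls);
  double R = powBySummands(X, Summands[N].second, Cache, NumMuls);
  ++NumMuls; // Stand-in for emitting one B.CreateFMul(L, R).
  double Result = L * R;
  Cache[N] = Result;
  return Result;
}
```

With this table, `powBySummands(2.0, 15, Cache, NumMuls)` returns 32768 with `NumMuls == 5`: x^2 and x^3 are built for the summand 3, then x^6 and x^12 reuse the cached x^3. In the pass itself the cache would presumably map exponents to `Value *` instead of `double`.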

The other approach is to just compile the list explicitly, like `(0,0), (1,0), (2,2), (3,3), (4,2)`. Entry 0 is the input; every other entry is created in the loop. This version requires slightly more space in the binary, but allows using a simple for loop.
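A sketch of the explicit-list approach, again in plain C++ on `double` values. It assumes one reading of the example list: each pair `(i, j)` appends a new entry `Entry[i] * Entry[j]`, so starting from `Entry[0] = x` the list yields x^2, x^3, x^6, x^12 and finally x^15 (consistent with `15 = 12 + 3`). `MulStep`, `Pow15Steps`, and `evalSteps` are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One multiplication step: the new entry is Entry[LHS] * Entry[RHS].
struct MulStep {
  unsigned LHS, RHS;
};

// Assumed reading of the review's example list for exponent 15:
// Entry[0] = x, then x^2, x^3, x^6, x^12, x^15.
const MulStep Pow15Steps[] = {{0, 0}, {1, 0}, {2, 2}, {3, 3}, {4, 2}};

// Evaluates a precomputed step list with a plain loop. No map or recursion
// is needed because sharing is already encoded in the operand indices.
template <std::size_t NumSteps>
double evalSteps(double X, const MulStep (&Steps)[NumSteps]) {
  std::vector<double> Entry;
  Entry.reserve(NumSteps + 1);
  Entry.push_back(X); // Entry 0 is the input.
  for (const MulStep &S : Steps) {
    assert(S.LHS < Entry.size() && S.RHS < Entry.size());
    // Stand-in for B.CreateFMul(Entry[S.LHS], Entry[S.RHS]).
    Entry.push_back(Entry[S.LHS] * Entry[S.RHS]);
  }
  return Entry.back(); // The last entry is the requested power.
}
```

Each step costs one stored index pair, which is where the extra binary size comes from, but evaluation is a single loop with exactly one multiplication per step: five for x^15 here.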

http://reviews.llvm.org/D13994