<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">Hi all, <div><br></div><div>I have summarized the top 10 compiler passes for Polly when compiling LLVM test-ssuite. Results can be viewed on:</div><div> <a href="https://gist.github.com/tanstar/581bcea1e4e03498f935/raw/f6a4ec4e8565f7a7bbdb924cd59fcf145caac039/Polly-top10" target="_blank">https://gist.github.com/tanstar/581bcea1e4e03498f935/raw/f6a4ec4e8565f7a7bbdb924cd59fcf145caac039/Polly-top10</a></div><div><br></div><div>Based on the comparison between "clang -O3" and "polly -O3" listed on:</div><div> <a href="http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=14&baseline=14">http://188.40.87.11:8000/db_default/v4/nts/18?compare_to=14&baseline=14</a></div><div><br></div><div>we can see Polly's compile-time overhead is mainly resulted by some expensive Polly passes such as PollyDependence, PollyOptimization and PollyCodegen. <span style="f!
ont-size: 14px; line-height: 1.7;">Especially, I notice that the PollyDependence can lead to significant extra compile-time overhead. Its compile-time percentage for some expensive benchmarks can be summarized as:</span></div><div><div> nestedloop: 41.4% (Polly - Calculate dependence)</div><div><span style="line-height: 1.7;"> salsa20: 98.5%</span><span style="line-height: 1.7;"> </span><span style="line-height: 1.7;">(Polly - Calculate dependence)</span></div><div> seidel-2d: 72.1% (Polly - Calculate dependence)</div><div> multiplies: 54.3% (Poly - <span style="line-height: 1.7;"> </span><span style="line-height: 1.7;">Calculate dependence)</span></div><div><span style="line-height: 1.7;"> Puzzle: 22.8% (</span><span style="line-height: 1.7;">Poly - </span><span style="line-height: 1.7;"> </span><span s!
tyle="line-height: 1.7;">Calculate dependence)</span></div></div><div>
<span style="line-height: 1.7;"><br></span></div><div>As a result, it is critical to improve the PollyDependence pass. I have previously committed a patch file to split start value from SCEVAddRecExpr (r187728), which can improve the PollyDependence pass for some benchmarks as follows:</div><div> <span style="font-size: 14px; line-height: 1.7;"> http://188.40.87.11:8000/db_default/v4/nts/25?baseline=18&compare_to=18</span></div><div><span style="font-size: 14px; line-height: 1.7;">Obviously, it is not enough.</span></div><div><span style="font-size: 14px; line-height: 1.7;"><br></span></div><div>I have investigated <span style="font-size: 14px; line-height: 1.7;"> </span><span style="font-size: 14px; line-height: 1.7;">the salsa20 benchmark, in which the PollyDependence pass consumes 98.5% of total compile-time. The key code of salsa20 benchmark consists of a serial of array operations:</span></div><div><div> #define R(a,b) (((a) &!
lt;< (b)) | ((a) >> (32 - (b))))</div><div><div> for (i = 20;i > 0;i -= 2) {</div><div> x[ 4] ^= R(x[ 0]+x[12], 7); x[ 8] ^= R(x[ 4]+x[ 0], 9);</div><div> x[12] ^= R(x[ 8]+x[ 4],13); x[ 0] ^= R(x[12]+x[ 8],18);</div><div> ....</div><div> x[14] ^= R(x[13]+x[12],13); x[15] ^= R(x[14]+x[13],18);</div><div> }</div></div></div><div>which would be translated into a serial of affine functions:</div><div><div> ReadAccess := <span style="font-size: 14px; line-height: 1.7;"> { Stmt_for_body5[i0] -> MemRef_x[12] };</span></div><div> ReadAccess := <span style="font-size: 14px; line-height: 1.7;"> { Stmt_for_body5[i0] -> MemRef_x[4] };</span></div><div> MustWriteAccess := <span style="font-size: 14px; line-height: 1.7;"> { Stmt_for_body5[i0] -> MemRef_x[4] };</span></div><div>!
ReadAccess := <span style="font-size: 14px;
line-height: 1.7;"> { Stmt_for_body5[i0] -> MemRef_x[0] };</span></div></div><div><span style="font-size: 14px; line-height: 1.7;"> ...</span></div><div>Consequently, the PollyDependence pass would produce very complex Read/Write/MayWrite as follows:</div><div><br></div><div><div> Read: { Stmt_for_body[i0] -> MemRef_in[i0] : i0 >= 0 and i0 <= 15; Stmt_for_body5[i0] -> MemRef_x[15] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[9] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[4] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRe!
f_x[11] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[2] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[6] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[14] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[8] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[10] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and !
i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[0] : (i0 >= 0 and i0
<= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[13] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[1] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[3] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[7] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i0] -> MemRef_x[12] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9); Stmt_for_body5[i!
0] -> MemRef_x[5] : (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) or (i0 >= 0 and i0 <= 9) }</div><div> Write: ...</div></div><div><br></div><div>As we can see, the reason why PollyDependence leads to expensive compile-time lies in the very complex Read/Write/MayWrite structure. In fact, we can run <span style="white-space: pre-wrap; font-size: 14px; line-height: 1.7;">isl_union_map_coalesce() on the </span><span style="white-space: pre-wrap; font-size: 14px; line-height: 1.7;">Read/MayWrite/MustWrite before feeding them into ISL analysis:</span></div><div><span style="white-space: pre-wrap; font-size: 14px; line-height: 1.7;"><br></span></div><div><span style="white-space: pre-wrap;">diff --git a/lib/Analysis/Dependences.cpp b/lib/Analysis/Dependences.cpp
index 9f918f3..39c3fb6 100644
--- a/lib/Analysis/Dependences.cpp
+++ b/lib/Analysis/Dependences.cpp
@@ -95,6 +95,10 @@ void Dependences::calculateDependences(Scop &S) {
collectInfo(S, &Read, &Write, &MayWrite, &Schedule);
+ Read = isl_union_map_coalesce(Read);
+ Write = isl_union_map_coalesce(Write);
+ MayWrite = isl_union_map_coalesce(MayWrite);
+
DEBUG(dbgs() << "Read: " << Read << "\n"</span></div><div><span style="white-space: pre-wrap;"><br></span></div><div><span style="white-space: pre-wrap; font-size: 14px; line-height: 1.7;">With this patch file, we can reduce the compile-time percentage of PollyDependence from 98.5% to 15.3%. </span><span style="font-size: 14px; line-height: 1.7; white-space: pre-wrap;">Unfortunately, the compile-time percentage of PollyDependence for benchmarks, such as "</span><span style="font-size: 14px; line-height: 1.7;">nestedloop", is still very high. </span></div><div><span style="font-size: 14px; line-height: 1.7;"><br></span></div><div>My plan is to continue investigating why PollyDependence pass leads to such high compile-time overhead. Could anyone give me some comments or suggestions?</div><div><br></div><div>Thanks,</div><div>Star Tan</div><div><span style="font-size: 14px; line-height: 1.7;"><br></span></div></div>