<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Thu, Mar 12, 2015 at 2:03 PM, Francois Pichet <span dir="ltr">&lt;<a href="mailto:pichet2000@gmail.com" target="_blank">pichet2000@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>I think it would make sense for (1) and (2). I am not sure if (3) is feasible in <span style="font-size:12.8000001907349px">instcombine</span>. (I am not too familiar with LoopInfo.)</div></div></blockquote><div><br></div><div>LoopInfo should be available for at least one of the instcombine invocations. In the -O2 pipeline it looks like instcombine is called 7 times, and it should have LoopInfo for the third invocation (called immediately after some loop passes). Here's the output from --debug-pass=Structure:</div><div><br></div><div><div>Pass Arguments: -tti -no-aa -<!--
-->tbaa -scoped-noalias -assumption-cache-tracker -targetlibinfo -basicaa -verify -simplifycfg -domtree -sroa -early-cse -lower-expect</div><div>Target Transform Information</div><div>No Alias Analysis (always returns 'may' alias)</div><div>Type-Based Alias Analysis</div><div>Scoped NoAlias Alias Analysis</div><div>Assumption Cache Tracker</div><div>Target Library Information</div><div>Basic Alias Analysis (stateless AA impl)</div><div> FunctionPass Manager</div><div> Module Verifier</div><div> Simplify the CFG</div><div> Dominator Tree Construction</div><div> SROA</div><div> Early CSE</div><div> Lower 'expect' Intrinsics</div><div>Pass Arguments: -targetlibinfo -tti -no-aa -tbaa -scoped-noalias -assumption-cache-tracker -basicaa -verify-di -ipsccp -globalopt -deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh -inline-cost -inline -functionattrs -sroa -domtree -early-cse -lazy-value-info -jump-threading -<!--
-->correlated-propagation -simplifycfg -domtree -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution -loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -mldst-motion -domtree -memdep -gvn -memdep -memcpyopt -sccp -domtree -bdce -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -loops -loop-simplify -lcssa -licm -adce -simplifycfg -domtree -instcombine -barrier -domtree -loops -loop-simplify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution -loop-accesses -loop-vectorize -instcombine -scalar-evolution -slp-vectorizer -simplifycfg -domtree -instcombine -loops -loop-simplify -lcssa -scalar-evolution -loop-unroll -alignment-from-assumptions -strip-dead-prototypes -globaldce -constmerge -verify -verify-di</div><div>Target Library Information</div><div>Target <!--
-->Transform Information</div><div>No Alias Analysis (always returns 'may' alias)</div><div>Type-Based Alias Analysis</div><div>Scoped NoAlias Alias Analysis</div><div>Assumption Cache Tracker</div><div>Basic Alias Analysis (stateless AA impl)</div><div> ModulePass Manager</div><div> Debug Info Verifier</div><div> Interprocedural Sparse Conditional Constant Propagation</div><div> Global Variable Optimizer</div><div> Dead Argument Elimination</div><div> FunctionPass Manager</div><div> Dominator Tree Construction</div><div> <b>Combine redundant instructions</b></div><div> Simplify the CFG</div><div> CallGraph Construction</div><div> Call Graph SCC Pass Manager</div><div> Remove unused exception handling info</div><div> Inline Cost Analysis</div><div> Function Integration/Inlining</div><div> Deduce function attributes</div><div> FunctionPass Manager</div><div> SROA<!--
--></div><div> Dominator Tree Construction</div><div> Early CSE</div><div> Lazy Value Information Analysis</div><div> Jump Threading</div><div> Value Propagation</div><div> Simplify the CFG</div><div> Dominator Tree Construction</div><div> <b>Combine redundant instructions</b></div><div> Tail Call Elimination</div><div> Simplify the CFG</div><div> Reassociate expressions</div><div> Dominator Tree Construction</div><div> Natural Loop Information</div><div> Canonicalize natural loops</div><div> Loop-Closed SSA Form Pass</div><div> Loop Pass Manager</div><div> Rotate Loops</div><div> Loop Invariant Code Motion</div><div> Unswitch loops</div></div><div><div> <b>Combine redundant instructions</b></div><div> Scalar Evolution Analysis</div><div> <!--
--> Canonicalize natural loops</div><div> Loop-Closed SSA Form Pass</div><div> Loop Pass Manager</div><div> Induction Variable Simplification</div><div> Recognize loop idioms</div><div> Delete dead loops</div><div> Unroll loops</div><div> Memory Dependence Analysis</div><div> MergedLoadStoreMotion</div><div> Dominator Tree Construction</div><div> Memory Dependence Analysis</div><div> Global Value Numbering</div><div> Memory Dependence Analysis</div><div> MemCpy Optimization</div><div> Sparse Conditional Constant Propagation</div><div> Dominator Tree Construction</div><div> Bit-Tracking Dead Code Elimination</div><div> <b>Combine redundant instructions</b></div><div> Lazy Value Information Analysis</div><div> Jump Threading</div><div> Value <!--
-->Propagation</div><div> Dominator Tree Construction</div><div> Memory Dependence Analysis</div><div> Dead Store Elimination</div><div> Natural Loop Information</div><div> Canonicalize natural loops</div><div> Loop-Closed SSA Form Pass</div><div> Loop Pass Manager</div><div> Loop Invariant Code Motion</div><div> Aggressive Dead Code Elimination</div><div> Simplify the CFG</div><div> Dominator Tree Construction</div><div> <b>Combine redundant instructions</b></div><div> A No-Op Barrier Pass</div><div> FunctionPass Manager</div><div> Dominator Tree Construction</div><div> Natural Loop Information</div><div> Canonicalize natural loops</div><div> Loop-Closed SSA Form Pass</div><div> Loop Pass Manager</div><div> Rotate Loops</div><div> Branch Probability Analysis</div><div> <!--
-->Block Frequency Analysis</div><div> Scalar Evolution Analysis</div><div> Loop Access Analysis</div><div> Loop Vectorization</div><div> <b>Combine redundant instructions</b></div><div> Scalar Evolution Analysis</div><div> SLP Vectorizer</div><div> Simplify the CFG</div><div> Dominator Tree Construction</div><div> <b>Combine redundant instructions</b></div><div> Natural Loop Information</div><div> Canonicalize natural loops</div><div> Loop-Closed SSA Form Pass</div><div> Scalar Evolution Analysis</div><div> Loop Pass Manager</div><div> Unroll loops</div><div> Alignment from assumptions</div></div><div><div> Strip Unused Function Prototypes</div><div> Dead Global Elimination</div><div> Merge Duplicate Global Constants</div><div> FunctionPass Manager</div><div> Module Verifier</div><div> Debug Info Verifier</div><div> <!--
--> Bitcode Writer</div></div><div><br></div><div><br></div><div><br></div><div>Mark</div><div><br></div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><div>For Octasic's Opus platform, I modified shouldMergeGEPs in our fork to:</div><div><br></div><div><span><div> if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&</div><div> !Src.hasOneUse())</div><div> return false;</div><div><br></div></span><div> return Src.hasAllConstantIndices(); // was: return true;<br></div></div><div><br></div><div>Following that change, I saw performance gains on a few specific tests and no regressions at all in our (admittedly limited) benchmark suite.</div><div><br></div><div><span style="font-size:12.8000001907349px">Regards,</span></div><div>Francois Pichet, Octasic.</div><div><div><!--
--><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 12, 2015 at 4:14 PM, Mark Heffernan <span dir="ltr">&lt;<a href="mailto:meheff@google.com" target="_blank">meheff@google.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Coincidentally, I just ran into this same issue on some of our benchmarks for the NVPTX backend. Before instcombine you have something like this:<div><br></div><div> %tmp = getelementptr inbounds i32, i32* %input, i64 %offset<br></div><div>loop:<br></div><div> %loop_variant = ...</div><div> %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant</div><div><br></div><div>This gets transformed to:</div><div><br></div><div><!--
--><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">loop:<br></div><div style="color:rgb(34,34,34);font-family:arial,sans-serif;font-size:small;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"> %loop_variant = ...</div><div style="text-align:start;text-indent:0px"> %sum = add nsw i64 %loop_variant, %offset<br></div><div style="text-align:start;text-indent:0px"> %ptr = getelementptr inbounds i32, i32* %input, i64 %sum<br></div><div style="text-align:start;text-indent:0px"><br></div><div style="text-align:start;text-indent:0px">The merge essentially reassociates the loop-variant term (%loop_variant) and loop-invariant terms (%<!--
-->input and %offset) in such a way that LICM can no longer hoist the loop-invariant computation (%input + %offset) out of the loop.<br></div></div><div><div><br></div><div>One idea is to only perform this style of GEP merge if at least one of the following conditions is true:</div><div>(1) both index terms in the GEPs are constant. In this case no new add instruction is created; the constants are simply folded.</div><div>(2) the GEPs are in the same BB.</div><div>(3) LoopInfo is available, and we know we're not creating a new instruction in a (deeper) loop.<br></div><div><br></div><div>What do you think?</div><span><font color="#888888"><div><br></div><div>Mark</div><div><br></div></font></span></div></div>
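<div>The three conditions (1)-(3) above can be written as a single predicate. The following is a minimal C++ sketch, not LLVM code: <code>GepSite</code> and <code>shouldMergeGepsProposed</code> are hypothetical names, each GEP is reduced to whether its index term is constant, an id for its basic block and its loop depth, and condition (3) is approximated by comparing loop nesting depths.</div>

```cpp
#include <cassert>

// Hypothetical, simplified model of one of the two GEPs in a merge (Src feeds
// the user GEP). This is NOT LLVM's API; it only records what (1)-(3) need.
struct GepSite {
  bool indexIsConstant;  // is this GEP's index term a constant?
  int blockId;           // identifier of the containing basic block
  int loopDepth;         // loop nesting depth of that block (0 = not in a loop)
};

// Sketch of the proposed rule: merge Src into User only if at least one of
// conditions (1)-(3) holds. haveLoopInfo models whether LoopInfo is
// available to this particular instcombine invocation.
bool shouldMergeGepsProposed(const GepSite &src, const GepSite &user,
                             bool haveLoopInfo) {
  assert(src.loopDepth >= 0 && user.loopDepth >= 0);
  // (1) Both index terms are constant: the sum folds, no new add is created.
  if (src.indexIsConstant && user.indexIsConstant)
    return true;
  // (2) Same basic block: the new add executes exactly as often as Src did.
  if (src.blockId == user.blockId)
    return true;
  // (3) With LoopInfo, accept only if the new add (placed at the user GEP) is
  // not in a more deeply nested loop than Src. Comparing depths is a
  // simplification of "not in a deeper loop".
  if (haveLoopInfo && user.loopDepth <= src.loopDepth)
    return true;
  return false;
}
```

<div>In the example above, %tmp sits outside the loop (depth 0) and %ptr inside it (depth 1), and neither index is constant, so all three checks fail and the GEPs are left unmerged, keeping the invariant %tmp out of the loop.</div>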
</blockquote></div><br></div></div></div></div>
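<div>To see what the modified shouldMergeGEPs does on the loop example, its logic can be replayed on a toy model of GEP indices. This is a sketch only: <code>Index</code>, <code>GepModel</code> and the free functions are hypothetical stand-ins for GEPOperator::hasAllZeroIndices(), GEPOperator::hasAllConstantIndices() and Value::hasOneUse(), not the real LLVM API.</div>

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy model of a GEP, for illustration only (not LLVM's API).
struct Index {
  bool isConstant;
  long value;  // meaningful only when isConstant is true
};

struct GepModel {
  std::vector<Index> indices;
  unsigned numUses;
};

// Stand-in for GEPOperator::hasAllZeroIndices().
bool hasAllZeroIndices(const GepModel &g) {
  for (const Index &i : g.indices)
    if (!i.isConstant || i.value != 0)
      return false;
  return true;
}

// Stand-in for GEPOperator::hasAllConstantIndices().
bool hasAllConstantIndices(const GepModel &g) {
  for (const Index &i : g.indices)
    if (!i.isConstant)
      return false;
  return true;
}

// Replays the modified shouldMergeGEPs logic quoted above on the toy model.
bool shouldMergeGepsModified(const GepModel &gep, const GepModel &src) {
  // Unchanged check: an all-zero GEP is the same pointer as Src; if Src is
  // non-trivial and has other users, do not merge.
  if (hasAllZeroIndices(gep) && !hasAllZeroIndices(src) && src.numUses != 1)
    return false;
  // Modified fallback: merge only when every index of Src is a constant.
  return hasAllConstantIndices(src);
}
```

<div>Under this model, a Src like %tmp (indexed by the non-constant %offset) is no longer merged into the in-loop GEP, while a Src with constant indices still is, so constant offsets continue to fold.</div>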
</blockquote></div><br></div></div>
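<div>The count of seven instcombine invocations at -O2, with the third one running right after the loop passes, can be checked mechanically against the second "Pass Arguments" line in the --debug-pass=Structure output above. A small C++ sketch; <code>splitPassArgs</code> and <code>passesBeforeInstCombine</code> are hypothetical helpers, and the sketch only inspects pass order, not which analyses each pass preserves.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a "Pass Arguments: ..." line into its flags, dropping the
// "Pass Arguments:" prefix (any token that does not start with '-').
std::vector<std::string> splitPassArgs(const std::string &line) {
  std::istringstream in(line);
  std::vector<std::string> flags;
  std::string tok;
  while (in >> tok)
    if (tok.size() > 1 && tok[0] == '-')
      flags.push_back(tok);
  return flags;
}

// For each -instcombine that has a predecessor, return the flag just before it.
std::vector<std::string>
passesBeforeInstCombine(const std::vector<std::string> &flags) {
  std::vector<std::string> prev;
  for (std::size_t i = 1; i < flags.size(); ++i)
    if (flags[i] == "-instcombine")
      prev.push_back(flags[i - 1]);
  return prev;
}
```

<div>Applied to that line (with wrapped flags such as -inline rejoined), it finds seven -instcombine entries, and the third is preceded by -loop-unswitch, i.e. it runs immediately after the loop pass manager.</div>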