<div dir="ltr">Thanks all of you for the input! I'm learning a lot about ScalarEvolution. <br><br>I understand, even in the NVPTX64 backend which doesn't natively support i64 but uses 64-bit pointers, indvar widening can still be beneficial. However, I chose to disable widening all together because it's hard to evaluate the benefits of indvar widening before really trying it. For instance, if we widen the indvar and found the widening doesn't improve LSR (which runs as a backend IR pass), it's too late for us to rollback and the widened indvar costs at least one extra 32-bit addition when being incremented. That said, it seems hard to reach a uniformly optimal solution here, so I'd probably leave the compiler as is and resort to source code changes. <div><div><br></div><div>Jingyue</div><br><div class="gmail_quote">On Mon Dec 15 2014 at 11:38:42 PM Andrew Trick <<a href="mailto:atrick@apple.com" target="_blank">atrick@apple.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><br><div><blockquote type="cite"><div>On Dec 4, 2014, at 1:39 PM, Jingyue Wu <<a href="mailto:jingyue@google.com" target="_blank">jingyue@google.com</a>> wrote:</div><br><div><div>I was chasing down a performance regression on a CUDA benchmark compiled for the NVPTX64 backend, and found loop strength reduction is ineffective in the presence of sign extension. Here's a reduced test case:</div><div><br></div><div>void foo(float *input, int n) {</div><div><div>  for (int i = -n; i != n; ++i) {                                                     </div><div>    baz(input[i + 5]);                                                            <br></div><div>  }</div></div><div>}</div><div><br></div><div>I expect &input[i + 5] to be promoted to an indvar but it's not. </div><div><br></div><div>The root cause of this misoptimization is that ScalarEvolution is pessimistic about tagging nsw/nuw to a SCEVAddExpr. This pessimization was introduced in <a href="http://llvm.org/viewvc/llvm-project?view=revision&revision=145367" style="font-size:13.1999998092651px" target="_blank">http://llvm.org/viewvc/<u></u>llvm-<u></u>pr<u></u>oject?view=revision&<u></u>rev<u></u>ision=<u></u>145367</a>.<span style="font-size:13.1999998092651px"> </span><span style="font-size:13.1999998092651px">According to the comments there (</span><span style="font-size:13.1999998092651px"><a href="http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04087" target="_blank">http://llvm.org/docs/doxygen/<u></u>html/ScalarEvolution_8cpp_<u></u>source.html#l04087</a>), ScalarEvolution does not apply an instruction's nsw/nuw flags to the corresponding SCEV expression. </span><span style="font-size:13.1999998092651px">I</span><span style="font-size:13.1999998092651px">n the above example, &input[i + 5] corresponds to SCEV expression input + 4 * </span><span style="font-size:13.1999998092651px">sext(i </span><span style="font-size:13.1999998092651px">+ 5). </span><span style="font-size:13.1999998092651px">In order to promote &input[i + 5] to an indvar, we need to at least prove (i </span><span style="font-size:13.1999998092651px">+ 5) does not sign overflow so that we can reassociate the expression to (input + 5) + 4 * sext(i) which can be represented as a SCEVAddRecExpr. </span><span style="font-size:13.1999998092651px">However, because ScalarEvolution doesn't apply sext to (i </span><span style="font-size:13.1999998092651px">+ 5), it cannot distribute sext(i </span><span style="font-size:13.1999998092651px">+ 5) to sext(i) </span><span style="font-size:13.1999998092651px">+ 5, and is thus unable to identify &input[i </span><span style="font-size:13.1999998092651px">+ 5] as a potential indvar. </span></div><div><span style="font-size:13.1999998092651px"><br></span></div><div><span style="font-size:13.1999998092651px">Side note: this issue kicked in after my recent </span><span style="font-size:13.1999998092651px">recent change that disables induction variable widening for the NVPTX64 backend. This issue used to be alleviated (if any) by induction variable widening because there wouldn't be any sext if index i is already 64-bit. </span><br></div><div><span style="font-size:13.1999998092651px"><br></span></div><div><span style="font-size:13.1999998092651px">I wonder if the fix which disables applying nsw/nuw is too conservative. The comments in the source code say</span><span style="font-size:13.1999998092651px"> that ScalarEvolution does not apply an instruction's nsw/nuw flags to the corresponding SCEV expression because another non-control-equivalent instruction without nsw/nuw can be mapped to the same expression. If that's the only case we worried about, is a better fix to be not mapping instructions only differ in nsw/nuw to the same SCEV expression? That can be done by adding the wrapping flag of a SCEVAddExpr expression to the folding set that serves as the index of this expression for SCEV look-up. </span></div><div><span style="font-size:13.1999998092651px"><br></span></div><div>I followed this idea, and tried a preliminary change (<a href="http://reviews.llvm.org/differential/diff/16942/" target="_blank">http://reviews.llvm.org/<u></u>differential/diff/16942/</a>). It works fine so far: no transformation tests failed; some analysis tests failed but the new results seem better instead of incorrect. I wonder if I was just lucky on not breaking tests or it is the right way to go. </div></div></blockquote><br></div></div><div style="word-wrap:break-word"><div><div>SCEV is a system for computing algabreic equivalence. Control dependence has no place there. SCEV not an IR. The basic premise is that equivalent subexpressions will cancel. I don't understand how SCEV can fundamentally work if expressions are tagged with any kind of infomation derived from the IR.</div><div><br></div><div>NSW/NUW flags currently don't change the way SCEV works because they are lazilly and conservativelly attached to the expression without affecting its identity. Even so, they still cause problems because they introduce nondeterminism w.r.t the order SCEVs are computed. They were just an expedient hack to support a corner case in C -- which turns out to be a case people care about. But we can't do much more with them within SCEV. Instead, the user of SCEV will need to reason about NSW/NUW. There has been a lot of discussion about that but I don't think anyone wants to take it on.</div><div><br></div><div>I don't think LSR has a problem with your example. The incoming IR will have a sign-extend in the loop because that's what you asked for. It will treat the sign-extend as a use of the induction variable. The expression looks like this:</div><div><br></div><div>  LSR Use: Kind=Basic, Offsets={0}, widest fixup type: i32</div><div>    reg({(5 + (-1 * %n)),+,1}<nw><%for.body>)</div><div><br></div><div>I take it that you want LSR to convert this from an array index into a pointer increment, like this:</div><div><br></div><div>  LSR Use: Kind=Address of float, Offsets={0}, widest fixup type: float*</div><div>    reg({(20 + (4 * (sext i32 (-1 * %n) to i64)) + %input),+,4}<%for.body>)</div><div><br></div><div>To do that, you'll need to perform sign-extend elimination in indvars! I did think the NVPTX fix to disable IV widening was heavy-handed, but it's not my platform. It would certainly be easy to allow widening of simple cases of array[i+c]. I offered some suggestions in PR21148 I think.</div><div><br></div><div>-Andy</div><div><br></div></div><br></div></blockquote></div></div></div>