<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <br>

    <div class="moz-cite-prefix">On 12/04/2014 01:39 PM, Jingyue Wu

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAMROOrGv=Nw728azvpeCVRJ6RkKpBjAuSK-ObLih0pjivHTm8A@mail.gmail.com"

      type="cite">

      <div>I was chasing down a performance regression on a CUDA

        benchmark compiled for the NVPTX64 backend, and found loop

        strength reduction is ineffective in the presence of sign

        extension. Here's a reduced test case:</div>

      <div><br>

      </div>

      <div>void foo(float *input, int n) {</div>

      <div>

        <div>  for (int i = -n; i != n; ++i) {</div>
        <div>    baz(input[i + 5]);</div>

        <div>  }</div>

      </div>

      <div>}</div>

      <div><br>

      </div>

      <div>I expect &input[i + 5] to be promoted to an indvar, but
        it is not. </div>

      <div><br>

      </div>

      <div>The root cause of this misoptimization is that

        ScalarEvolution is pessimistic about tagging nsw/nuw to a

        SCEVAddExpr. This pessimization was introduced in <a

          moz-do-not-send="true"

href="http://llvm.org/viewvc/llvm-project?view=revision&revision=145367"

          target="_blank" style="font-size:13.1999998092651px">http://llvm.org/viewvc/llvm-project?view=revision&revision=145367</a>.<span

          style="font-size:13.1999998092651px"> </span><span

          style="font-size:13.1999998092651px">According to the comments

          there (</span><span style="font-size:13.1999998092651px"><a

            moz-do-not-send="true"

href="http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04087">http://llvm.org/docs/doxygen/html/ScalarEvolution_8cpp_source.html#l04087</a>),

          ScalarEvolution does not apply an instruction's nsw/nuw flags

          to the corresponding SCEV expression. </span><span

          style="font-size:13.1999998092651px">I</span><span

          style="font-size:13.1999998092651px">n the above example,

          &input[i + 5] corresponds to SCEV expression input + 4 * </span><span

          style="font-size:13.1999998092651px">sext(i </span><span

          style="font-size:13.1999998092651px">+ 5). </span><span

          style="font-size:13.1999998092651px">In order to promote

          &input[i + 5] to an indvar, we need to at least prove (i </span><span

          style="font-size:13.1999998092651px">+ 5) does not sign

          overflow so that we can reassociate the expression to (input +
          20) + 4 * sext(i), which can be represented as a SCEVAddRecExpr.

        </span><span style="font-size:13.1999998092651px">However,

          because ScalarEvolution doesn't apply nsw to (i </span><span

          style="font-size:13.1999998092651px">+ 5), it cannot

          distribute sext(i </span><span

          style="font-size:13.1999998092651px">+ 5) to sext(i) </span><span

          style="font-size:13.1999998092651px">+ 5, and is thus unable

          to identify &input[i </span><span

          style="font-size:13.1999998092651px">+ 5] as a potential

          indvar. </span></div>
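      <div><br>
      </div>
      <div>As a rough illustration of why the no-overflow proof matters,
        the sketch below models the two forms in Python rather than in
        LLVM. The helpers wrap32, sext_of_sum and sum_of_sext are
        hypothetical names invented for this example; they only mimic a
        wrapping 32-bit add followed by a widening, versus a widening
        followed by a 64-bit add.</div>
      <pre wrap="">
```python
def wrap32(x):
    """Reduce an integer to a signed 32-bit value, as a wrapping i32 add would."""
    return (x + 2**31) % 2**32 - 2**31


def sext_of_sum(i, c):
    """Model sext(i + c): add in 32 bits with wrapping, then widen."""
    # Python integers are unbounded, so the widening step needs no extra work.
    return wrap32(i + c)


def sum_of_sext(i, c):
    """Model sext(i) + c: widen first, then add without 32-bit wrapping."""
    return i + c


INT32_MAX = 2**31 - 1

# Far from the boundary the two forms agree, so distributing sext is safe.
print(sext_of_sum(100, 5) == sum_of_sext(100, 5))

# Close to INT32_MAX the 32-bit add wraps and the two forms diverge.
print(sext_of_sum(INT32_MAX - 2, 5), sum_of_sext(INT32_MAX - 2, 5))
```
</pre>
      <div>The two forms differ only when the narrow add wraps, which is
        the case an nsw flag on (i + 5) would rule out.</div>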

      <div><span style="font-size:13.1999998092651px"><br>

        </span></div>

      <div><span style="font-size:13.1999998092651px">Side note: this
          issue surfaced after my recent change that disables induction
          variable widening for the NVPTX64 backend. Previously,
          induction variable widening alleviated the issue (if it
          occurred at all), because there would be no sext if index i
          were already 64-bit. </span><br>

      </div>

      <div><span style="font-size:13.1999998092651px"><br>

        </span></div>

      <div><span style="font-size:13.1999998092651px">I wonder if the
          fix that disables applying nsw/nuw is too conservative. The
          comments in the source code say that ScalarEvolution does not
          apply an instruction's nsw/nuw flags to the corresponding SCEV
          expression because another, non-control-equivalent instruction
          without nsw/nuw can be mapped to the same expression. If that
          is the only case we are worried about, would a better fix be
          to avoid mapping instructions that differ only in nsw/nuw to
          the same SCEV expression? That can be done by adding the
          wrapping flags of a SCEVAddExpr to the folding set that serves
          as the index of this expression for SCEV look-up. </span></div>
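      <div><br>
      </div>
      <div>To make the idea concrete, here is a toy uniquing table in
        Python. It is only a sketch of the concept and not LLVM's
        folding set: the class ExprTable, its get_add method and the
        key_includes_flags switch are all hypothetical names made up for
        illustration.</div>
      <pre wrap="">
```python
class ExprTable:
    """Toy hash-consing table standing in for a uniquing set (hypothetical)."""

    def __init__(self, key_includes_flags):
        self.key_includes_flags = key_includes_flags
        self.nodes = {}

    def get_add(self, operands, flags=frozenset()):
        """Return the unique add node for these operands, creating it if needed."""
        flags = frozenset(flags)
        key = ("add", tuple(operands))
        if self.key_includes_flags:
            key = key + (flags,)
        # The first request for a key creates the node; later requests reuse it.
        node = {"operands": tuple(operands), "flags": flags}
        return self.nodes.setdefault(key, node)


# Flags left out of the key: both requests share one node, so the
# request without flags observes the nsw recorded by the first one.
shared = ExprTable(key_includes_flags=False)
a = shared.get_add(["i", 5], {"nsw"})
b = shared.get_add(["i", 5])
print(a is b, sorted(b["flags"]))

# Flags included in the key: the two requests get distinct nodes.
split = ExprTable(key_includes_flags=True)
c = split.get_add(["i", 5], {"nsw"})
d = split.get_add(["i", 5])
print(c is d, sorted(d["flags"]))
```
</pre>
      <div>The first table shows the hazard described in the source
        comments. The second shows the proposed fix, at the cost of
        having two expressions where there used to be one.</div>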

      <div><span style="font-size:13.1999998092651px"><br>

        </span></div>

      <div>I followed this idea, and tried a preliminary change (<a

          moz-do-not-send="true"

          href="http://reviews.llvm.org/differential/diff/16942/">http://reviews.llvm.org/differential/diff/16942/</a>).

        It works so far: no transformation tests fail, and although some
        analysis tests fail, their new results look better rather than
        incorrect. I wonder whether I was just lucky not to break any
        tests or whether this is the right way to go. <br>

      </div>

    </blockquote>

    I am not an expert in this code, but your general approach seems
    workable.  I don't have a good understanding of what the
    implications of reducing the commoning of SCEV expressions would
    be, though; that would be my biggest concern.<br>

    <br>

    I'd suggest you post a patch that gets at least one interesting
    example working.  Concrete patches with compelling test cases tend
    to get better discussion on llvm-commits.  Alternatively, you might
    try directing your email to llvm-dev.  <br>

    <br>

    p.s. Your actual patch looks suspicious.  Shouldn't you be checking

    the flags on each of the adds visited in the loop above?<br>
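    <br>
    One reading of that check, sketched in Python rather than against
    the real code: when a chain of adds is folded into a single sum,
    keep only the wrap flags that every add in the chain carries.  The
    function name flags_for_flattened_add and the list-of-sets
    representation are hypothetical, and this models a necessary
    condition only; it makes no claim about being sufficient.<br>
    <pre wrap="">
```python
def flags_for_flattened_add(chain):
    """Return the wrap flags carried by every add in the chain.

    `chain` is a list of flag sets, one per add instruction folded into
    a single n-ary sum (a made-up representation for illustration).
    """
    if not chain:
        return frozenset()
    common = frozenset(chain[0])
    for flags in chain[1:]:
        common = common.intersection(flags)
    return common


# The inner add carries nsw but the outer add does not: nothing survives.
print(sorted(flags_for_flattened_add([{"nsw"}, set()])))

# Both adds carry nsw, only one carries nuw: nsw alone is kept.
print(sorted(flags_for_flattened_add([{"nsw", "nuw"}, {"nsw"}])))
```
</pre>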

    <blockquote

cite="mid:CAMROOrGv=Nw728azvpeCVRJ6RkKpBjAuSK-ObLih0pjivHTm8A@mail.gmail.com"

      type="cite">

      <div><br>

      </div>

      <div>Jingyue</div>


    </blockquote>

    <br>

  </body>

</html>