<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p><font size="-1">Too bad this did not go anywhere. I was looking
        forward to a discovery of this mystery.<br>
      </font></p>
    <div class="moz-cite-prefix"><font size="-1">On 6/12/19 3:58 PM,
        Joan Lluch via llvm-dev wrote:</font><br>
    </div>
    <blockquote type="cite"
      cite="mid:3C65A589-85C4-49F8-8A72-6D7CB392CFBC@icloud.com">
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <div class="">Hi all,</div>
      <div class=""><br class="">
      </div>
      In <span style="font-family: Monaco; font-size: 11px;
        background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);"
        class="">LSRInstance</span><span style="font-family: Monaco;
        font-size: 11px; background-color: rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae()</span> implementation,
      the function <span style="color: rgb(49, 89, 93); font-family:
        Monaco; font-size: 11px; background-color: rgb(255, 255, 255);"
        class="">getMinusSCEV</span> is called at a particular point as
      part of the code to replace/normalise 'x == y’  ICmp instructions
      into 'x - y == 0’. However, what the actual code does is replacing
      them by ‘y - x == 0’ instead.
      <div class=""><br class="">
      </div>
      <div class="">This is the excerpt of code in the <span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae</span> function
        I am referring to:  </div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; color: rgb(0, 132, 0); background-color:
          rgb(255, 255, 255);" class="">        // x == y  -->  x - y
          == 0</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">        <span style="color: #ba2da2" class="">const</span>
          <span style="color: #4f8187" class="">SCEV</span> *N = <span
            style="color: #4f8187" class="">SE</span>.<span
            style="color: #31595d" class="">getSCEV</span>(NV);</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; color: rgb(49, 89, 93); background-color:
          rgb(255, 255, 255);" class=""><span style="color: #000000"
            class="">        </span><span style="color: #ba2da2"
            class="">if</span><span style="color: #000000" class=""> (</span><span
            style="color: #4f8187" class="">SE</span><span style="color:
            #000000" class="">.</span>isLoopInvariant<span style="color:
            #000000" class="">(N, </span><span style="color: #4f8187"
            class="">L</span><span style="color: #000000" class="">)
            && </span>isSafeToExpand<span style="color:
            #000000" class="">(N, </span><span style="color: #4f8187"
            class="">SE</span><span style="color: #000000" class="">)) {</span></div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; color: rgb(0, 132, 0); background-color:
          rgb(255, 255, 255);" class=""><span style="color: #000000"
            class="">          </span>// S is normalized, so normalize
          N before folding it into S</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; color: rgb(0, 132, 0); background-color:
          rgb(255, 255, 255);" class=""><span style="color: #000000"
            class="">          </span>// to keep the result normalized.</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">          N = <span style="color: #31595d" class="">normalizeForPostIncUse</span>(N,
          TmpPostIncLoops, <span style="color: #4f8187" class="">SE</span>);</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">          Kind = <span style="color: #4f8187"
            class="">LSRUse</span>::<span style="color: #31595d"
            class="">ICmpZero</span>;</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">          S = <span style="color: #4f8187" class="">SE</span>.<span
            style="color: #31595d" class="">getMinusSCEV</span>(N, S);  
             // use instead S = <span style="color: rgb(79, 129, 135);"
            class="">SE</span>.<span style="color: rgb(49, 89, 93);"
            class="">getMinusSCEV</span>(S, N);</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">        }</div>
        <p style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);
          min-height: 15px;" class="">        <br
            class="webkit-block-placeholder">
        </p>
        <p style="margin: 0px; line-height: normal; background-color:
          rgb(255, 255, 255); min-height: 15px;" class="">[ the comment
          “use instead" after the getMinusSCEV call is mine ]</p>
        <div class=""><br class="">
        </div>
      </div>
      <div class="">This apparently unimportant difference has a
        remarkable effect latter on, when the optimal solution for the
        loop is computed in <span style="font-family: Monaco; font-size:
          11px; background-color: rgb(255, 255, 255); color: rgb(79,
          129, 135);" class="">LSRInstance</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255);" class="">::SolveRecurse</span></div>
      <div class=""><br class="">
      </div>
      <div class="">In the function  <span style="font-family: Monaco;
          font-size: 11px; background-color: rgb(255, 255, 255); color:
          rgb(79, 129, 135);" class="">LSRInstance</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255);" class="">::SolveRecurse,</span> given any
        two loop variants that are found to have the same cost, the
        first one is always chosen. This implies that the initial
        formula that was created in <span style="font-family: Monaco;
          font-size: 11px; background-color: rgb(255, 255, 255); color:
          rgb(79, 129, 135);" class="">LSRInstance</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae()
        </span>is the ones that will prevail by default (given same
        cost). The point that I want to highlight is that the current
        implementation of <span style="font-family: Monaco; font-size:
          11px; background-color: rgb(255, 255, 255);" class="">CollectFixupsAndInitialFormulae</span> 
        reverses the SCEVs found in the user source code rather than
        leaving them as found. This causes the following problem:</div>
      <div class=""> </div>
      <div class="">Consider the following code (please don’t take into
        account whether this makes or not something useful, it’s just to
        show the point) :</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class=""><span style="color: #ba2da2" class="">int</span>
          asr_Lowert( <span style="color: #ba2da2" class="">int</span>
          a, <span style="color: #ba2da2" class="">unsigned</span>
          ammount )</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">{</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  <span style="color: #ba2da2" class="">if</span> (
          (ammount -= <span style="color: #272ad8" class="">8</span>)
          >= <span style="color: #272ad8" class="">0</span>  )</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">    a = a >> <span style="color: #272ad8"
            class="">8</span>;</div>
        <div style="margin: 0px; line-height: normal; background-color:
          rgb(255, 255, 255); min-height: 14px;" class=""><br class="">
        </div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  <span style="color: #ba2da2" class="">while</span>
          ( ammount-- )</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">    a >>= <span style="color: #272ad8"
            class="">1</span>;</div>
        <div style="margin: 0px; line-height: normal; background-color:
          rgb(255, 255, 255); min-height: 14px;" class=""><br class="">
        </div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; color: rgb(186, 45, 162);
          background-color: rgb(255, 255, 255);" class=""><span
            style="color: #000000" class="">  </span>return<span
            style="color: #000000" class=""> a; </span></div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">}</div>
      </div>
      <div class=""><br class="">
      </div>
      <div class="">The interesting part for this discussion is the
        ‘while’ loop. It gets compiled by Clang like this:</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">while.cond:                                       ;
          preds = %while.body, %entry</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  %a.addr.1 = phi i32 [ %shr, %entry ], [ %shr1,
          %while.body ]</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  %ammount.addr.0 = phi i32 [ %sub, %entry ], [ %dec,
          %while.body ]</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  %tobool = icmp eq i32 %ammount.addr.0, 0</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  br i1 %tobool, label %while.end, label %while.body</div>
        <div style="margin: 0px; line-height: normal; background-color:
          rgb(255, 255, 255); min-height: 14px;" class=""><br class="">
        </div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">while.body:                                       ;
          preds = %while.cond</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  %dec = add i32 %ammount.addr.0, -1</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  %shr1 = ashr i32 %a.addr.1, 1</div>
        <div style="margin: 0px; font-size: 11px; line-height: normal;
          font-family: Monaco; background-color: rgb(255, 255, 255);"
          class="">  br label %while.cond</div>
      </div>
      <div class=""><br class="">
      </div>
      <div class=""><br class="">
      </div>
      <div class="">Clang mostly follows the source code and produces
        code that resembles it. It’s important to note is that variable
        ‘ ammount’ is decremented by 1 on each loop iteration as found
        on the user source code.</div>
      <div class=""><br class="">
      </div>
      <div class="">I am going to chose the x86 target because I really
        want this subject to gain some attention, but please note that
        this is a general subject that affects ALL targets. So, for the
        x86, the resultant assembly code is this:</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.macosx_version_min
          10, 12</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.globl<span class="Apple-tab-span" style="white-space:pre">      </span>_asr_Lowert</div>
        <div class="">_asr_Lowert:</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_startproc</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>pushl<span class="Apple-tab-span" style="white-space:pre">       </span>%ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_def_cfa_offset
          8</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_offset
          %ebp, -8</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>movl<span class="Apple-tab-span" style="white-space:pre">        </span>%esp,
          %ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_def_cfa_register
          %ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>movl<span class="Apple-tab-span" style="white-space:pre">        </span>8(%ebp),
          %eax</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>sarl<span class="Apple-tab-span" style="white-space:pre">        </span>$8,
          %eax</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>pushl<span class="Apple-tab-span" style="white-space:pre">       </span>$8</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>popl<span class="Apple-tab-span" style="white-space:pre">        </span>%ecx</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>subl<span class="Apple-tab-span" style="white-space:pre">        </span>12(%ebp),
          %ecx</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>jmp<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_1</div>
        <div class="">LBB0_2:</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>sarl<span class="Apple-tab-span" style="white-space:pre">        </span>%eax</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>incl<span class="Apple-tab-span" style="white-space:pre">        </span>%ecx</div>
        <div class="">LBB0_1:</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>testl<span class="Apple-tab-span" style="white-space:pre">       </span>%ecx,
          %ecx</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>jne<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_2</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>popl<span class="Apple-tab-span" style="white-space:pre">        </span>%ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>retl</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_endproc</div>
      </div>
      <div class=""><br class="">
      </div>
      <div class=""><br class="">
      </div>
      <div class="">What I want to show is that, now, instead of
        ‘ammount’ being decremented, it is incremented. Actually, the
        compiler performs the oddity of setting the initial value of the
        ‘while’ variable to '8-ammount’ rather than ‘ ammount-8’. The
        compiler even needs to perform a ‘push’ and a ‘pop’ as a setup
        for that. This is all because the SCEVs where swapped in the
        LLVM implementation of  <span style="font-family: Monaco;
          font-size: 11px; background-color: rgb(255, 255, 255); color:
          rgb(79, 129, 135);" class="">LSRInstance</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae().</span></div>
      <div class=""><span style="font-family: Monaco; font-size: 11px;
          background-color: rgb(255, 255, 255);" class=""><br class="">
        </span></div>
      <div class="">Now, let me show what happens if we implement the
        function as suggested in the comment, that is, by replacing the
        current code by <span style="font-family: Monaco; font-size:
          11px; background-color: rgb(255, 255, 255);" class="">S = </span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">SE</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255);" class="">.</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255); color: rgb(49, 89, 93);" class="">getMinusSCEV</span><span
          style="font-family: Monaco; font-size: 11px; background-color:
          rgb(255, 255, 255);" class="">(S, N);</span></div>
      <div class=""><br class="">
      </div>
      <div class="">This is what we get now from LLVM for the same
        input:</div>
      <div class=""><br class="">
      </div>
      <div class="">
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.macosx_version_min
          10, 12</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.globl<span class="Apple-tab-span" style="white-space:pre">      </span>_asr_Lowert</div>
        <div class="">_asr_Lowert:</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_startproc</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>pushl<span class="Apple-tab-span" style="white-space:pre">       </span>%ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_def_cfa_offset
          8</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_offset
          %ebp, -8</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>movl<span class="Apple-tab-span" style="white-space:pre">        </span>%esp,
          %ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_def_cfa_register
          %ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>movl<span class="Apple-tab-span" style="white-space:pre">        </span>8(%ebp),
          %eax</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>movl<span class="Apple-tab-span" style="white-space:pre">        </span>12(%ebp),
          %ecx</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>addl<span class="Apple-tab-span" style="white-space:pre">        </span>$-8,
          %ecx</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>sarl<span class="Apple-tab-span" style="white-space:pre">        </span>$8,
          %eax</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>jmp<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_1</div>
        <div class="">LBB0_2:</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>decl<span class="Apple-tab-span" style="white-space:pre">        </span>%ecx</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>sarl<span class="Apple-tab-span" style="white-space:pre">        </span>%eax</div>
        <div class="">LBB0_1:</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>testl<span class="Apple-tab-span" style="white-space:pre">       </span>%ecx,
          %ecx</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>jne<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_2</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>popl<span class="Apple-tab-span" style="white-space:pre">        </span>%ebp</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>retl</div>
        <div class=""><span class="Apple-tab-span" style="white-space:pre">   </span>.cfi_endproc</div>
      </div>
      <div class=""><br class="">
      </div>
      <div class=""><br class="">
      </div>
      <div class="">The code now uses decrement, and the initial ‘while’
        variable value is ‘ ammount-8’ as we could expect from the user
        source code. The whole function is also shorter and it does not
        use potentially costly push/pop instructions.</div>
      <div class=""><br class="">
      </div>
      <div class="">I also want to point out, that this is not an
        isolated case at all. This happens all the time for small loops
        where costs of several variants are the same. In all cases,
        decrements in the user source code are replaced by positive
        increments with necessarily strange setup code (as shown in the
        first example above). Positive increments are replaced by
        decrements, which also may force extra code generation because
        the loop induction variable can’t be any longer used for useful
        purposes. The remarkable thing is that by swapping the SCEV
        expressions as we currently do, the compiler stops taking the
        user source code into account as a hint on what to do, and
        instead it may produce worse code in circumstances where the
        user may have already chosen a naturally better implementation,
        which the compiler just destroys by artificially replacing it by
        a different solution with a supposed identical cost.</div>
      <div class=""><br class="">
      </div>
      <div class="">Also, although this notably affects small loops with
        a few small-cost variants, it also affects more complex loops
        that may have two or more optimal solutions. Among the optimal
        solutions of the same cost, the algorithm will always chose the
        first one in the list, so again the one that will be chosen may
        be a reversed one with respect to the user code, unless it is a
        really different solution. </div>
      <div class=""><br class="">
      </div>
      <div class="">I do not understand the reason behind such reversing
        of loop expressions, and that’s the question that I post. I
        understand that we may want to always favour decrements (maybe
        for performance reasons in some targets), or that we may want to
        prioritise increments (maybe to keep induction variables usable
        in addressing modes in some targets), or even that we may want
        to do different things for different targets. But I do not quite
        understand the current implementation. It almost seems to me a
        typo in the LLVM source code that nobody has come across before.
        I have had my local copy of LLVM for some time with the code set
        as in the comment, and I have yet to find a case where it
        produces worse code than the original.</div>
      <div class=""><br class="">
      </div>
      <div class="">So, I really am interested in starting some informed
        discussion about this.</div>
      <div class=""><br class="">
      </div>
      <div class="">John Lluch</div>
      <div class=""><br class="">
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <pre class="moz-quote-pre" wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
    </blockquote>
  </body>
</html>