<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class="">Hi all,</div><div class=""><br class=""></div>In <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae()</span> implementation, the function <span style="color: rgb(49, 89, 93); font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">getMinusSCEV</span> is called at a particular point as part of the code to replace/normalise 'x == y’ ICmp instructions into 'x - y == 0’. However, what the actual code does is replacing them by ‘y - x == 0’ instead.<div class=""><br class=""></div><div class="">This is the excerpt of code in the <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae</span> function I am referring to: </div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; color: rgb(0, 132, 0); background-color: rgb(255, 255, 255);" class=""> // x == y --> x - y == 0</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> <span style="color: #ba2da2" class="">const</span> <span style="color: #4f8187" class="">SCEV</span> *N = <span style="color: #4f8187" class="">SE</span>.<span style="color: #31595d" class="">getSCEV</span>(NV);</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; color: rgb(49, 89, 93); background-color: rgb(255, 255, 255);" class=""><span style="color: #000000" class=""> </span><span style="color: #ba2da2" class="">if</span><span style="color: #000000" class=""> (</span><span style="color: #4f8187" class="">SE</span><span style="color: #000000" class="">.</span>isLoopInvariant<span style="color: #000000" class="">(N, </span><span style="color: #4f8187" class="">L</span><span style="color: #000000" class="">) && </span>isSafeToExpand<span style="color: #000000" class="">(N, </span><span style="color: #4f8187" class="">SE</span><span style="color: #000000" class="">)) {</span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; color: rgb(0, 132, 0); background-color: rgb(255, 255, 255);" class=""><span style="color: #000000" class=""> </span>// S is normalized, so normalize N before folding it into S</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; color: rgb(0, 132, 0); background-color: rgb(255, 255, 255);" class=""><span style="color: #000000" class=""> </span>// to keep the result normalized.</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> N = <span style="color: #31595d" class="">normalizeForPostIncUse</span>(N, TmpPostIncLoops, <span style="color: #4f8187" class="">SE</span>);</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> Kind = <span style="color: #4f8187" class="">LSRUse</span>::<span style="color: #31595d" class="">ICmpZero</span>;</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> S = <span style="color: #4f8187" class="">SE</span>.<span style="color: #31595d" class="">getMinusSCEV</span>(N, S); // use instead S = <span style="color: rgb(79, 129, 135);" class="">SE</span>.<span style="color: rgb(49, 89, 93);" class="">getMinusSCEV</span>(S, N);</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> }</div><p style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255); min-height: 15px;" class=""> <br class="webkit-block-placeholder"></p><p style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 15px;" class="">[ the comment “use instead" after the getMinusSCEV call is mine ]</p><div class=""><br class=""></div></div><div class="">This apparently unimportant difference has a remarkable effect latter on, when the optimal solution for the loop is computed in <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">::SolveRecurse</span></div><div class=""><br class=""></div><div class="">In the function <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">::SolveRecurse,</span> given any two loop variants that are found to have the same cost, the first one is always chosen. This implies that the initial formula that was created in <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae() </span>is the ones that will prevail by default (given same cost). The point that I want to highlight is that the current implementation of <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">CollectFixupsAndInitialFormulae</span> reverses the SCEVs found in the user source code rather than leaving them as found. This causes the following problem:</div><div class=""> </div><div class="">Consider the following code (please don’t take into account whether this makes or not something useful, it’s just to show the point) :</div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""><span style="color: #ba2da2" class="">int</span> asr_Lowert( <span style="color: #ba2da2" class="">int</span> a, <span style="color: #ba2da2" class="">unsigned</span> ammount )</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class="">{</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> <span style="color: #ba2da2" class="">if</span> ( (ammount -= <span style="color: #272ad8" class="">8</span>) >= <span style="color: #272ad8" class="">0</span> )</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> a = a >> <span style="color: #272ad8" class="">8</span>;</div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> <span style="color: #ba2da2" class="">while</span> ( ammount-- )</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> a >>= <span style="color: #272ad8" class="">1</span>;</div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; color: rgb(186, 45, 162); background-color: rgb(255, 255, 255);" class=""><span style="color: #000000" class=""> </span>return<span style="color: #000000" class=""> a; </span></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class="">}</div></div><div class=""><br class=""></div><div class="">The interesting part for this discussion is the ‘while’ loop. It gets compiled by Clang like this:</div><div class=""><br class=""></div><div class=""><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class="">while.cond: ; preds = %while.body, %entry</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> %a.addr.1 = phi i32 [ %shr, %entry ], [ %shr1, %while.body ]</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> %ammount.addr.0 = phi i32 [ %sub, %entry ], [ %dec, %while.body ]</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> %tobool = icmp eq i32 %ammount.addr.0, 0</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> br i1 %tobool, label %while.end, label %while.body</div><div style="margin: 0px; line-height: normal; background-color: rgb(255, 255, 255); min-height: 14px;" class=""><br class=""></div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class="">while.body: ; preds = %while.cond</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> %dec = add i32 %ammount.addr.0, -1</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> %shr1 = ashr i32 %a.addr.1, 1</div><div style="margin: 0px; font-size: 11px; line-height: normal; font-family: Monaco; background-color: rgb(255, 255, 255);" class=""> br label %while.cond</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">Clang mostly follows the source code and produces code that resembles it. It’s important to note is that variable ‘ ammount’ is decremented by 1 on each loop iteration as found on the user source code.</div><div class=""><br class=""></div><div class="">I am going to chose the x86 target because I really want this subject to gain some attention, but please note that this is a general subject that affects ALL targets. So, for the x86, the resultant assembly code is this:</div><div class=""><br class=""></div><div class=""><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.macosx_version_min 10, 12</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.globl<span class="Apple-tab-span" style="white-space:pre"> </span>_asr_Lowert</div><div class="">_asr_Lowert:</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_startproc</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>pushl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_offset 8</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_offset %ebp, -8</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>%esp, %ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_register %ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>8(%ebp), %eax</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>$8, %eax</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>pushl<span class="Apple-tab-span" style="white-space:pre"> </span>$8</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>popl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>subl<span class="Apple-tab-span" style="white-space:pre"> </span>12(%ebp), %ecx</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jmp<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_1</div><div class="">LBB0_2:</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>%eax</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>incl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx</div><div class="">LBB0_1:</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>testl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx, %ecx</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jne<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_2</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>popl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>retl</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_endproc</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">What I want to show is that, now, instead of ‘ammount’ being decremented, it is incremented. Actually, the compiler performs the oddity of setting the initial value of the ‘while’ variable to '8-ammount’ rather than ‘ ammount-8’. The compiler even needs to perform a ‘push’ and a ‘pop’ as a setup for that. This is all because the SCEVs where swapped in the LLVM implementation of <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae().</span></div><div class=""><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class=""><br class=""></span></div><div class="">Now, let me show what happens if we implement the function as suggested in the comment, that is, by replacing the current code by <span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">S = </span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">SE</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">.</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255); color: rgb(49, 89, 93);" class="">getMinusSCEV</span><span style="font-family: Monaco; font-size: 11px; background-color: rgb(255, 255, 255);" class="">(S, N);</span></div><div class=""><br class=""></div><div class="">This is what we get now from LLVM for the same input:</div><div class=""><br class=""></div><div class=""><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.macosx_version_min 10, 12</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.globl<span class="Apple-tab-span" style="white-space:pre"> </span>_asr_Lowert</div><div class="">_asr_Lowert:</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_startproc</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>pushl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_offset 8</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_offset %ebp, -8</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>%esp, %ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_register %ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>8(%ebp), %eax</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>12(%ebp), %ecx</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>addl<span class="Apple-tab-span" style="white-space:pre"> </span>$-8, %ecx</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>$8, %eax</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jmp<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_1</div><div class="">LBB0_2:</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>decl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>%eax</div><div class="">LBB0_1:</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>testl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx, %ecx</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jne<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_2</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>popl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>retl</div><div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_endproc</div></div><div class=""><br class=""></div><div class=""><br class=""></div><div class="">The code now uses decrement, and the initial ‘while’ variable value is ‘ ammount-8’ as we could expect from the user source code. The whole function is also shorter and it does not use potentially costly push/pop instructions.</div><div class=""><br class=""></div><div class="">I also want to point out, that this is not an isolated case at all. This happens all the time for small loops where costs of several variants are the same. In all cases, decrements in the user source code are replaced by positive increments with necessarily strange setup code (as shown in the first example above). Positive increments are replaced by decrements, which also may force extra code generation because the loop induction variable can’t be any longer used for useful purposes. The remarkable thing is that by swapping the SCEV expressions as we currently do, the compiler stops taking the user source code into account as a hint on what to do, and instead it may produce worse code in circumstances where the user may have already chosen a naturally better implementation, which the compiler just destroys by artificially replacing it by a different solution with a supposed identical cost.</div><div class=""><br class=""></div><div class="">Also, although this notably affects small loops with a few small-cost variants, it also affects more complex loops that may have two or more optimal solutions. Among the optimal solutions of the same cost, the algorithm will always chose the first one in the list, so again the one that will be chosen may be a reversed one with respect to the user code, unless it is a really different solution. </div><div class=""><br class=""></div><div class="">I do not understand the reason behind such reversing of loop expressions, and that’s the question that I post. I understand that we may want to always favour decrements (maybe for performance reasons in some targets), or that we may want to prioritise increments (maybe to keep induction variables usable in addressing modes in some targets), or even that we may want to do different things for different targets. But I do not quite understand the current implementation. It almost seems to me a typo in the LLVM source code that nobody has come across before. I have had my local copy of LLVM for some time with the code set as in the comment, and I have yet to find a case where it produces worse code than the original.</div><div class=""><br class=""></div><div class="">So, I really am interested in starting some informed discussion about this.</div><div class=""><br class=""></div><div class="">John Lluch</div><div class=""><br class=""></div></body></html>