<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<p><font size="-1">Too bad this did not go anywhere. I was looking
forward to a discovery of this mystery.<br>
</font></p>
<div class="moz-cite-prefix"><font size="-1">On 6/12/19 3:58 PM,
Joan Lluch via llvm-dev wrote:</font><br>
</div>
<blockquote type="cite"
cite="mid:3C65A589-85C4-49F8-8A72-6D7CB392CFBC@icloud.com">
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<div class="">Hi all,</div>
<div class=""><br class="">
</div>
In <span style="font-family: Monaco; font-size: 11px;
background-color: rgb(255, 255, 255); color: rgb(79, 129, 135);"
class="">LSRInstance</span><span style="font-family: Monaco;
font-size: 11px; background-color: rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae()</span> implementation,
the function <span style="color: rgb(49, 89, 93); font-family:
Monaco; font-size: 11px; background-color: rgb(255, 255, 255);"
class="">getMinusSCEV</span> is called at a particular point as
part of the code to replace/normalise 'x == y’ ICmp instructions
into 'x - y == 0’. However, what the actual code does is replacing
them by ‘y - x == 0’ instead.
<div class=""><br class="">
</div>
<div class="">This is the excerpt of code in the <span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">LSRInstance</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae</span> function
I am referring to: </div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; color: rgb(0, 132, 0); background-color:
rgb(255, 255, 255);" class=""> // x == y --> x - y
== 0</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> <span style="color: #ba2da2" class="">const</span>
<span style="color: #4f8187" class="">SCEV</span> *N = <span
style="color: #4f8187" class="">SE</span>.<span
style="color: #31595d" class="">getSCEV</span>(NV);</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; color: rgb(49, 89, 93); background-color:
rgb(255, 255, 255);" class=""><span style="color: #000000"
class=""> </span><span style="color: #ba2da2"
class="">if</span><span style="color: #000000" class=""> (</span><span
style="color: #4f8187" class="">SE</span><span style="color:
#000000" class="">.</span>isLoopInvariant<span style="color:
#000000" class="">(N, </span><span style="color: #4f8187"
class="">L</span><span style="color: #000000" class="">)
&& </span>isSafeToExpand<span style="color:
#000000" class="">(N, </span><span style="color: #4f8187"
class="">SE</span><span style="color: #000000" class="">)) {</span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; color: rgb(0, 132, 0); background-color:
rgb(255, 255, 255);" class=""><span style="color: #000000"
class=""> </span>// S is normalized, so normalize
N before folding it into S</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; color: rgb(0, 132, 0); background-color:
rgb(255, 255, 255);" class=""><span style="color: #000000"
class=""> </span>// to keep the result normalized.</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> N = <span style="color: #31595d" class="">normalizeForPostIncUse</span>(N,
TmpPostIncLoops, <span style="color: #4f8187" class="">SE</span>);</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> Kind = <span style="color: #4f8187"
class="">LSRUse</span>::<span style="color: #31595d"
class="">ICmpZero</span>;</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> S = <span style="color: #4f8187" class="">SE</span>.<span
style="color: #31595d" class="">getMinusSCEV</span>(N, S);
// use instead S = <span style="color: rgb(79, 129, 135);"
class="">SE</span>.<span style="color: rgb(49, 89, 93);"
class="">getMinusSCEV</span>(S, N);</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> }</div>
<p style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);
min-height: 15px;" class=""> <br
class="webkit-block-placeholder">
</p>
<p style="margin: 0px; line-height: normal; background-color:
rgb(255, 255, 255); min-height: 15px;" class="">[ the comment
“use instead" after the getMinusSCEV call is mine ]</p>
<div class=""><br class="">
</div>
</div>
<div class="">This apparently unimportant difference has a
remarkable effect latter on, when the optimal solution for the
loop is computed in <span style="font-family: Monaco; font-size:
11px; background-color: rgb(255, 255, 255); color: rgb(79,
129, 135);" class="">LSRInstance</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255);" class="">::SolveRecurse</span></div>
<div class=""><br class="">
</div>
<div class="">In the function <span style="font-family: Monaco;
font-size: 11px; background-color: rgb(255, 255, 255); color:
rgb(79, 129, 135);" class="">LSRInstance</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255);" class="">::SolveRecurse,</span> given any
two loop variants that are found to have the same cost, the
first one is always chosen. This implies that the initial
formula that was created in <span style="font-family: Monaco;
font-size: 11px; background-color: rgb(255, 255, 255); color:
rgb(79, 129, 135);" class="">LSRInstance</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae()
</span>is the ones that will prevail by default (given same
cost). The point that I want to highlight is that the current
implementation of <span style="font-family: Monaco; font-size:
11px; background-color: rgb(255, 255, 255);" class="">CollectFixupsAndInitialFormulae</span>
reverses the SCEVs found in the user source code rather than
leaving them as found. This causes the following problem:</div>
<div class=""> </div>
<div class="">Consider the following code (please don’t take into
account whether this makes or not something useful, it’s just to
show the point) :</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""><span style="color: #ba2da2" class="">int</span>
asr_Lowert( <span style="color: #ba2da2" class="">int</span>
a, <span style="color: #ba2da2" class="">unsigned</span>
ammount )</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class="">{</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> <span style="color: #ba2da2" class="">if</span> (
(ammount -= <span style="color: #272ad8" class="">8</span>)
>= <span style="color: #272ad8" class="">0</span> )</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> a = a >> <span style="color: #272ad8"
class="">8</span>;</div>
<div style="margin: 0px; line-height: normal; background-color:
rgb(255, 255, 255); min-height: 14px;" class=""><br class="">
</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> <span style="color: #ba2da2" class="">while</span>
( ammount-- )</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> a >>= <span style="color: #272ad8"
class="">1</span>;</div>
<div style="margin: 0px; line-height: normal; background-color:
rgb(255, 255, 255); min-height: 14px;" class=""><br class="">
</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; color: rgb(186, 45, 162);
background-color: rgb(255, 255, 255);" class=""><span
style="color: #000000" class=""> </span>return<span
style="color: #000000" class=""> a; </span></div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class="">}</div>
</div>
<div class=""><br class="">
</div>
<div class="">The interesting part for this discussion is the
‘while’ loop. It gets compiled by Clang like this:</div>
<div class=""><br class="">
</div>
<div class="">
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class="">while.cond: ;
preds = %while.body, %entry</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> %a.addr.1 = phi i32 [ %shr, %entry ], [ %shr1,
%while.body ]</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> %ammount.addr.0 = phi i32 [ %sub, %entry ], [ %dec,
%while.body ]</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> %tobool = icmp eq i32 %ammount.addr.0, 0</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> br i1 %tobool, label %while.end, label %while.body</div>
<div style="margin: 0px; line-height: normal; background-color:
rgb(255, 255, 255); min-height: 14px;" class=""><br class="">
</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class="">while.body: ;
preds = %while.cond</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> %dec = add i32 %ammount.addr.0, -1</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> %shr1 = ashr i32 %a.addr.1, 1</div>
<div style="margin: 0px; font-size: 11px; line-height: normal;
font-family: Monaco; background-color: rgb(255, 255, 255);"
class=""> br label %while.cond</div>
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">Clang mostly follows the source code and produces
code that resembles it. It’s important to note is that variable
‘ ammount’ is decremented by 1 on each loop iteration as found
on the user source code.</div>
<div class=""><br class="">
</div>
<div class="">I am going to chose the x86 target because I really
want this subject to gain some attention, but please note that
this is a general subject that affects ALL targets. So, for the
x86, the resultant assembly code is this:</div>
<div class=""><br class="">
</div>
<div class="">
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.macosx_version_min
10, 12</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.globl<span class="Apple-tab-span" style="white-space:pre"> </span>_asr_Lowert</div>
<div class="">_asr_Lowert:</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_startproc</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>pushl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_offset
8</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_offset
%ebp, -8</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>%esp,
%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_register
%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>8(%ebp),
%eax</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>$8,
%eax</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>pushl<span class="Apple-tab-span" style="white-space:pre"> </span>$8</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>popl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>subl<span class="Apple-tab-span" style="white-space:pre"> </span>12(%ebp),
%ecx</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jmp<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_1</div>
<div class="">LBB0_2:</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>%eax</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>incl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx</div>
<div class="">LBB0_1:</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>testl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx,
%ecx</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jne<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_2</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>popl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>retl</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_endproc</div>
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">What I want to show is that, now, instead of
‘ammount’ being decremented, it is incremented. Actually, the
compiler performs the oddity of setting the initial value of the
‘while’ variable to '8-ammount’ rather than ‘ ammount-8’. The
compiler even needs to perform a ‘push’ and a ‘pop’ as a setup
for that. This is all because the SCEVs where swapped in the
LLVM implementation of <span style="font-family: Monaco;
font-size: 11px; background-color: rgb(255, 255, 255); color:
rgb(79, 129, 135);" class="">LSRInstance</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255);" class="">::CollectFixupsAndInitialFormulae().</span></div>
<div class=""><span style="font-family: Monaco; font-size: 11px;
background-color: rgb(255, 255, 255);" class=""><br class="">
</span></div>
<div class="">Now, let me show what happens if we implement the
function as suggested in the comment, that is, by replacing the
current code by <span style="font-family: Monaco; font-size:
11px; background-color: rgb(255, 255, 255);" class="">S = </span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255); color: rgb(79, 129, 135);" class="">SE</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255);" class="">.</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255); color: rgb(49, 89, 93);" class="">getMinusSCEV</span><span
style="font-family: Monaco; font-size: 11px; background-color:
rgb(255, 255, 255);" class="">(S, N);</span></div>
<div class=""><br class="">
</div>
<div class="">This is what we get now from LLVM for the same
input:</div>
<div class=""><br class="">
</div>
<div class="">
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.macosx_version_min
10, 12</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.globl<span class="Apple-tab-span" style="white-space:pre"> </span>_asr_Lowert</div>
<div class="">_asr_Lowert:</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_startproc</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>pushl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_offset
8</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_offset
%ebp, -8</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>%esp,
%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_def_cfa_register
%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>8(%ebp),
%eax</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>movl<span class="Apple-tab-span" style="white-space:pre"> </span>12(%ebp),
%ecx</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>addl<span class="Apple-tab-span" style="white-space:pre"> </span>$-8,
%ecx</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>$8,
%eax</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jmp<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_1</div>
<div class="">LBB0_2:</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>decl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>sarl<span class="Apple-tab-span" style="white-space:pre"> </span>%eax</div>
<div class="">LBB0_1:</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>testl<span class="Apple-tab-span" style="white-space:pre"> </span>%ecx,
%ecx</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>jne<span class="Apple-tab-span" style="white-space:pre"> </span>LBB0_2</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>popl<span class="Apple-tab-span" style="white-space:pre"> </span>%ebp</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>retl</div>
<div class=""><span class="Apple-tab-span" style="white-space:pre"> </span>.cfi_endproc</div>
</div>
<div class=""><br class="">
</div>
<div class=""><br class="">
</div>
<div class="">The code now uses decrement, and the initial ‘while’
variable value is ‘ ammount-8’ as we could expect from the user
source code. The whole function is also shorter and it does not
use potentially costly push/pop instructions.</div>
<div class=""><br class="">
</div>
<div class="">I also want to point out, that this is not an
isolated case at all. This happens all the time for small loops
where costs of several variants are the same. In all cases,
decrements in the user source code are replaced by positive
increments with necessarily strange setup code (as shown in the
first example above). Positive increments are replaced by
decrements, which also may force extra code generation because
the loop induction variable can’t be any longer used for useful
purposes. The remarkable thing is that by swapping the SCEV
expressions as we currently do, the compiler stops taking the
user source code into account as a hint on what to do, and
instead it may produce worse code in circumstances where the
user may have already chosen a naturally better implementation,
which the compiler just destroys by artificially replacing it by
a different solution with a supposed identical cost.</div>
<div class=""><br class="">
</div>
<div class="">Also, although this notably affects small loops with
a few small-cost variants, it also affects more complex loops
that may have two or more optimal solutions. Among the optimal
solutions of the same cost, the algorithm will always chose the
first one in the list, so again the one that will be chosen may
be a reversed one with respect to the user code, unless it is a
really different solution. </div>
<div class=""><br class="">
</div>
<div class="">I do not understand the reason behind such reversing
of loop expressions, and that’s the question that I post. I
understand that we may want to always favour decrements (maybe
for performance reasons in some targets), or that we may want to
prioritise increments (maybe to keep induction variables usable
in addressing modes in some targets), or even that we may want
to do different things for different targets. But I do not quite
understand the current implementation. It almost seems to me a
typo in the LLVM source code that nobody has come across before.
I have had my local copy of LLVM for some time with the code set
as in the comment, and I have yet to find a case where it
produces worse code than the original.</div>
<div class=""><br class="">
</div>
<div class="">So, I really am interested in starting some informed
discussion about this.</div>
<div class=""><br class="">
</div>
<div class="">John Lluch</div>
<div class=""><br class="">
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<pre class="moz-quote-pre" wrap="">_______________________________________________
LLVM Developers mailing list
<a class="moz-txt-link-abbreviated" href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>
<a class="moz-txt-link-freetext" href="https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev">https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a>
</pre>
</blockquote>
</body>
</html>