<div dir="ltr">Hi Andy, thanks for your help!<div><div>With the new machine model, the code scheduled by method A is the same as the code scheduled by method B.</div><div>That makes sense, but there is another problem: the scheduled code is bad.</div>
<div><br></div><div>The load/store instructions always reuse the same register.<br></div><div><br></div><div>Source:<br></div><div><br></div><div>#define N 2000000<br></div><div>static double b[N], c[N];<br></div><div><div>void Scale () {</div>
<div> double scalar = 3.0;</div><div> for (int j=0;j&lt;N;j++)</div><div> b[j] = scalar*c[j];</div><div>}</div></div><div><br></div><div>$clang -O3 foo.c -static -S -o foo.s -mllvm -unroll-count=4 -mcpu=cortex-a9 -fno-vectorize -fno-slp-vectorize --target=arm -mfloat-abi=hard -mllvm -enable-misched -mllvm -scheditins=false</div>
<div><br></div><div>per-operand cost model :<br></div><div><div>Scale:</div><div> push {lr}</div><div> movw r12, :lower16:c</div><div> movw lr, :lower16:b</div><div> movw r3, #9216</div><div> movt r12, :upper16:c</div>
<div> mov r1, #0</div><div> vmov.f64 d16, #3.000000e+00</div><div> movt lr, :upper16:b</div><div> movt r3, #244</div><div>.LBB0_1:</div><div> add r0, r12, r1</div><div> add r2, lr, r1</div><div><font color="#000000"> </font><b><font color="#000000">vldr </font><font color="#ff0000">d17</font><font color="#000000">, [r0]</font></b></div>
<div> add r1, r1, #32</div><div> vmul.f64 d17, d17, d16</div><div> cmp r1, r3</div><div><font color="#000000"> vstr d17, [r2]</font></div><div><b> vldr <font color="#ff0000">d17,</font> [r0, #8]</b></div><div> vmul.f64 d17, d17, d16</div>
<div><b> </b> vstr d17, [r2, #8]</div><div><b> vldr <font color="#ff0000">d17</font>, [r0, #16]</b></div><div> vmul.f64 d17, d17, d16</div><div> vstr d17, [r2, #16]</div><div><b> vldr <font color="#ff0000">d17</font>, [r0, #24]</b></div>
<div> vmul.f64 d17, d17, d16</div><div> vstr d17, [r2, #24]</div><div> bne .LBB0_1</div><div> pop {lr}</div><div> bx lr</div><div>.Ltmp0:</div></div><div><br></div><div>Using <span style="color:rgb(0,0,0);white-space:pre-wrap">the itinerary generates better-scheduled code:</span></div>
<div><font color="#000000"><span style="white-space:pre-wrap">clang -O3 foo.c -static -S -o foo.s -mllvm -unroll-count=4 -mcpu=cortex-a9 -fno-vectorize -fno-slp-vectorize --target=arm -mfloat-abi=hard -mllvm -enable-misched</span></font><br>
</div><div><font color="#000000"><span style="white-space:pre-wrap"><br></span></font></div><div><span style="white-space:pre-wrap"><font color="#000000">Scale:
movw r12, :lower16:c
movw r2, :lower16:b
movw r3, #9216
movt r12, :upper16:c
mov r1, #0
vmov.f64 d16, #3.000000e+00
movt r2, :upper16:b
movt r3, #244
.LBB0_1:
add r0, r12, r1
</font><b><font color="#000000"> vldr </font><font color="#ff0000">d17</font><font color="#000000">, [r0]</font></b><font color="#000000">
</font><b style="color:rgb(0,0,0)"> vldr </b><b><font color="#ff0000">d18</font></b><b style="color:rgb(0,0,0)">, [r0, #8]</b><font color="#000000">
vmul.f64 d17, d17, d16
</font><b style="color:rgb(0,0,0)"> vldr </b><b><font color="#ff0000">d19</font></b><b style="color:rgb(0,0,0)">, [r0, #16]</b><font color="#000000">
</font><b style="color:rgb(0,0,0)"> vldr </b><b><font color="#ff0000">d20</font></b><b style="color:rgb(0,0,0)">, [r0, #24]</b><font color="#000000">
add r0, r2, r1
vmul.f64 d18, d18, d16
add r1, r1, #32
cmp r1, r3
vmul.f64 d19, d19, d16
vmul.f64 d20, d20, d16
vstmia r0, {d17, d18, d19, d20}
bne .LBB0_1
	bx	lr</font></span><br></div><div><span style="color:rgb(0,0,0);white-space:pre-wrap"><br></span></div><div>Is this just because the A9's <span style="color:rgb(0,0,0);white-space:pre-wrap">per-operand machine model is not </span><span style="color:rgb(0,0,0);white-space:pre-wrap">implemented </span>well? </div>
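<div><br></div><div>To picture the difference between the two listings at the source level, here is a rough C sketch. It is not taken from foo.c: the small N and the names scale_reuse, scale_grouped and schedules_agree are made up for illustration. The first function has the shape of the per-operand listing (one temporary, like d17, carries every element), the second has the shape of the itinerary listing (four temporaries, like d17..d20, are loaded first, then multiplied, then stored).</div>
<pre>
```c
/* Illustrative sketch only: names and the small N are assumptions, not from foo.c. */
#define N 16 /* stand-in for 2000000; must be a multiple of 4 */

static double c[N], b_reuse[N], b_grouped[N];

/* Shape of the per-operand listing: a single temporary (like d17) carries
   every element through load, multiply and store in turn. */
static void scale_reuse(void) {
  const double scalar = 3.0;
  for (int j = 0; j < N; j += 4) {
    double t;
    t = c[j];     t = t * scalar; b_reuse[j]     = t;
    t = c[j + 1]; t = t * scalar; b_reuse[j + 1] = t;
    t = c[j + 2]; t = t * scalar; b_reuse[j + 2] = t;
    t = c[j + 3]; t = t * scalar; b_reuse[j + 3] = t;
  }
}

/* Shape of the itinerary listing: four temporaries (like d17..d20) are
   loaded first, then multiplied, then stored together. */
static void scale_grouped(void) {
  const double scalar = 3.0;
  for (int j = 0; j < N; j += 4) {
    double t0 = c[j], t1 = c[j + 1], t2 = c[j + 2], t3 = c[j + 3];
    t0 = t0 * scalar;
    t1 = t1 * scalar;
    t2 = t2 * scalar;
    t3 = t3 * scalar;
    b_grouped[j]     = t0;
    b_grouped[j + 1] = t1;
    b_grouped[j + 2] = t2;
    b_grouped[j + 3] = t3;
  }
}

/* Fills the input, runs both variants, and returns 1 when their results match. */
int schedules_agree(void) {
  for (int j = 0; j < N; j++) c[j] = (double)j;
  scale_reuse();
  scale_grouped();
  for (int j = 0; j < N; j++)
    if (b_reuse[j] != b_grouped[j]) return 0;
  return 1;
}
```
</pre>
<div>Calling schedules_agree() from a small main should return 1, since both forms compute the same values and differ only in how much independent work sits between a load and its use. An optimizing compiler is free to reschedule either form, so this only pictures the two listings; it is not a way to force either schedule.</div>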
<div>By the way, why do you want to use the new machine model for mi-sched? <br></div><div><br></div><div>Thanks,</div><div><br></div><div>Kind regards<br></div><div>Kuan-Hsu</div><div><br></div></div></div><div class="gmail_extra">
<br><br><div class="gmail_quote">2013/10/15 Andrew Trick <span dir="ltr"><<a href="mailto:atrick@apple.com" target="_blank">atrick@apple.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><br><div><div class="im"><div>On Oct 14, 2013, at 3:27 AM, Zakk <<a href="mailto:zakk0610@gmail.com" target="_blank">zakk0610@gmail.com</a>> wrote:</div><br><blockquote type="cite">
<div dir="ltr">Hi all, <div>I ran into this problem when <span style="font-family:arial,sans-serif;font-size:13.888888359069824px">compiling the</span> STREAM benchmark (<a href="http://www.cs.virginia.edu/stream/FTP/Code/" target="_blank">http://www.cs.virginia.edu/stream/FTP/Code/</a>) with enable-misched</div>
<div><br></div><div>The small function is scheduled well on its own, but if opt inlines it, the inlined copy is scheduled badly.</div></div></blockquote><div><br></div></div><div>A bug for this is welcome. Pretty soon, I’ll be verifying A9 performance and changing the default scheduler. When I do this, I’ll be using the new machine model:</div>
<div><br></div><div>(-mllvm) -scheditins=false</div><div><br></div><div>However, some scheduler changes are required for that mode to fully enforce pipeline hazards.</div><div class="im"><br><blockquote type="cite"><div dir="ltr">
<div>So I wrote a simple test case (foo.c, linked below) and compiled it with two different methods:<br>
</div><div><br></div><div><b><font face="arial, helvetica, sans-serif">method A:</font></b></div><div><b><font face="arial, helvetica, sans-serif">$clang -O3 foo.c -static -S -o foo.s -mllvm -enable-misched -mllvm -unroll-count=4 <font>--target=arm </font><font>-mfloat-abi=hard </font>-mcpu=cortex-a9 -fno-vectorize -fno-slp-vectorize</font></b></div>
<div><b><font face="arial, helvetica, sans-serif"><br></font></b></div><div><b><font face="arial, helvetica, sans-serif">and</font></b></div><div><b><font face="arial, helvetica, sans-serif"><br></font></b></div><div><b><font face="arial, helvetica, sans-serif">method B:</font></b></div>
<div><b><font face="arial, helvetica, sans-serif"><font>$clang foo.c -S -emit-llvm -o foo.bc --target=arm -mfloat-abi=hard -mcpu=cortex-a9</font><br></font></b></div><div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;direction:ltr;word-break:normal">
<font face="arial, helvetica, sans-serif"><b>$opt foo.bc -O3 -unroll-count=4 -o foo.opt.bc</b></font></div><b><font face="arial, helvetica, sans-serif">
</font></b><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;direction:ltr;word-break:normal"><font face="arial, helvetica, sans-serif"><b>$llc foo.opt.bc -o foo.opt.s -march=arm -mcpu=cortex-a9 -enable-misched</b></font><br>
</div></div></div></blockquote><div><br></div></div><div>You can try “clang -O3 -mllvm -disable-llvm-optzns …”. clang should generate the same bitcode, but skip the “opt” step.</div><div><br></div><div>If that doesn’t work, it can be a nightmare trying to decompose the compilation steps with fidelity. You can try:</div>
<div>- clang -### … </div><div>- clang -mllvm -print-options …</div><div>- Passing a full triple to all tools with -mtriple</div><div>- Debug the TargetOptions fields</div><div>- -print-after-all to see which phase is different</div>
<div><br></div><div>Even if you get all the options right, the process of serializing and rereading the IR can affect the optimizations.</div><div><br></div><div>Sorry. I’ve been trying to think of a way to improve this situation.</div>
<div><br></div><div>-Andy</div><div><br></div><blockquote type="cite"><div class="im"><div dir="ltr"><div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;direction:ltr;word-break:normal">
</div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal">
<span style="font-size:12pt;font-family:Calibri">(P.S. I checked with debug-pass=structure, so I think they are </span><font face="Calibri"><span style="font-size:15.972221374511719px">equivalent.)</span></font></div>
<div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal"><br></div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal"><span style="font-family:Calibri;font-size:12pt">but the result is different: </span><br>
</div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal"><span style="font-size:12pt;font-family:Calibri">Look at </span><font face="Calibri"><span style="font-size:15.972221374511719px">LBB1_4 in foo.s: it always reuses the same register for computation, whereas LBB1_4 in foo.opt.s doesn't.</span></font></div>
<div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal"><font face="Calibri"><span style="font-size:15.972221374511719px"><br></span></font></div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal">
<font face="Calibri"><span style="font-size:15.972221374511719px">My question is: how can I get method B's result using clang alone (method A)? <br>Or am I missing something here?</span></font></div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal">
<font face="Calibri"><span style="font-size:15.972221374511719px"><br></span></font></div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal"><span style="font-family:arial,sans-serif;font-size:13.888888359069824px">I really appreciate </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13.888888359069824px">any</span><span style="font-family:arial,sans-serif;font-size:13.888888359069824px"> </span><span style="background-color:rgb(255,255,204);font-family:arial,sans-serif;font-size:13.888888359069824px">help and suggestions</span><span style="font-family:arial,sans-serif;font-size:13.888888359069824px">.</span><font face="Calibri"><span style="font-size:15.972221374511719px"><br>
</span></font></div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal"><span style="font-family:arial,sans-serif;font-size:13.888888359069824px">Thanks</span><br></div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal">
<span style="font-family:arial,sans-serif;font-size:13.888888359069824px"><br></span></div>Kuan-Hsu<br><br>------- file link -------
</div><div><div style="margin-top:0pt;margin-bottom:0pt;margin-left:0in;word-break:normal">foo.c: <a href="http://goo.gl/nVa2K0" target="_blank">http://goo.gl/nVa2K0</a><br></div><div>foo.s: <a href="http://goo.gl/ML9eNj" target="_blank">http://goo.gl/ML9eNj</a></div>
<div>foo.opt.s: <a href="http://goo.gl/31PCnf" target="_blank">http://goo.gl/31PCnf</a></div></div></div></div>
_______________________________________________<br>LLVM Developers mailing list<br><a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank">LLVMdev@cs.uiuc.edu</a> <a href="http://llvm.cs.uiuc.edu" target="_blank">http://llvm.cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a><br></blockquote></div><br></div></blockquote></div><br><br clear="all"><div><br></div>-- <br>
Best regards,<br>Kuan-Hsu<br><br><br>
</div>