<html><head></head><body><div style="font-family:bookman old style, new york, times, serif;font-size:16px;"><div style="font-family:bookman old style, new york, times, serif;font-size:16px;"><div></div>
<div>Hi Zide,</div><div><br></div><div>thank you for the more detailed clarification of your setup.</div><div>As far as I can see, the only difference between your a) steps and b) steps lies in the sequence of passes you provide them.</div><div>Evidently, the sequence of passes you get from [1] is not the same as the sequence of passes opt runs on your main.ll.</div><div>I suggest you use the same command as [1], but with the actual input of your benchmark as input for opt. See [2].</div><div><br></div><div>I crafted a main.c file with an empty main and ran your step 1 to generate the LLVM-IR.</div><div>I compared the output of [1] against the output of [2] and noticed a small difference.</div><div>Indeed, [2] schedules one more pass (-targetpassconfig) than [1].</div><div><br></div><div>I also tried to craft another main.ll file via llvm-stress (an LLVM utility that generates random valid LLVM-IR test files).</div><div>In this case [1] and [2] do not differ.</div><div><br></div><div><span><div style="color: rgb(0, 0, 0); font-family: "bookman old style", "new york", times, serif; font-size: 16px;">I guess the difference in scheduled passes is due to metadata that was transferred to the IR from the source file.</div><div style="color: rgb(0, 0, 0); font-family: "bookman old style", "new york", times, serif; font-size: 16px;">I suspect that if you feed [2] with a full benchmark (I don't know which one you are using) you may get a slightly different sequence of passes.</div></span></div><div><br></div><div>I hope this helps.</div><div><br></div><div>Best regards,</div><div><br></div><div>Stefano Cherubin</div><div><br></div><div><span><div style="color: rgb(0, 0, 0); font-family: "bookman old style", "new york", times, serif; font-size: 16px;"><span style="font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;">[1] llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments</span></div><div style="color: rgb(0, 0, 0); 
font-family: "bookman old style", "new york", times, serif; font-size: 16px;"><span style="font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;">[2] <span><span style="color: rgb(0, 0, 0); font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; font-size: 16px;">opt -O3 -disable-output -debug-pass=Arguments main.ll</span></span></span></div></span><br></div>
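<div>A minimal sketch of how the outputs of [1] and [2] could be compared, assuming each command's -debug-pass=Arguments output has been captured as text and that the pass flags appear on lines starting with "Pass Arguments:". The function names and the sample strings are hypothetical and only illustrate the comparison; they are not real opt output.</div>
<pre>
```python
import difflib


def parse_pass_arguments(debug_output):
    """Collect the pass flags from every 'Pass Arguments:' line, in order."""
    prefix = "Pass Arguments:"
    passes = []
    for line in debug_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            passes.extend(line[len(prefix):].split())
    return passes


def diff_pass_sequences(reference, actual):
    """Return (only_in_reference, only_in_actual), keeping pass order."""
    matcher = difflib.SequenceMatcher(a=reference, b=actual, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(reference[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(actual[j1:j2])
    return removed, added


# Hypothetical, shortened sample outputs (not real opt output).
from_empty_module = "Pass Arguments:  -tti -targetlibinfo -verify\n"
from_main_ll = "Pass Arguments:  -tti -targetpassconfig -targetlibinfo -verify\n"

removed, added = diff_pass_sequences(
    parse_pass_arguments(from_empty_module),
    parse_pass_arguments(from_main_ll),
)
print("only in [1]:", removed)
print("only in [2]:", added)
```
</pre>
<div>An order-preserving diff is used here rather than a set difference because a pass can appear several times in the -O3 sequence, and its position matters.</div>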
</div><div id="ydp395e77dayahoo_quoted_5036016397" class="ydp395e77dayahoo_quoted">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>
On Saturday, 18 August 2018, 09:16:20 CEST, cszide via llvm-dev <llvm-dev@lists.llvm.org> wrote:
</div>
<div><br></div>
<div><br></div>
<div><div id="ydp395e77dayiv0382511561"><div>Hi Emanuele,<br clear="none">Thank you for your reply!<br clear="none">I cannot replicate the -O3 result using LLVM 6.0 with the command you provided. Actually, I previously used the following command<br clear="none">clang -O3 -Xclang -disable-llvm-passes -S -emit-llvm main.c -o main.ll to generate the IR file, which is equivalent to your command.<br clear="none">Currently, I want to test the passes in LLVM. I am evaluating the performance of a pass or pass sequence, so I chose the performance <br clear="none">of -O3 as a baseline. <br clear="none"><br clear="none">The experiment steps are as follows:<br clear="none">1. clang -O3 -Xclang -disable-llvm-passes -S -emit-llvm main.c -o main.ll<br clear="none"><br clear="none">2.a. opt -O3 main.ll -o main-opt1.ll<br clear="none">2.b. opt (the same pass sequence as -O3) main.ll -o main-opt2.ll<br clear="none"><br clear="none">3.a llc main-opt1.ll -o main-opt1.s<br clear="none">3.b llc main-opt2.ll -o main-opt2.s<br clear="none"><br clear="none">4.a clang main-opt1.s -o main-opt1<br clear="none">4.b clang main-opt2.s -o main-opt2<br clear="none"><br clear="none">$ time ./main-opt1<br clear="none">real 0m0.846s<br clear="none">user 0m0.845s<br clear="none">sys 0m0.001s<br clear="none"><br clear="none">$ time ./main-opt2 <br clear="none">real 0m0.956s<br clear="none">user 0m0.956s<br clear="none">sys 0m0.001s<br clear="none"><br clear="none">where the same pass sequence is obtained with the following command:<br clear="none">llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments<br clear="none"><br clear="none">From the results, we can see that the execution time of main-opt2 is 13% higher than that of main-opt1. <br clear="none">As Stefano said, clang schedules target-independent and target-dependent passes. So I used lli to execute main-opt1.ll and main-opt2.ll<br clear="none">to reduce the influence of target-dependent passes; the results show a similar gap. 
<br clear="none">$ time lli main-opt1.ll<br clear="none"><br clear="none">real 0m0.878s<br clear="none">user 0m0.878s<br clear="none">sys 0m0.000s<br clear="none"><br clear="none">$ time lli main-opt2.ll<br clear="none"><br clear="none">real 0m0.978s<br clear="none">user 0m0.978s<br clear="none">sys 0m0.000s<br clear="none"><br clear="none">Thus, for my purpose, if I cannot get the same results from -O3 and from the same pass sequence as -O3, I cannot say <br clear="none">that the performance comparisons between other pass sequences and -O3 are fair.<br clear="none"><br clear="none">I do not know whether I have made a mistake. <br clear="none"><br clear="none">In addition, I find that the pass sequences "-early-cse-memssa -lcssa-verification -early-cse-memssa", <br clear="none">"-early-cse-memssa -verify -early-cse-memssa", "-early-cse-memssa -demanded-bits -early-cse-memssa" and "-early-cse-memssa -early-cse-memssa" <br clear="none">cause the following error with LLVM version 6.0.0.<br clear="none">LLVMSymbolizer: error reading file: No such file or directory<br clear="none">#0 0x0000000001a68794 (opt+0x1a68794)<br clear="none">#1 0x0000000001a68a76 (opt+0x1a68a76)<br clear="none">#2 0x00007f96a098c390 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x11390)<br clear="none">#3 0x00000000015fc64e (opt+0x15fc64e)<br clear="none">#4 0x000000000160065d (opt+0x160065d)<br clear="none">#5 0x00000000015fdb08 (opt+0x15fdb08)<br clear="none">#6 0x000000000075aaa6 (opt+0x75aaa6)<br clear="none">#7 0x00007f969f924830 __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x20830)<br clear="none">#8 0x000000000074c1b9 (opt+0x74c1b9)<br clear="none">Stack dump:<br clear="none">0. Program arguments: opt -early-cse-memssa -lcssa-verification -early-cse-memssa main.bc -o main-opt.bc<br clear="none">Segmentation fault (core dumped)<br clear="none"><br clear="none">If possible, please try these pass sequences on your system using LLVM version 5.0.2. 
If these sequences also cause the same error in your system, <br clear="none">it could be a bug for LLVM.<br clear="none"><br clear="none">Thank you for both your help and your time!<br clear="none"><br clear="none">Best regards<br clear="none">Zide<br clear="none"><br clear="none">At 2018-08-17 23:49:38, "Emanuele Del Sozzo" <Emanuele.DelSozzo@arm.com> wrote:<br clear="none"> <div class="ydp395e77dayiv0382511561yqt0523574704" id="ydp395e77dayiv0382511561yqt63881"><blockquote id="ydp395e77dayiv0382511561isReplyContent" style="PADDING-LEFT:1ex;MARGIN:0px 0px 0px 0.8ex;BORDER-LEFT:#ccc 1px solid;">
<div dir="ltr" id="ydp395e77dayiv0382511561divtagdefaultwrapper" style="font-size:12pt;color:#000000;font-family:Calibri, Helvetica, sans-serif;">
<p style="margin-top:0;margin-bottom:0;">Hi Zide,</p>
<p style="margin-top:0;margin-bottom:0;">I think I found the right way to reach my goal.</p>
<p style="margin-top:0;margin-bottom:0;">I used the following command:</p>
<p style="margin-top:0;margin-bottom:0;">clang -O3 -Xclang -disable-llvm-optzns main.c -S -emit-llvm -o main.ll</p>
<p style="margin-top:0;margin-bottom:0;">to<span style="font-size:12pt;"> generate an IR file enriched by all the metadata that </span><span style="font-size:12pt;">otherwise</span><span style="font-size:12pt;"> </span><span style="font-size:12pt;">wouldn't
be generated with -O0. Moreover, -disable-llvm-optzns flag ensures that none of the optimization passes has been applied yet to the IR.</span></p>
<p style="margin-top:0;margin-bottom:0;">In this way, I can replicate -O3 result by applying the optimization passes using opt. Apparently, those metadata are necessary to fully optimize the code.</p>
<p style="margin-top:0;margin-bottom:0;"><br clear="none">
</p>
<p style="margin-top:0;margin-bottom:0;">I hope that this may help you too.</p>
<p style="margin-top:0;margin-bottom:0;"><br clear="none">
</p>
<p style="margin-top:0;margin-bottom:0;">Best regards</p>
<p style="margin-top:0;margin-bottom:0;">Emanuele Del Sozzo</p>
</div>
<hr style="display:inline-block;width:98%;" tabindex="-1">
<div dir="ltr" id="ydp395e77dayiv0382511561divRplyFwdMsg"><font face="Calibri, sans-serif" style="font-size:11pt;" color="#000000"><b>From:</b> llvm-dev <<a shape="rect" href="mailto:llvm-dev-bounces@lists.llvm.org" rel="nofollow" target="_blank">llvm-dev-bounces@lists.llvm.org</a>> on behalf of Stefano Cherubin via llvm-dev <<a shape="rect" href="mailto:llvm-dev@lists.llvm.org" rel="nofollow" target="_blank">llvm-dev@lists.llvm.org</a>><br clear="none">
<b>Sent:</b> Friday, August 17, 2018 11:44:50 AM<br clear="none">
<b>To:</b> <a shape="rect" href="mailto:llvm-dev@lists.llvm.org" rel="nofollow" target="_blank">llvm-dev@lists.llvm.org</a>; cszide<br clear="none">
<b>Subject:</b> Re: [llvm-dev] Replication -O3 optimizations manually</font>
<div> </div>
</div>
<div>
<div style="font-family:bookman old style, new york, times, serif;font-size:16px;">
<div style="font-family:bookman old style, new york, times, serif;font-size:16px;">
<div></div>
<div>Hi Zide,</div>
<div><br clear="none">
</div>
<div>the scope of opt is limited to the LLVM-IR, which is meant to be always target independent.</div>
<div>In order to apply backend optimizations you need to lower the representation to something closer to the machine-level.</div>
<div>I would suggest measuring performance on machine code, not LLVM-IR.<br clear="none">
</div>
<div>To this end, please refer to the setup Emanuele is using.</div>
<div><br clear="none">
</div>
<div>However, I may not have properly understood your test.</div>
<div><span>
</span><div style="color:rgb(0,0,0);">
lli executes LLVM-IR directly (through a JIT compiler or an interpreter) and is meant more for functional testing than for performance testing.</div>
Are you comparing the performance of machine code generated by clang -O3 against the performance of lli optimized_IR.bc ?</div>
<div><br clear="none">
</div>
<div>Best regards,</div>
<div><br clear="none">
</div>
<div>Stefano Cherubin</div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
</div>
<div class="ydp395e77dayiv0382511561x_ydpcc5e3842yahoo_quoted" id="ydp395e77dayiv0382511561x_ydpcc5e3842yahoo_quoted_4516679350">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>On Friday, 17 August 2018, 03:55:52 CEST, cszide <<a shape="rect" href="mailto:cszide@163.com" rel="nofollow" target="_blank">cszide@163.com</a>> wrote: </div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
<div>
<div id="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149">
<div>Hi, Stefano<br clear="none">
I also have the problem described by Emanuele. You say that clang schedules target-independent and target-dependent passes.
<br clear="none">
However, when I use lli to execute bitcode generated by opt with -O3 or with the same optimization passes as -O3, the performance is still different.<br clear="none">
So, does the -O3 option perform some special operations? I read the source code of opt, but I cannot find the reason.<br clear="none">
<br clear="none">
Best regards<br clear="none">
Zide<br clear="none">
<br clear="none">
At 2018-08-16 22:13:14, "Stefano Cherubin via llvm-dev" <<a shape="rect" href="mailto:llvm-dev@lists.llvm.org" rel="nofollow" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br clear="none">
<div class="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149yqt4830241314" id="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149yqt99140">
<blockquote id="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149isReplyContent" style="padding-left:1ex;margin:0px 0px 0px 0.8ex;border-left:#ccc 1px solid;">
<div style="font-family:bookman old style, new york, times, serif;font-size:16px;">
<div style="font-family:bookman old style, new york, times, serif;font-size:16px;">
<div style="font-family:bookman old style, new york, times, serif;font-size:16px;">
<div style="font-family:bookman old style, new york, times, serif;font-size:16px;">
<div></div>
<div>Hello Emanuele,</div>
<div><br clear="none">
</div>
<div>When you provide the optimization level -O3 to the clang driver, it does not simply schedule a sequence of passes to be run on the intermediate representation.</div>
<div>Indeed, it schedules target-independent and target-dependent passes.</div>
<div>Moreover, IIRC, the optimization level is also used in the later stages of code generation to apply target-dependent optimizations (e.g. the vectorizer).</div>
<div><br clear="none">
</div>
<div>The most common use case when someone wants to test their own pass/work within the LLVM toolchain is the following:<br clear="none">
</div>
<div>- use clang to generate an LLVM-IR file</div>
<div>- use opt to run your desired pass / pass sequence and output another LLVM-IR file</div>
<div>- use clang -O3 to compile to executable machine code</div>
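<div>The three steps listed above can be sketched as command lines built in Python. This is only an illustration: the helper name build_pipeline, the file names and the sample passes are hypothetical, and the first command reuses the clang invocation quoted elsewhere in this thread. Nothing is executed; the commands are only printed.</div>
<pre>
```python
import shlex


def build_pipeline(source, passes):
    """Return the three command lines (as argv lists) for the workflow.

    `source` is a C file name and `passes` a list of opt pass flags;
    both are supplied by the caller, nothing is run here.
    """
    stem = source.rsplit(".", 1)[0]
    ir = stem + ".ll"
    opt_ir = stem + "-opt.ll"
    exe = stem + "-opt"
    return [
        # 1. Emit unoptimized LLVM-IR that still carries the -O3 metadata.
        ["clang", "-O3", "-Xclang", "-disable-llvm-passes",
         "-S", "-emit-llvm", source, "-o", ir],
        # 2. Run the chosen pass sequence with opt (-S keeps textual IR).
        ["opt", "-S", *passes, ir, "-o", opt_ir],
        # 3. Compile the transformed IR to an executable with clang -O3.
        ["clang", "-O3", opt_ir, "-o", exe],
    ]


# Example with two illustrative passes; print the commands for inspection.
for argv in build_pipeline("main.c", ["-mem2reg", "-instcombine"]):
    print(shlex.join(argv))
```
</pre>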
<div><br clear="none">
</div>
<div>However, with this approach you will run the passes on the LLVM-IR twice.</div>
<div>There are use cases when this could invalidate your results.</div>
<div>As opt stops at the LLVM-IR level, I would suggest you also use other LLVM tools (such as llc / llvm-mc) to run individually the backend stages / sequences of passes that opt cannot run.</div>
<div>An extensive list of tools/commands you can use is available at [0].</div>
<div>For your specific case, I would suggest you to have a look at this restricted schema [1].</div>
<div><br clear="none">
</div>
<div>There is yet another way to get into even finer-grained detail.</div>
<div>You can check which clang driver actions a given command line runs. See [2].</div>
<div>From that point you can rebuild the exact whole sequence of commands that the clang driver triggers.</div>
<div><br clear="none">
</div>
<div>If you can provide more details about your use case (performance measurement, pass development and testing, flag selection, phase ordering), we can suggest the most suitable approach.</div>
<div><br clear="none">
</div>
<div>Kind regards,</div>
<div><br clear="none">
</div>
<div>Stefano Cherubin</div>
<div><br clear="none">
</div>
<div>[0] http://llvm.org/docs/CommandGuide/</div>
<div>[1] https://github.com/skeru/LLVM-intro/blob/master/img/03/toolchain.pdf</div>
<div>[2] https://clang.llvm.org/docs/DriverInternals.html#driver-stages<br clear="none">
</div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
</div>
</div>
</div>
<div class="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149ydp8cb328fayahoo_quoted" id="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149ydp8cb328fayahoo_quoted_5100480480">
<div style="font-family:'Helvetica Neue', Helvetica, Arial, sans-serif;font-size:13px;color:#26282a;">
<div>On Thursday, 16 August 2018, 12:46:04 CEST, Emanuele Del Sozzo via llvm-dev <<a shape="rect" href="mailto:llvm-dev@lists.llvm.org" rel="nofollow" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:
</div>
<div><br clear="none">
</div>
<div><br clear="none">
</div>
<div>
<div id="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149ydp8cb328fayiv4757402819">
<div dir="ltr">
<div dir="ltr" id="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149ydp8cb328fayiv4757402819divtagdefaultwrapper" style="font-size:12pt;color:rgb(0,0,0);font-family:Calibri, Helvetica, sans-serif, EmojiFont, Color UI NotoColorEmoji, UI EmojiSymbols;">
<p style="margin-top:0;margin-bottom:0;"></p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
Hello llvm-dev,</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
my name is Emanuele and I am an intern at ARM. As part of the project I am doing here, I would like to manually replicate the optimizations that LLVM applies when I type -O3. In other words, I would like to know which compilation flags/passes -O3
triggers. </p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
I noticed that GCC reports, on its website, all the flags that are enforced by -O3 (<a shape="rect" class="ydp395e77dayiv0382511561x_ydpcc5e3842yiv3325660149ydp8cb328fayiv4757402819x_OWAAutoLink" href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html" rel="nofollow" target="_blank">https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html</a>),
but I wasn't able to find something similar within LLVM documentation. On the other hand, I found that this command displays all the optimization passes applied by opt when -O3 flag is on:</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
llvm-as < /dev/null | opt -O3 -disable-output -debug-pass=Arguments</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
I tried to apply the same optimization passes through opt, but, even though the performance is similar, the resulting binary is slower than the one generated using -O3 (the binaries also differ, of course).</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
Again, I found this other command that does something similar (it lists the sequence of optimization passes applied):</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
clang -O3 -mllvm -debug-pass=Arguments file.c </p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
In this case, the performance is still different and some of the optimization passes listed in the last block of passes (e.g. -machinemoduleinfo, -stack-protector, etc.) are unknown to opt.</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
<br clear="none">
</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
That said, my question is: how can I find out which optimization passes/flags -O3 enforces, so that I can manually apply the same optimizations and, hopefully, get the same binary and performance?</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
<br clear="none">
</p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
I am currently using <span style="font-size:12pt;">LLVM version 5.0.2.</span></p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
<span style="font-size:12pt;"><br clear="none">
</span></p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
<span style="font-size:12pt;">Thank you for both your help and your time!</span></p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
<span style="font-size:12pt;"><br clear="none">
</span></p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
<span style="font-size:12pt;">Best regards</span></p>
<p style="font-family:Calibri, Helvetica, sans-serif, serif, EmojiFont;font-size:16px;">
<span style="font-size:12pt;">Emanuele</span></p>
<br clear="none">
<p></p>
</div>
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose,
or store or copy the information in any medium. Thank you. </div>
</div>
_______________________________________________<br clear="none">
LLVM Developers mailing list<br clear="none">
<a shape="rect" href="mailto:llvm-dev@lists.llvm.org" rel="nofollow" target="_blank">llvm-dev@lists.llvm.org</a><br clear="none">
<a shape="rect" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="nofollow" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br clear="none">
</div>
</div>
</div>
</div>
</blockquote>
</div>
<br clear="none">
<br clear="none">
<span title="neteasefooter"></span>
<p> </p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote></div><br clear="none"><br clear="none"><span title="neteasefooter"></span><p> </p></div></div><div class="ydp395e77dayqt0523574704" id="ydp395e77dayqt04654">_______________________________________________<br clear="none">LLVM Developers mailing list<br clear="none"><a shape="rect" href="mailto:llvm-dev@lists.llvm.org" rel="nofollow" target="_blank">llvm-dev@lists.llvm.org</a><br clear="none"><a shape="rect" href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="nofollow" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br clear="none"></div></div>
</div>
</div></div></body></html>