<div dir="ltr">Having -O0 on your clang command line causes all functions to be marked with an 'optnone' attribute that prevents opt from optimizing them later. You should also add "-Xclang -<span class="gmail-s1">disable-O0-</span><span class="gmail-s2">optnone" to your command line.</span></div><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature">~Craig</div></div>
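As a sketch, the adjusted pipeline could look like the following. The file names and the single opt pass (-mem2reg) are placeholders, and the commands are printed rather than executed so the sketch stands on its own without an LLVM toolchain installed:

```shell
# Sketch of the adjusted pipeline, with -disable-O0-optnone added so that
# opt is still able to optimise the -O0 output. File names are placeholders.
CLANG_CMD="clang -O0 -S -emit-llvm -Xclang -disable-O0-optnone a.c -o a.ll"
OPT_CMD="opt -mem2reg a.ll -o a.bc"   # -mem2reg stands in for the chosen pass
LLC_CMD="llc a.bc -filetype=obj -o a.o"
printf '%s\n' "$CLANG_CMD" "$OPT_CMD" "$LLC_CMD"
```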
<br><div class="gmail_quote">On Thu, Sep 21, 2017 at 10:04 PM, Yi Lin via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>
<br>
I am trying to understand the effectiveness of various llvm optimisations when a language targets llvm (or C) as its backend.<br>
<br>
The following is my approach (please correct me if I did anything wrong):<br>
<br>
I am trying to explicitly control the optimisation passes in llvm. I disable optimisation in clang so that it emits unoptimised llvm IR, and then use opt to optimise that. This is what I do:<br>
<br>
* clang -O0 -S -mllvm -disable-llvm-optzns -emit-llvm -momit-leaf-frame-pointer a.c -o a.ll<br>
* opt -(PASSES) a.ll -o a.bc<br>
* llc a.bc -filetype=obj -o a.o<br>
<br>
To evaluate the effectiveness of optimisation passes, I started with an 'add-one-in' approach. The baseline is no optimisation passes, and I iterate through all the O1 passes, explicitly enabling one pass for each run. I didn't try to understand those passes, so it is a black-box test. This will show how effective each single optimisation is (ignoring correlation between passes). This can be done iteratively, e.g. identify the most effective pass, always enable it, and then 'add-one-in' for the remaining passes. I also plan to take a 'leave-one-out' approach, in which the baseline is all optimisations enabled and one pass is disabled at a time.<br>
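A minimal shell sketch of the 'add-one-in' loop described above (the pass names and file names are placeholders, and the per-run commands are printed rather than executed, so this can be read without an LLVM toolchain):

```shell
# 'Add-one-in' sketch: one run of opt per pass, each enabling exactly one
# pass on top of the unoptimised IR. PASSES is a placeholder subset of O1.
PASSES="mem2reg sroa instcombine licm"
CMDS=""
for p in $PASSES; do
  # Build and print the command for this run; each run starts from a.ll.
  CMDS="$CMDS opt -$p a.ll -o a.$p.bc;"
  echo "opt -$p a.ll -o a.$p.bc && llc a.$p.bc -filetype=obj -o a.$p.o"
done
```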
<br>
Here is the result for the 'add-one-in' approach on some micro benchmarks:<br>
<br>
<a href="https://drive.google.com/drive/folders/0B9EKhGby1cv9YktaS3NxUVg2Zk0" rel="noreferrer" target="_blank">https://drive.google.com/drive/folders/0B9EKhGby1cv9YktaS3NxUVg2Zk0</a><br>
<br>
The result seems a bit surprising. A few passes, such as licm, sroa, instcombine and mem2reg, seem to deliver performance very close to O1 (which includes all the passes). Figure 7 is an example. If my methodology is correct, then my guess is that those optimisations may require some common internal passes, which actually deliver most of the improvements. I am wondering if this is true.<br>
<br>
Any suggestions or critiques are welcome.<br>
<br>
Thanks,<br>
Yi<br>
<br>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div><br></div>