<div dir="ltr"><div class="gmail_quote"><div dir="ltr">On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="PL" link="#0563C1" vlink="#954F72">
<div class="m_6011475852250102200WordSection1">
<p class="MsoNormal">Hello,<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><span lang="EN-US">I found one case where narrowing switch instructions transformation in InstCombine produces worse code.
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Let's suppose that I have such code:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">$ cat a.c <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void foo();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void bar();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void zoo();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void my_func(unsigned int a) {<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> unsigned char b = a & 0xF;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> switch (b) {<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 0: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 1: bar(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 2: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 3: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 4: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 5: bar(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 6: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 7: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 8: bar(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 9: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 10: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> default: zoo();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> }<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">}<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Using recent clang:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal">$ clang -O3 -S -c a.c -o a.s<u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-US">I have the following assembly in the beginning of my_func:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"># bad case<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> movl %edi, %eax<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> andb $15, %al<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> cmpb $10, %al<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> ja .LBB0_9 # jump to the default case<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> andl $15, %edi<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> jmpq *.LJTI0_0(,%rdi,8) # go to jump table<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I found that if I disable switch shrinking like shown below:
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">$ git diff<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">index 5d5a9b2..3682b88 100644<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">--- a/lib/Transforms/InstCombine/InstructionCombining.cpp<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">@@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> return &SI;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> }<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">+ return nullptr;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">+<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> KnownBits Known = computeKnownBits(Cond, 0, &SI);<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> unsigned LeadingKnownZeros = Known.countMinLeadingZeros();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> unsigned LeadingKnownOnes = Known.countMinLeadingOnes();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I get better assembly (there is no additional MOV and AND instructions):<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"># good case<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> </span>andl $15, %edi<u></u><u></u></p>
<p class="MsoNormal"> cmpl $10, %edi<u></u><u></u></p>
<p class="MsoNormal"> ja .LBB0_9<u></u><u></u></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"> <span lang="EN-US">jmpq *.LJTI0_0(,%rdi,8)<u></u><u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
<p class="MsoNormal"><span lang="EN-US">This transformation was introduced in the commit:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">commit 4eb03123dfda2de88a84852834845678833c8c36<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Author: Akira Hatanaka <<a href="mailto:ahatanaka@apple.com" target="_blank">ahatanaka@apple.com</a>><u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Date: Thu Oct 16 06:00:46 2014 +0000<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> Reapply r219832 - InstCombine: Narrow switch instructions using known bits.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The
second is in i64, which we use for calculating address in the jump table. In the good case they were combined.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> </span></p></div></div></blockquote><div><br></div><div>I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still necessary today.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="PL" link="#0563C1" vlink="#954F72"><div class="m_6011475852250102200WordSection1"><p class="MsoNormal"><span lang="EN-US"><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Best regards,<br>
Denis Bakhvalov.<u></u><u></u></span></p>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
</div>
_______________________________________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</a><br>
</blockquote></div></div>