<div dir="ltr">I don't have an IR that you can compile but the original IR we saw was something like this:<div><br></div><div><div> %1 = trunc i128 %0 to i64</div><div> %2 = zext i64 %1 to i128</div><div> switch i128 %2, label %3 [</div><div> i128 0, label %15</div><div> i128 2, label %15</div><div> i128 4, label %15</div><div> ]</div><div><br></div></div><div>When I tested the patch four years ago, the optimization did reduce code size on x86 in some cases, but it's possible that this optimization isn't needed anymore today.</div><div class="gmail-yj6qo gmail-ajU" style="outline:none;padding:10px 0px;width:22px;margin:2px 0px 0px"><br class="gmail-Apple-interchange-newline"></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, Oct 23, 2018 at 10:01 AM Bakhvalov, Denis <<a href="mailto:denis.bakhvalov@intel.com">denis.bakhvalov@intel.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="PL" link="blue" vlink="purple">
<div class="m_-1019351003208497678WordSection1">
<p class="MsoNormal"><a name="m_-1019351003208497678__MailEndCompose"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Hi Akira,<br>
<br>
</span></a><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Can you maybe remember or come up with any example where this transformation still helps today?<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">If no such case and no objections on disabling/removing it, I can start working on that.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d">Best regards,<br>
Denis Bakhvalov.<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><a name="m_-1019351003208497678______replyseparator"></a><b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Akira Hatanaka [mailto:<a href="mailto:ahatanak@gmail.com" target="_blank">ahatanak@gmail.com</a>]
<br>
<b>Sent:</b> Wednesday, October 10, 2018 12:46 PM<br>
<b>To:</b> Bakhvalov, Denis <<a href="mailto:denis.bakhvalov@intel.com" target="_blank">denis.bakhvalov@intel.com</a>><br>
<b>Cc:</b> llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>>; Akira Hatanaka <<a href="mailto:ahatanaka@apple.com" target="_blank">ahatanaka@apple.com</a>><br>
<b>Subject:</b> Re: [llvm-dev] InstCombine: Narrow switch instructions using known bits<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
<div>
<div>
<div>
<p class="MsoNormal"><span lang="EN-US">On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <</span><a href="mailto:llvm-dev@lists.llvm.org" target="_blank"><span lang="EN-US">llvm-dev@lists.llvm.org</span></a><span lang="EN-US">> wrote:<u></u><u></u></span></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal"><span lang="EN-US">Hello,<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I found one case where narrowing switch instructions transformation in InstCombine produces worse code.
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Let's suppose that I have such code:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">$ cat a.c
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void foo();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void bar();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void zoo();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">void my_func(unsigned int a) {<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> unsigned char b = a & 0xF;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> switch (b) {<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 0: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 1: bar(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 2: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 3: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 4: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 5: bar(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 6: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 7: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 8: bar(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 9: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> case 10: foo(); break;<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> default: zoo();<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> }<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">}<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">Using recent clang:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal">$ clang -O3 -S -c a.c -o a.s<u></u><u></u></p>
<p class="MsoNormal"><span lang="EN-US">I have the following assembly in the beginning of my_func:<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"># bad case<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> movl %edi, %eax<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> andb $15, %al<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> cmpb $10, %al<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> ja .LBB0_9 # jump to the default case<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> andl $15, %edi<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US"> jmpq *.LJTI0_0(,%rdi,8) # go to jump table<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">
<u></u><u></u></span></p>
<p class="MsoNormal"><span lang="EN-US">I found that if I disable switch shrinking like shown below:
<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">$ git diff<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp b/lib/Transforms/InstCombine/InstructionCombining.cpp<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">index 5d5a9b2..3682b88 100644<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">--- a/lib/Transforms/InstCombine/InstructionCombining.cpp<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">+++ b/lib/Transforms/InstCombine/InstructionCombining.cpp<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">@@ -2429,6 +2429,8 @@ Instruction *InstCombiner::visitSwitchInst(SwitchInst &SI) {<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> return &SI;<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> }<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">+ return nullptr;<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">+<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> KnownBits Known = computeKnownBits(Cond, 0, &SI);<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> unsigned LeadingKnownZeros = Known.countMinLeadingZeros();<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> unsigned LeadingKnownOnes = Known.countMinLeadingOnes();<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">I get better assembly (there is no additional MOV and AND instructions):<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"># good case<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> andl $15, %edi<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> cmpl $10, %edi<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> </span>ja .LBB0_9<u></u><u></u></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<u></u><u></u></p>
<p class="MsoNormal" style="margin-left:11.55pt">
jmpq *.LJTI0_0(,%rdi,8)<u></u><u></u></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<u></u><u></u></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">This transformation was introduced in the commit:<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">commit 4eb03123dfda2de88a84852834845678833c8c36<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">Author: Akira Hatanaka <</span><a href="mailto:ahatanaka@apple.com" target="_blank"><span lang="EN-US">ahatanaka@apple.com</span></a><span lang="EN-US">><u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">Date: Thu Oct 16 06:00:46 2014 +0000<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> Reapply r219832 - InstCombine: Narrow switch instructions using known bits.<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">From IR point of view, after the ‘opt –codegenprepare’ the difference is that in good case we have simple ‘and’ operation (all calculations are made in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">During instruction selection we expand switch into a jump table. In the bad case we use 2 copies of the value that we are switching on. First is in i8 that we use to determine whether we should jump to default case. The second is in i64,
which we use for calculating address in the jump table. In the good case they were combined.<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">But there is still one thing that I don’t understand. What is the bad case that this transformation (narrowing switch instructions) was supposed to fix, i.e. does this transformation still make sense?<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><span lang="EN-US"><u></u> <u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US">I couldn't find the original test case, but the patch was committed to fix a switch over a 128-bit value that was causing llvm to generate suboptimal code in some cases. I'm not sure whether this optimization is still
necessary today.<u></u><u></u></span></p>
</div>
<div>
<p class="MsoNormal"><span lang="EN-US"> <u></u><u></u></span></p>
</div>
<blockquote style="border:none;border-left:solid #cccccc 1.0pt;padding:0in 0in 0in 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US">Best regards,<br>
Denis Bakhvalov.<u></u><u></u></span></p>
<p class="MsoNormal" style="margin-left:11.55pt">
<span lang="EN-US"> <u></u><u></u></span></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:11.55pt"><span lang="EN-US">_______________________________________________<br>
LLVM Developers mailing list<br>
</span><a href="mailto:llvm-dev@lists.llvm.org" target="_blank"><span lang="EN-US">llvm-dev@lists.llvm.org</span></a><span lang="EN-US"><br>
</span><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" target="_blank"><span lang="EN-US">http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev</span></a><span lang="EN-US"><u></u><u></u></span></p>
</blockquote>
</div>
</div>
</div>
</div>
</blockquote></div>