[llvm-dev] InstCombine: Narrow switch instructions using known bits

Tue Oct 23 14:15:11 PDT 2018

I don't have an IR that you can compile but the original IR we saw was
something like this:

  %1 = trunc i128 %0 to i64
  %2 = zext i64 %1 to i128
  switch i128 %2, label %3 [
    i128 0, label %15
    i128 2, label %15
    i128 4, label %15
  ]

When I tested the patch four years ago, the optimization did reduce code
size on x86 in some cases, but it's possible that this optimization isn't
needed anymore today.

On Tue, Oct 23, 2018 at 10:01 AM Bakhvalov, Denis <denis.bakhvalov at intel.com>
wrote:

> Hi Akira,
>
> Can you maybe remember or come up with any example where this
> transformation still helps today?
>
> If no such case and no objections on disabling/removing it, I can start
> working on that.
>
>
>
> Best regards,
> Denis Bakhvalov.
>
>
>
> *From:* Akira Hatanaka [mailto:ahatanak at gmail.com]
> *Sent:* Wednesday, October 10, 2018 12:46 PM
> *To:* Bakhvalov, Denis <denis.bakhvalov at intel.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; Akira Hatanaka <
> ahatanaka at apple.com>
> *Subject:* Re: [llvm-dev] InstCombine: Narrow switch instructions using
> known bits
>
>
>
> On Fri, Sep 21, 2018 at 7:02 AM Bakhvalov, Denis via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hello,
>
>
>
> I found one case where narrowing switch instructions transformation in
> InstCombine produces worse code.
>
> Let's suppose that I have such code:
>
>
>
> $ cat a.c
>
> void foo();
>
> void bar();
>
> void zoo();
>
>
>
> void my_func(unsigned int a) {
>
>     unsigned char b = a & 0xF;
>
>     switch (b) {
>
>         case 0:  foo(); break;
>
>         case 1:  bar(); break;
>
>         case 2:  foo(); break;
>
>         case 3:  foo(); break;
>
>         case 4:  foo(); break;
>
>         case 5:  bar(); break;
>
>         case 6:  foo(); break;
>
>         case 7:  foo(); break;
>
>         case 8:  bar(); break;
>
>         case 9:  foo(); break;
>
>         case 10: foo(); break;
>
>
>
>         default: zoo();
>
>     }
>
> }
>
>
>
> Using recent clang:
>
>
>
> $ clang -O3 -S -c a.c -o a.s
>
> I have the following assembly in the beginning of my_func:
>
> # bad case
>
>         movl    %edi, %eax
>
>         andb    $15, %al
>
>         cmpb    $10, %al
>
>         ja      .LBB0_9                           # jump to the default
> case
>
>
>
>         andl    $15, %edi
>
>         jmpq    *.LJTI0_0(,%rdi,8)      # go to jump table
>
>
>
> I found that if I disable switch shrinking like shown below:
>
>
>
> $ git diff
>
> diff --git a/lib/Transforms/InstCombine/InstructionCombining.cpp
> b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> index 5d5a9b2..3682b88 100644
>
> --- a/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> +++ b/lib/Transforms/InstCombine/InstructionCombining.cpp
>
> @@ -2429,6 +2429,8 @@ Instruction
> *InstCombiner::visitSwitchInst(SwitchInst &SI) {
>
>      return &SI;
>
>    }
>
>
>
> +  return nullptr;
>
> +
>
>    KnownBits Known = computeKnownBits(Cond, 0, &SI);
>
>    unsigned LeadingKnownZeros = Known.countMinLeadingZeros();
>
>    unsigned LeadingKnownOnes = Known.countMinLeadingOnes();
>
>
>
> I get better assembly (there is no additional MOV and AND instructions):
>
> # good case
>
>         andl    $15, %edi
>
>         cmpl    $10, %edi
>
>         ja      .LBB0_9
>
>
>
>         jmpq    *.LJTI0_0(,%rdi,8)
>
>
>
> This transformation was introduced in the commit:
>
> commit 4eb03123dfda2de88a84852834845678833c8c36
>
> Author: Akira Hatanaka <ahatanaka at apple.com>
>
> Date:   Thu Oct 16 06:00:46 2014 +0000
>
>     Reapply r219832 - InstCombine: Narrow switch instructions using known
> bits.
>
>
>
> From IR point of view, after the ‘opt –codegenprepare’ the difference is
> that in good case we have simple ‘and’ operation (all calculations are made
> in i32). In the bad case there is 'trunc' to i4 and then ‘zext’ to i8.
>
>
>
> During instruction selection we expand switch into a jump table. In the
> bad case we use 2 copies of the value that we are switching on. First is in
> i8 that we use to determine whether we should jump to default case. The
> second is in i64, which we use for calculating address in the jump table.
> In the good case they were combined.
>
>
>
> But there is still one thing that I don’t understand. What is the bad case
> that this transformation (narrowing switch instructions) was supposed to
> fix, i.e. does this transformation still make sense?
>
>
>
>
>
> I couldn't find the original test case, but the patch was committed to fix
> a switch over a 128-bit value that was causing llvm to generate suboptimal
> code in some cases. I'm not sure whether this optimization is still
> necessary today.
>
>
>
> Best regards,
> Denis Bakhvalov.
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181023/38188ed6/attachment.html>