[PATCH] D36498: [InstCombine] Teach foldSelectICmpAnd to recognize a (icmp slt trunc X, 0) and (icmp sgt trunc X, -1) as equivalent to an and with the sign bit of the truncated type

Tue Aug 15 07:27:01 PDT 2017

spatel added a comment.

In https://reviews.llvm.org/D36498#841022, @craig.topper wrote:

> This patch is really just making InstCombine self consistent. We currently optimize this case differently depending on whether i8 is legal in datalayout.
>
> define i32 @test71(i32 %x) {
> ; CHECK-LABEL: @test71(
> ; CHECK-NEXT:    [[TMP1:%.*]] = lshr i32 [[X:%.*]], 6
> ; CHECK-NEXT:    [[TMP2:%.*]] = and i32 [[TMP1]], 2
> ; CHECK-NEXT:    [[TMP3:%.*]] = or i32 [[TMP2]], 40
> ; CHECK-NEXT:    ret i32 [[TMP3]]
> ;
>   %1 = and i32 %x, 128
>   %2 = icmp eq i32 %1, 0
>   %3 = select i1 %2, i32 40, i32 42
>   ret i32 %3
> }
>
> If we want to remove foldSelectICmpAnd that's a different question.

Ah, I didn't recognize what was going on. This is a sibling to https://reviews.llvm.org/D22537. Can you include a test that has a trunc in it from the start, so we are not dependent on the other combine? A code comment to show the complete transform would also make it a bit clearer for me.
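
For reference, the equivalence named in the patch title can be checked outside of LLVM. This is only a minimal C++ sketch, not code from the patch, and all helper names are made up for illustration. It models `trunc i32 %x to i8` by keeping the low 8 bits and reading them as a two's complement value, then compares `icmp slt ..., 0` and `icmp sgt ..., -1` against a test of bit 7 (mask 128), the sign bit of the truncated type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the value of `trunc i32 %x to i8`, read as signed.
// The two's complement interpretation is spelled out by hand so the sketch
// does not rely on narrowing-conversion behavior.
int truncToSignedI8(uint32_t x) {
  int low = static_cast<int>(x & 0xFFu);
  return low >= 128 ? low - 256 : low;
}

// Models `icmp slt (trunc X), 0`.
bool truncIsNegative(uint32_t x) { return truncToSignedI8(x) < 0; }

// Models `icmp sgt (trunc X), -1`.
bool truncIsNonNegative(uint32_t x) { return truncToSignedI8(x) > -1; }

// Models `icmp ne (and X, 128), 0`, i.e. a test of the i8 sign bit.
bool signBitOfI8Set(uint32_t x) { return (x & 128u) != 0; }

// Checks every low byte under a few arbitrary high-bit patterns; the high
// bits should never affect the result because trunc discards them.
void checkTruncSignBitEquivalence() {
  const uint32_t highBits[] = {0x00000000u, 0xFFFFFF00u, 0x12345600u};
  for (uint32_t hi : highBits) {
    for (uint32_t lo = 0; lo < 256; ++lo) {
      uint32_t x = hi | lo;
      assert(truncIsNegative(x) == signBitOfI8Set(x));
      assert(truncIsNonNegative(x) == !signBitOfI8Set(x));
    }
  }
}
```

Calling `checkTruncSignBitEquivalence()` runs through all 256 low-byte values for each pattern, which is enough to cover the claim since only the low 8 bits survive the truncation.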

FWIW, test71 is converted to math in the x86 backend for all 3 of the IR forms below, but this doesn't happen for AArch64 or PPC, where it's also likely a win. And for x86, the asm is different in each of the 3 cases:

With mask+cmp+sel:

  %1 = and i32 %x, 128
  %2 = icmp eq i32 %1, 0
  %3 = select i1 %2, i32 40, i32 42
  ret i32 %3
  -->
  andl	$128, %edi
  shrl	$6, %edi
  leal	40(%rdi), %eax

With shift+mask+or:

  %1 = lshr i32 %x, 6
  %2 = and i32 %1, 2
  %3 = or i32 %2, 40
  -->
  shrl	$6, %edi
  andl	$2, %edi
  leal	40(%rdi), %eax

With trunc+cmp+sel:

  %1 = trunc i32 %x to i8
  %2 = icmp sgt i8 %1, -1
  %3 = select i1 %2, i32 40, i32 42
  -->
  xorl	%eax, %eax
  testb	%dil, %dil
  sets	%al
  leal	40(%rax,%rax), %eax
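
As a sanity check that the three IR forms above compute the same value, here is a rough C++ sketch of each one. The function names are invented for this example and the code is not taken from LLVM; it only mirrors the IR line by line, with the i8 truncation modeled by hand as a two's complement read of the low byte.

```cpp
#include <cassert>
#include <cstdint>

// mask+cmp+sel: select on (x & 128) == 0.
uint32_t maskCmpSel(uint32_t x) { return (x & 128u) == 0 ? 40u : 42u; }

// shift+mask+or: move bit 7 down to bit 1, then or it into 40.
uint32_t shiftMaskOr(uint32_t x) { return ((x >> 6) & 2u) | 40u; }

// trunc+cmp+sel: select on (signed low byte) > -1.
uint32_t truncCmpSel(uint32_t x) {
  int low = static_cast<int>(x & 0xFFu);
  int asI8 = low >= 128 ? low - 256 : low;  // two's complement read of i8
  return asI8 > -1 ? 40u : 42u;
}

// Compares the three forms for every low byte under a few arbitrary
// high-bit patterns.
void checkThreeFormsAgree() {
  const uint32_t highBits[] = {0x00000000u, 0xFFFFFF00u, 0xA5A5A500u};
  for (uint32_t hi : highBits) {
    for (uint32_t lo = 0; lo < 256; ++lo) {
      uint32_t x = hi | lo;
      assert(maskCmpSel(x) == shiftMaskOr(x));
      assert(maskCmpSel(x) == truncCmpSel(x));
    }
  }
}
```

The shift form works because 42 is 40 with bit 1 set and bit 1 of 40 is clear, so shifting bit 7 right by 6 and or-ing it into 40 yields 40 or 42 with no compare or select.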

https://reviews.llvm.org/D36498