[PATCH] D63505: [InstCombine] Fold icmp eq/ne (and %x, ~C), 0 -> %x u</u>= 0 earlier, C+1 is power of 2.

Tue Jun 18 14:40:53 PDT 2019

lebedev.ri added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:1656-1657
+    // ((X & ~7) == 0) --> X u< 8
+    // If X is (BinOp Y, C3), allow other rules to fold C3 with C2.
+    if (!match(X, m_c_BinOp(m_Value(), m_Constant())) &&
+        (~(*C2) + 1).isPowerOf2()) {
----------------
huihuiz wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > huihuiz wrote:
> > > > lebedev.ri wrote:
> > > > > huihuiz wrote:
> > > > > > huihuiz wrote:
> > > > > > > lebedev.ri wrote:
> > > > > > > > Why do we care about that? Seems rather arbitrary.
> > > > > > > take this test as example
> > > > > > > 
> > > > > > > 
> > > > > > > ```
> > > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > > >   %shlp = shl i8 %p, 5
> > > > > > >   %andp = and i8 %shlp, -64
> > > > > > >   %cmp = icmp ult i8 %andp, 32
> > > > > > >   ret i1 %cmp
> > > > > > > }
> > > > > > > 
> > > > > > > ```
> > > > > > > 
> > > > > > > we do miss fold for (X >> C3) & C2 != C1 --> (X & (C2 << C3)) != (C1 << C3)
> > > > > > for this test, should be better transforming into this 
> > > > > > 
> > > > > > ```
> > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > >   %andp = and i8 %p, 6
> > > > > >   %cmp = icmp eq i8 %andp, 0
> > > > > >   ret i1 %cmp
> > > > > > }
> > > > > > 
> > > > > > ```
> > > > > > 
> > > > > > However, by scheduling ((%x & C) == 0) --> %x u< (-C)  iff (-C) is power of two earlier,
> > > > > > we end up with
> > > > > > 
> > > > > > ```
> > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > >   %shlp = shl i8 %p, 5
> > > > > >   %cmp = icmp ult i8 %shlp, 64
> > > > > >   ret i1 %cmp
> > > > > > }
> > > > > > 
> > > > > > ```
> > > > > > 
> > > > > Is that IR without this restriction?
> > > > > We currently seem to handle it just fine https://godbolt.org/z/8DB1xD
> > > > Yes , that is the IR without add this restriction.
> > > > We are currently handle it fine.
> > > > But when we try to schedule (X & ~C) ==/!= 0 --> X u</u>= C+1 earlier
> > > > we end up with 
> > > > 
> > > > ```
> > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > >   %shlp = shl i8 %p, 5
> > > >   %cmp = icmp ult i8 %shlp, 64
> > > >   ret i1 %cmp
> > > > }
> > > > ```
> > > Well, nice find.
> > > Like i guessed this rescheduling is exposing all kinds of missing folds :)
> > Can you give me a spoiler, if you drop this restriction, are there any other regressions in `check-llvm`?
> Here are the failing tests when dropping this restriction:
> 
> 1. test file: test/Transforms/InstCombine/pr17827.ll
> test_shift_and_cmp_changed2 and its vector version
> 
> 2. test file: test/Transforms/InstCombine/lshr-and-negC-icmpeq-zero.ll
> scalar_i32_lshr_and_negC_eq_X_is_constant1
> and 
> scalar_i32_lshr_and_negC_eq_X_is_constant1
> 
> 
> ```
>  define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant1(i32 %y) {
>  ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant1(
>  ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 12345, [[Y:%.*]]
> -; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
> -; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
> +; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
>  ; CHECK-NEXT:    ret i1 [[R]]
>  ;
>    %lshr = lshr i32 12345, %y
>    %and = and i32 %lshr, 4294967288  ; ~7
>    %r = icmp eq i32 %and, 0
>    ret i1 %r
>  }
> 
>  define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant2(i32 %y) {
>  ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant2(
>  ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 268435456, [[Y:%.*]]
> -; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
> -; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
> +; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
>  ; CHECK-NEXT:    ret i1 [[R]]
>  ;
>    %lshr = lshr i32 268435456, %y
>    %and = and i32 %lshr, 4294967288  ; ~7
>    %r = icmp eq i32 %and, 0
>    ret i1 %r
>  }
> 
> ```
> 
> 3. similarly for test/Transforms/InstCombine/shl-and-negC-icmpeq-zero.ll
I'm confused, 2. and 3. are improvements, right?
So only `test_shift_and_cmp_changed2` regresses?
A missing fold should simply be added then, i'd say.
I'm not sure what it is yet though.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63505/new/

https://reviews.llvm.org/D63505