[PATCH] D63505: [InstCombine] Fold icmp eq/ne (and %x, ~C), 0 -> %x u</u>= 0 earlier, C+1 is power of 2.

Tue Jun 18 14:51:53 PDT 2019

huihuiz marked an inline comment as done.
huihuiz added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:1656-1657
+    // ((X & ~7) == 0) --> X u< 8
+    // If X is (BinOp Y, C3), allow other rules to fold C3 with C2.
+    if (!match(X, m_c_BinOp(m_Value(), m_Constant())) &&
+        (~(*C2) + 1).isPowerOf2()) {
----------------
lebedev.ri wrote:
> huihuiz wrote:
> > lebedev.ri wrote:
> > > lebedev.ri wrote:
> > > > huihuiz wrote:
> > > > > lebedev.ri wrote:
> > > > > > huihuiz wrote:
> > > > > > > huihuiz wrote:
> > > > > > > > lebedev.ri wrote:
> > > > > > > > > Why do we care about that? Seems rather arbitrary.
> > > > > > > > take this test as example
> > > > > > > > 
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > > > >   %shlp = shl i8 %p, 5
> > > > > > > >   %andp = and i8 %shlp, -64
> > > > > > > >   %cmp = icmp ult i8 %andp, 32
> > > > > > > >   ret i1 %cmp
> > > > > > > > }
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > we do miss fold for (X >> C3) & C2 != C1 --> (X & (C2 << C3)) != (C1 << C3)
> > > > > > > for this test, should be better transforming into this 
> > > > > > > 
> > > > > > > ```
> > > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > > >   %andp = and i8 %p, 6
> > > > > > >   %cmp = icmp eq i8 %andp, 0
> > > > > > >   ret i1 %cmp
> > > > > > > }
> > > > > > > 
> > > > > > > ```
> > > > > > > 
> > > > > > > However, by scheduling ((%x & C) == 0) --> %x u< (-C)  iff (-C) is power of two earlier,
> > > > > > > we end up with
> > > > > > > 
> > > > > > > ```
> > > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > > >   %shlp = shl i8 %p, 5
> > > > > > >   %cmp = icmp ult i8 %shlp, 64
> > > > > > >   ret i1 %cmp
> > > > > > > }
> > > > > > > 
> > > > > > > ```
> > > > > > > 
> > > > > > Is that IR without this restriction?
> > > > > > We currently seem to handle it just fine https://godbolt.org/z/8DB1xD
> > > > > Yes , that is the IR without add this restriction.
> > > > > We are currently handle it fine.
> > > > > But when we try to schedule (X & ~C) ==/!= 0 --> X u</u>= C+1 earlier
> > > > > we end up with 
> > > > > 
> > > > > ```
> > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > >   %shlp = shl i8 %p, 5
> > > > >   %cmp = icmp ult i8 %shlp, 64
> > > > >   ret i1 %cmp
> > > > > }
> > > > > ```
> > > > Well, nice find.
> > > > Like i guessed this rescheduling is exposing all kinds of missing folds :)
> > > Can you give me a spoiler, if you drop this restriction, are there any other regressions in `check-llvm`?
> > Here are the failing tests when dropping this restriction:
> > 
> > 1. test file: test/Transforms/InstCombine/pr17827.ll
> > test_shift_and_cmp_changed2 and its vector version
> > 
> > 2. test file: test/Transforms/InstCombine/lshr-and-negC-icmpeq-zero.ll
> > scalar_i32_lshr_and_negC_eq_X_is_constant1
> > and 
> > scalar_i32_lshr_and_negC_eq_X_is_constant1
> > 
> > 
> > ```
> >  define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant1(i32 %y) {
> >  ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant1(
> >  ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 12345, [[Y:%.*]]
> > -; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
> > -; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
> > +; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
> >  ; CHECK-NEXT:    ret i1 [[R]]
> >  ;
> >    %lshr = lshr i32 12345, %y
> >    %and = and i32 %lshr, 4294967288  ; ~7
> >    %r = icmp eq i32 %and, 0
> >    ret i1 %r
> >  }
> > 
> >  define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant2(i32 %y) {
> >  ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant2(
> >  ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 268435456, [[Y:%.*]]
> > -; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
> > -; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
> > +; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
> >  ; CHECK-NEXT:    ret i1 [[R]]
> >  ;
> >    %lshr = lshr i32 268435456, %y
> >    %and = and i32 %lshr, 4294967288  ; ~7
> >    %r = icmp eq i32 %and, 0
> >    ret i1 %r
> >  }
> > 
> > ```
> > 
> > 3. similarly for test/Transforms/InstCombine/shl-and-negC-icmpeq-zero.ll
> I'm confused, 2. and 3. are improvements, right?
> So only `test_shift_and_cmp_changed2` regresses?
> A missing fold should simply be added then, i'd say.
> I'm not sure what it is yet though.
test file 2 and 3 are improvements. But the improved test cases are created mostly to check for correctness, I am not sure the improved test cases are actually popular in real applications.

test file 1 is regression, the failing test is simplified from bit filed test bug in pr17827. Probably more prevalent in application code.

Adding restriction tend to benefit more cases.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63505/new/

https://reviews.llvm.org/D63505