[PATCH] D63505: [InstCombine] Fold icmp eq/ne (and %x, ~C), 0 -> %x u</u>= 0 earlier, C+1 is power of 2.

Tue Jun 18 15:11:29 PDT 2019

lebedev.ri added inline comments.

================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineCompares.cpp:1656-1657
+    // ((X & ~7) == 0) --> X u< 8
+    // If X is (BinOp Y, C3), allow other rules to fold C3 with C2.
+    if (!match(X, m_c_BinOp(m_Value(), m_Constant())) &&
+        (~(*C2) + 1).isPowerOf2()) {
----------------
huihuiz wrote:
> lebedev.ri wrote:
> > huihuiz wrote:
> > > lebedev.ri wrote:
> > > > lebedev.ri wrote:
> > > > > huihuiz wrote:
> > > > > > lebedev.ri wrote:
> > > > > > > huihuiz wrote:
> > > > > > > > huihuiz wrote:
> > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > Why do we care about that? Seems rather arbitrary.
> > > > > > > > > take this test as example
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > ```
> > > > > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > > > > >   %shlp = shl i8 %p, 5
> > > > > > > > >   %andp = and i8 %shlp, -64
> > > > > > > > >   %cmp = icmp ult i8 %andp, 32
> > > > > > > > >   ret i1 %cmp
> > > > > > > > > }
> > > > > > > > > 
> > > > > > > > > ```
> > > > > > > > > 
> > > > > > > > > we do miss fold for (X >> C3) & C2 != C1 --> (X & (C2 << C3)) != (C1 << C3)
> > > > > > > > for this test, should be better transforming into this 
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > > > >   %andp = and i8 %p, 6
> > > > > > > >   %cmp = icmp eq i8 %andp, 0
> > > > > > > >   ret i1 %cmp
> > > > > > > > }
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > However, by scheduling ((%x & C) == 0) --> %x u< (-C)  iff (-C) is power of two earlier,
> > > > > > > > we end up with
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > > > >   %shlp = shl i8 %p, 5
> > > > > > > >   %cmp = icmp ult i8 %shlp, 64
> > > > > > > >   ret i1 %cmp
> > > > > > > > }
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > 
> > > > > > > Is that IR without this restriction?
> > > > > > > We currently seem to handle it just fine https://godbolt.org/z/8DB1xD
> > > > > > Yes , that is the IR without add this restriction.
> > > > > > We are currently handle it fine.
> > > > > > But when we try to schedule (X & ~C) ==/!= 0 --> X u</u>= C+1 earlier
> > > > > > we end up with 
> > > > > > 
> > > > > > ```
> > > > > > define i1 @test_shift_and_cmp_changed2(i8 %p) {
> > > > > >   %shlp = shl i8 %p, 5
> > > > > >   %cmp = icmp ult i8 %shlp, 64
> > > > > >   ret i1 %cmp
> > > > > > }
> > > > > > ```
> > > > > Well, nice find.
> > > > > Like i guessed this rescheduling is exposing all kinds of missing folds :)
> > > > Can you give me a spoiler, if you drop this restriction, are there any other regressions in `check-llvm`?
> > > Here are the failing tests when dropping this restriction:
> > > 
> > > 1. test file: test/Transforms/InstCombine/pr17827.ll
> > > test_shift_and_cmp_changed2 and its vector version
> > > 
> > > 2. test file: test/Transforms/InstCombine/lshr-and-negC-icmpeq-zero.ll
> > > scalar_i32_lshr_and_negC_eq_X_is_constant1
> > > and 
> > > scalar_i32_lshr_and_negC_eq_X_is_constant1
> > > 
> > > 
> > > ```
> > >  define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant1(i32 %y) {
> > >  ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant1(
> > >  ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 12345, [[Y:%.*]]
> > > -; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
> > > -; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
> > > +; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
> > >  ; CHECK-NEXT:    ret i1 [[R]]
> > >  ;
> > >    %lshr = lshr i32 12345, %y
> > >    %and = and i32 %lshr, 4294967288  ; ~7
> > >    %r = icmp eq i32 %and, 0
> > >    ret i1 %r
> > >  }
> > > 
> > >  define i1 @scalar_i32_lshr_and_negC_eq_X_is_constant2(i32 %y) {
> > >  ; CHECK-LABEL: @scalar_i32_lshr_and_negC_eq_X_is_constant2(
> > >  ; CHECK-NEXT:    [[LSHR:%.*]] = lshr i32 268435456, [[Y:%.*]]
> > > -; CHECK-NEXT:    [[AND:%.*]] = and i32 [[LSHR]], -8
> > > -; CHECK-NEXT:    [[R:%.*]] = icmp eq i32 [[AND]], 0
> > > +; CHECK-NEXT:    [[R:%.*]] = icmp ult i32 [[LSHR]], 8
> > >  ; CHECK-NEXT:    ret i1 [[R]]
> > >  ;
> > >    %lshr = lshr i32 268435456, %y
> > >    %and = and i32 %lshr, 4294967288  ; ~7
> > >    %r = icmp eq i32 %and, 0
> > >    ret i1 %r
> > >  }
> > > 
> > > ```
> > > 
> > > 3. similarly for test/Transforms/InstCombine/shl-and-negC-icmpeq-zero.ll
> > I'm confused, 2. and 3. are improvements, right?
> > So only `test_shift_and_cmp_changed2` regresses?
> > A missing fold should simply be added then, i'd say.
> > I'm not sure what it is yet though.
> test file 2 and 3 are improvements. But the improved test cases are created mostly to check for correctness, I am not sure the improved test cases are actually popular in real applications.
> 
> test file 1 is regression, the failing test is simplified from bit filed test bug in pr17827. Probably more prevalent in application code.
> 
> Adding restriction tend to benefit more cases.
(TLDR: It's all such a mess, isn't it?)
> I am not sure the improved test cases are actually popular in real applications.
Given just how many these transforms are, it's not particularly relevant what can be and what can't be
encountered in the "actually popular in real applications", what mattes is what IR patterns
we can encounter in "actually popular in real application" after the middle-end opt pipeline.
> test file 1 is regression, the failing test is simplified from bit filed test bug in pr17827. Probably more prevalent in application code.
So naturally, if that new pattern is now being encountered in a testcase reduced from an actual code,
then by definition it can be encountered in "actually popular in real application", no?
> Adding restriction tend to benefit more cases.
But do we know that? Sadly all this patternmatching is done blindly, with no attempt at cost modelling,
or trying to ultimately produce least amount of instructions/etc, it just tries to combine instructions
when they can be combined.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63505/new/

https://reviews.llvm.org/D63505