[PATCH] PR19958 wrong code at -O1 and above on x86_64-linux-gnu (InstCombine)

Thu Jul 10 16:58:25 PDT 2014

> On 2014-Jul-10, at 02:06, suyog sarda <sardask01 at gmail.com> wrote:
> 
> Adding Duncan 
> 
> (Something wrong with phabricator, previous mail didn't include Duncan even after adding him as reviewer.)
> 
> ---------- Forwarded message ----------
> From: suyog <suyog.sarda at samsung.com>
> Date: Thu, Jul 10, 2014 at 1:55 AM
> Subject: Re: [PATCH] PR19958 wrong code at -O1 and above on x86_64-linux-gnu (InstCombine)
> To: suyog.sarda at samsung.com, rafael.espindola at gmail.com, nlewycky at google.com, spatel at rotateright.com
> Cc: llvm-commits at cs.uiuc.edu
> 
> 
> Hi Duncan,
> 
> Thanks for your explanation. I have handled case for 'lshr' separately now in the patch attached.
> 
> I have combined the logic for ashr/lshr as well as exact/non-exact as most of the logic is same.
> Wherever there are special cases, i have handled them separately. I have added the logic for the
> cases which were handled by 'instsimplify' to keep the logical flow as discussed.
> 
> I have combined the logic, since the 'match' functions are expensive, same for the log,
> and our whole point of calculating difference of log was to avoid expensive division operation.
> 
> Please help in reviewing the patch. Your comments/suggestions are most welcomed :)
> 
> http://reviews.llvm.org/D4068
> 
> Files:
>   lib/Transforms/InstCombine/InstCombineCompares.cpp
>   test/Transforms/InstCombine/icmp.ll

This is much easier to reason about and review.  I've found another few
potential miscompiles, noted inline below.  A few points up front:

 1. Some of your comments spell out exactly what the following line of
    code does -- they don't add any value, since the code is
    self-explanatory.  I've pointed them out below, and suggested better
    comments where I think they're warranted.

 2. You return (e.g.) `getTrue()` whether the `icmp` operator is `eq` or
    `ne`.  This is wrong.

      - Please add testcases for all the 0 cases using `ne`.  In fact, I
        think every testcase you have for `eq` should have an `ne` dual
        (and vice versa).

      - You need to fix the code as well (add the testcases first, and
        then get them passing by fixing the code).  

        I think it'll be clearest to create lambda functions like this:

            auto getConstant = [&I](bool IsTrue) {
              if (I.getPredicate() == I.ICMP_NE)
                IsTrue = !IsTrue;
              return ReplaceInstUsesWith(I, ConstantInt::get(
                  I.getType(), IsTrue));
            };
            auto getICmp = [&I](CmpInst::Predicate Pred, Value *LHS,
                                Value *RHS) {
              if (I.getPredicate() == I.ICMP_NE)
                Pred = CmpInst::getInversePredicate(Pred);
              return new ICmpInst(Pred, LHS, RHS);
            };

        Then you can use `getConstant()` and `getICmp()` as if you're
        only dealing with `eq`, and leave them to give inverted results
        for `ne`.

 3. I've actually reviewed the tests this time; note the comments I
    made there.

> Index: lib/Transforms/InstCombine/InstCombineCompares.cpp
> ===================================================================
> --- lib/Transforms/InstCombine/InstCombineCompares.cpp
> +++ lib/Transforms/InstCombine/InstCombineCompares.cpp
> @@ -2459,6 +2459,91 @@
>        return new ICmpInst(I.getPredicate(), A, B);
>      }
>  
> +    // PR19753:
> +    // (icmp eq/ne (ashr/lshr const2, A), const1) ->
> +    // (icmp eq/ne A, Log2(const2/const1)) ->
> +    // (icmp eq/ne A, Log2(const2) - Log2(const1)).
> +    // TODO : Handle this for other icmp instructions.
> +    if (I.isEquality()) {
> +      ConstantInt *CI2;
> +      if (match(Op0, m_AShr(m_ConstantInt(CI2), m_Value(A))) ||
> +          match(Op0, m_LShr(m_ConstantInt(CI2), m_Value(A)))) {
> +        APInt AP1 = CI->getValue();
> +        APInt AP2 = CI2->getValue();
> +        int Shift;
> +        if (!AP1) {
> +          // if const1 = const2 = 0, return true.

Not a useful comment.

> +          if (!AP2)

Instead, add a comment here:

    // Both constants are 0.

> +            return ReplaceInstUsesWith(I, ConstantInt::getTrue(I.getType()));
> +          // if const1 = 0, const2 !=0 and right shift is 'exact',
> +          // return false.

I don't think this comment is needed at all.

> +          if (cast<BinaryOperator>(Op0)->isExact())
> +            return ReplaceInstUsesWith(I, ConstantInt::getFalse(I.getType()));
> +          // if const1 = 0, right shift is non-exact and const2 is -ve,
> +          // return false.

Not a useful comment.

> +          if (AP2.isNegative())

This is correct, but it's not obvious why it's correct for `lshr`.

When the MSB is set, it seems at first like these (e.g.) would be the
correct transformations:

    (icmp eq (lshr i1   1, A), 0) -> (icmp ugt i1 A, 0)
    (icmp eq (lshr i2   2, A), 0) -> (icmp ugt i2 A, 1)
    (icmp eq (lshr i2   3, A), 0) -> (icmp ugt i2 A, 1)
    (icmp eq (lshr i8 255, A), 0) -> (icmp ugt i8 A, 7)

However, there's a subtlety here that makes your code correct:  for the
result to be `i1 1`, the value of `A` must be at least the number of
bits in the type, so the correct result is always `i1 0` or `i1 undef`
(never `i1 1`).

This deserves a clear comment, *inside* the `if`.  Something like this
seems about right:

    // MSB is set, so a lshr with a large enough A would be undefined.

> +            return ReplaceInstUsesWith(I, ConstantInt::getFalse(I.getType()));
> +          // if const1 = 0, shift is non-exact and const2 is +ve ->
> +          // icmp ugt A, Log2(const2).

Comment isn't useful.  I prefer:

    // A must be large enough to shift out the MSB.

> +          Shift = AP2.logBase2();
> +          return new ICmpInst(I.ICMP_UGT, A,
> +                              ConstantInt::get(A->getType(), Shift));
> +        }
> +
> +        // Shifting 0 by any value gives 0.

I think this comment would be better inside the `if`.

> +        if (!AP2)
> +          return ReplaceInstUsesWith(I, ConstantInt::getFalse(I.getType()));
> +
> +        // if const1 = const2 -> icmp eq/ne A, 0
> +        if (AP1 == AP2)

I think this isn't strictly correct for non-exact `ashr` of -1.  Here's
the correct logic (note that I'm using the lambda helpers I suggested).

    if (AP1 == AP2) {
      if (AP1.isAllOnesValue() && isa<AShrOperator>(Op))
        return getConstant(true);

      return getICmp(I.getPredicate(), A,
                     ConstantInt::getNullValue(A->getType()));
    }

Please add a testcase for -1 (all four combinations of `eq`/`ne` and
with or without `exact`).

> +          return new ICmpInst(I.getPredicate(), A,
> +                              ConstantInt::getNullValue(A->getType()));
> +
> +        if (isa<AShrOperator>(Op0)) {

Since there's will be some logic above that checks the same `isa`, you
should probably add a `bool IsAShr` variable above and reuse it here.

> +          // const1 and const2 are sign opposite, arithmatic shift will retain
> +          // sign and hence const2 will never equal const1.

This comment is useful, but the grammar is a little off and it's pretty
verbose.

> +          if (AP1.isNegative() != AP2.isNegative())

Moving the comment here makes it clearer (and shorter):

    // Arithmetic shift will never change the sign.

> +            return ReplaceInstUsesWith(I, ConstantInt::getFalse(I.getType()));
> +          // Both the constants are negative, take their positive to calculate
> +          // log.
> +          if (AP1.isNegative()) {
> +            AP1 = -AP1;
> +            AP2 = -AP2;
> +          }
> +        }
> +
> +        if (isa<LShrOperator>(Op0)) {

This entire block is unnecessary, since logical shifts have no concept
of negative numbers.  The logic that follows is correct without it.

In its current state, it's actively harmful (would cause miscompiles).

> +          // logical shift of const2 will be +ve which will not be equal
> +          // to -ve const1.
> +          if (AP1.isNegative())
> +            return ReplaceInstUsesWith(I, ConstantInt::getFalse(I.getType()));
> +          // const1 is +ve and const2 is -ve, logical shift of const2 may
> +          // equal const1, hence take abs(AP2) to calculate log.
> +          if (AP2.isNegative())
> +            AP2 = -AP2;

This is incorrect and would cause miscompiles.

Please add testcases for:

    (icmp eq (lshr i3 6, A), 3)) -> (icmp eq A, 1)
    (icmp ne (lshr i3 6, A), 3)) -> (icmp ne A, 1)
    (icmp eq (lshr exact i3 6, A), 3)) -> (icmp eq A, 1)
    (icmp ne (lshr exact i3 6, A), 3)) -> (icmp ne A, 1)

Your code will convert the `6` to its two's-complement, which is `2`.
It should be left as a `6`.

> +        }
> +
> +        // Now we have const1 = abs(const1) and const2 = abs(const2).
> +        // If const1 is greater than const2,
> +        // right shifting const2 will never equal to const1.

I'd remove this comment...

> +        if (AP1.ugt(AP2))

... and add a shorter one here:

    // Right-shifting will not increase the value.

> +          return ReplaceInstUsesWith(I, ConstantInt::getFalse(I.getType()));
> +

A comment here might be useful.

    // Get the distance between the highest bit that's set.

> +        Shift = AP2.logBase2() - AP1.logBase2();
> +
> +        // if shift is 'exact', then use shl to check equality.
> +        if ((cast<BinaryOperator>(Op0)->isExact()) && (AP1.shl(Shift) == AP2))
> +          return new ICmpInst(I.getPredicate(), A,
> +                              ConstantInt::get(A->getType(), Shift));

I think I suggested it originally, but this block is unnecessary and no
simpler than what follows.  If shift is `exact`, then it only differs
when the result is `undef`, so it's okay to treat it the same as
non-`exact`.

> +        // Use lshr here, since we've canonicalized to +ve numbers.
> +        if (AP1 == AP2.lshr(Shift))
> +          return new ICmpInst(I.getPredicate(), A,
> +                              ConstantInt::get(A->getType(), Shift));
> +        // Shifting const2 will never be equal to const1, return false.

Change this comment to:

    // Shifting const2 will never be equal to const1.

> +        return ReplaceInstUsesWith(I, ConstantInt::getFalse(I.getType()));
> +      }
> +    }
> +
>      // If we have an icmp le or icmp ge instruction, turn it into the
>      // appropriate icmp lt or icmp gt instruction.  This allows us to rely on
>      // them being folded in the code below.  The SimplifyICmpInst code has
> Index: test/Transforms/InstCombine/icmp.ll
> ===================================================================
> --- test/Transforms/InstCombine/icmp.ll
> +++ test/Transforms/InstCombine/icmp.ll
> @@ -1424,3 +1424,205 @@
>    %2 = icmp slt i32 %1, -10
>    ret i1 %2
>  }
> +

About the tests generally -- I think it would be easier to confirm their
correctness by inspection if you used `i8` values (or smaller) instead
of `i32`.

Frankly, I think they should be a separate file (something like
`icmp-shr.ll`), but I don't have a strong opinion.

> +; CHECK-LABEL: @exact_lshr_both_zero
> +; CHECK-NEXT: ret i1 true
> +define i1 @exact_lshr_both_zero(i32 %a) {
> +  %shr = lshr exact i32 0, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_both_zero
> +; CHECK-NEXT: ret i1 true
> +define i1 @exact_ashr_both_zero(i32 %a) {
> +  %shr = ashr exact i32 0, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @nonexact_lshr_both_zero
> +; CHECK-NEXT: ret i1 true
> +define i1 @nonexact_lshr_both_zero(i32 %a) {
> +  %shr = lshr i32 0, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @nonexact_ashr_both_zero
> +; CHECK-NEXT: ret i1 true
> +define i1 @nonexact_ashr_both_zero(i32 %a) {
> +  %shr = ashr i32 0, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_last_zero
> +; CHECK-NEXT: ret i1 false
> +define i1 @exact_lshr_last_zero(i32 %a) {
> +  %shr = lshr exact i32 64, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_last_zero
> +; CHECK-NEXT: ret i1 false
> +define i1 @exact_ashr_last_zero(i32 %a) {
> +  %shr = ashr exact i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @nonexact_lshr_last_zero
> +; CHECK-NEXT: ret i1 false
> +define i1 @nonexact_lshr_last_zero(i32 %a) {
> +  %shr = lshr i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @nonexact_ashr_last_zero
> +; CHECK-NEXT: ret i1 false
> +define i1 @nonexact_ashr_last_zero(i32 %a) {
> +  %shr = ashr i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @lshr_first_positive_last_zero
> +; CHECK-NEXT: icmp ugt i32 %a, 30
> +define i1 @lshr_first_positive_last_zero(i32 %a) {

For `lshr` testcases, don't talk about +ve and -ve.  Since the semantics
are for unsigned numbers, this language is confusing.

I recommend `msb_high` (instead of negative) and `msb_low` (instead of
positive).

> +  %shr = lshr i32 1073741824, %a
> +  %cmp = icmp eq i32 %shr, 0
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @ashr_first_positive_second_zero
> +; CHECK-NEXT: icmp ugt i32 %a, 30
> +define i1 @ashr_first_positive_second_zero(i32 %a) {
> + %shr = ashr i32 1073741824, %a
> + %cmp = icmp eq i32 %shr, 0
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @lshr_first_zero
> +; CHECK-NEXT: ret i1 false
> +define i1 @lshr_first_zero(i32 %a) {
> +  %shr = lshr i32 0, %a
> +  %cmp = icmp eq i32 %shr, 2 
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_both_equal
> +; CHECK-NEXT: icmp ne i32 %a, 0
> +define i1 @exact_ashr_both_equal(i32 %a) {
> + %shr = ashr exact i32 -2147483648, %a
> + %cmp = icmp ne i32 %shr, -2147483648
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_both_equal
> +; CHECK-NEXT: icmp eq i32 %a, 0
> +define i1 @exact_lshr_both_equal(i32 %a) {
> + %shr = lshr exact i32 1073741824, %a
> + %cmp = icmp eq i32 %shr, 1073741824
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_opposite_sign
> +; CHECK-NEXT: ret i1 false
> +define i1 @exact_ashr_opposite_sign(i32 %a) {
> +  %shr = ashr exact i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, 1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @ashr_opposite_sign
> +; CHECK-NEXT: ret i1 false
> +define i1 @ashr_opposite_sign(i32 %a) {
> +  %shr = ashr i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, 1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_opposite_sign
> +; CHECK-NEXT: icmp eq i32 %a, 31
> +define i1 @exact_lshr_opposite_sign(i32 %a) {

Also don't talk about opposite sign for `lshr`.  Unsigned numbers don't
have signs.  Maybe `opposite_msb` would work?

> +  %shr = lshr exact i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, 1 
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @lshr_opposite_sign
> +; CHECK-NEXT: icmp eq i32 %a, 31
> +define i1 @lshr_opposite_sign(i32 %a) {
> +  %shr = lshr i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, 1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_eq
> +; CHECK-NEXT: icmp eq i32 %a, 31
> +define i1 @exact_ashr_eq(i32 %a) {
> + %shr = ashr exact i32 -2147483648, %a
> + %cmp = icmp eq i32 %shr, -1
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_ne
> +; CHECK-NEXT: icmp ne i32 %a, 31
> +define i1 @exact_ashr_ne(i32 %a) {
> + %shr = ashr exact i32 -2147483648, %a
> + %cmp = icmp ne i32 %shr, -1
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_eq
> +; CHECK-NEXT: icmp eq i32 %a, 30
> +define i1 @exact_lshr_eq(i32 %a) {
> + %shr = lshr exact i32 1073741824, %a
> + %cmp = icmp eq i32 %shr, 1
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_ne
> +; CHECK-NEXT: icmp ne i32 %a, 30
> +define i1 @exact_lshr_ne(i32 %a) {
> + %shr = lshr exact i32 1073741824, %a
> + %cmp = icmp ne i32 %shr, 1
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_ne_noexactdiv
> +; CHECK-NEXT: ret i1 false
> +define i1 @exact_lshr_ne_noexactdiv(i32 %a) {
> + %shr = lshr exact i32 80, %a
> + %cmp = icmp ne i32 %shr, 31
> + ret i1 %cmp
> +}

Testcases like 80 => 5 would be useful.

> +
> +; CHECK-LABEL: @exact_ashr_ne_noexactdiv
> +; CHECK-NEXT: ret i1 false
> +define i1 @exact_ashr_ne_noexactdiv(i32 %a) {
> + %shr = ashr exact i32 -80, %a
> + %cmp = icmp ne i32 %shr, -41
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_no_neeq
> +; CHECK-NEXT: ashr exact i32 -30, %a
> +; CHECK-NEXT: icmp ult i32 %shr, -15
> +define i1 @exact_ashr_no_neeq(i32 %a) {
> + %shr = ashr exact i32 -30, %a
> + %cmp = icmp ult i32 %shr, -15
> + ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_no_neeq
> +; CHECK-NEXT: lshr exact i32 1073741824, %a
> +; CHECK-NEXT: icmp ugt i32 %shr, 1
> +define i1 @exact_lshr_no_neeq(i32 %a) {
> + %shr = lshr exact i32 1073741824, %a
> + %cmp = icmp ugt i32 %shr, 1
> + ret i1 %cmp
> +}
>