[PATCH] PR19958 wrong code at -O1 and above on x86_64-linux-gnu (InstCombine)

Thu Jul 3 00:07:10 PDT 2014

> On 2014 Jul 2, at 14:33, Sanjay Patel <spatel at rotateright.com> wrote:
> 
> Sorry for the delay. LGTM...but considering that versions of this patch have caused miscompiles twice and I'm pretty new here, I think we should have someone with more instcombine experience give final approval in case I've missed anything.
> 
> http://reviews.llvm.org/D4068
> 

Hi Suyog,

It seems like you might be able to go a little further without much
work.

E.g., consider the code from Sanjey's potential miscompile:

> $ cat 19958.ll > @a = common global i32 0, align 4
> define i1 @main() {
>   %a = load i32* @a, align 4
>   %shr = lshr exact i32 80, %a
>   %cmp = icmp eq i32 %shr, 41 ; NOTE: non-power-of-2 comparison
>   ret i1 %cmp
> }

Here, `%cmp` can be assigned `i1 0`.  (Or is that covered elsewhere?)

More comments inline.

> Index: lib/Transforms/InstCombine/InstCombineCompares.cpp
> ===================================================================
> --- lib/Transforms/InstCombine/InstCombineCompares.cpp
> +++ lib/Transforms/InstCombine/InstCombineCompares.cpp
> @@ -2334,6 +2334,36 @@
>    return GlobalSwapBenefits > 0;
>  }
>  
> +// Helper function to check whether Op represents a lshr/ashr exact
> +// instruction. For example:
> +// (icmp eq/ne (ashr exact const2, A), const1) -> icmp eq/ne A,
> +// Log2(const2/const1)

Please also document the lshr case, which you're handling here too.

Does this handle `const1 == 0` (i.e., divide-by-zero) correctly?  Can
you add testcases for that?

> +// Here if Op represents -> (ashr exact const2, A), and CI represents
> +// const1, we compute Quotient as const2/const1.
> +// const1 should divide const2 exactly.
> +static bool checkShrExact(Value *Op, APInt &Quotient, const ConstantInt *CI,
> +                          Value *&A) {
> +  ConstantInt *CI2;
> +  if (match(Op, m_AShr(m_ConstantInt(CI2), m_Value(A))) &&
> +      (cast<BinaryOperator>(Op)->isExact())) {
> +    if (CI2->getValue().srem(CI->getValue()) == 0) {
> +      Quotient = CI2->getValue().sdiv(CI->getValue());

Can you use APInt::sdivrem here to save work?

> +      return true;
> +    }
> +  }
> +
> +  // Handle the case for lshr.
> +  if (match(Op, m_LShr(m_ConstantInt(CI2), m_Value(A))) &&
> +      (cast<BinaryOperator>(Op)->isExact())) {
> +    if (CI2->getValue().urem(CI->getValue()) == 0) {
> +      Quotient = CI2->getValue().udiv(CI->getValue());

Can you use APInt::udivrem here to save work?

> +      return true;
> +    }
> +  }
> +
> +  return false;
> +}
> +
>  Instruction *InstCombiner::visitICmpInst(ICmpInst &I) {
>    bool Changed = false;
>    Value *Op0 = I.getOperand(0), *Op1 = I.getOperand(1);
> @@ -2455,6 +2485,21 @@
>        return new ICmpInst(I.getPredicate(), A, B);
>      }
>  
> +    // PR19753:
> +    // (icmp eq/ne (ashr exact const2, A), const1) -> icmp eq/ne A,
> +    // Log2(const2/const1).

Please also document the lshr case, which you're handling here too.

> +    // Similar for lshr.
> +    // TODO : Handle this for other icmp instructions.
> +    {
> +      APInt Quotient;
> +      if (I.isEquality() && checkShrExact(Op0, Quotient, CI, A)) {

Rather than introducing a block just for scope, I'd separate the
`isEquality()` check like this:

    if (I.isEquality()) {
      APInt Quotient;
      if (checkShrExact(Op0, Quotient, CI, A)) {

> +        int shift = Quotient.exactLogBase2();

This should be `Shift` for coding style.

> +        if (shift != -1)
> +          return new ICmpInst(I.getPredicate(), A,
> +                              ConstantInt::get(A->getType(), shift));
> +      }
> +    }
> +
>      // If we have an icmp le or icmp ge instruction, turn it into the
>      // appropriate icmp lt or icmp gt instruction.  This allows us to rely on
>      // them being folded in the code below.  The SimplifyICmpInst code has
> Index: test/Transforms/InstCombine/icmp.ll
> ===================================================================
> --- test/Transforms/InstCombine/icmp.ll
> +++ test/Transforms/InstCombine/icmp.ll
> @@ -1424,3 +1424,80 @@
>    %2 = icmp slt i32 %1, -10
>    ret i1 %2
>  }
> +
> +; CHECK-LABEL: @exact_ashr_eq
> +; CHECK-NEXT: icmp eq i32 %a, 31
> +define i1 @exact_ashr_eq(i32 %a) {
> +  %shr = ashr exact i32 -2147483648, %a
> +  %cmp = icmp eq i32 %shr, -1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_ne
> +; CHECK-NEXT: icmp ne i32 %a, 31
> +define i1 @exact_ashr_ne(i32 %a) {
> +  %shr = ashr exact i32 -2147483648, %a
> +  %cmp = icmp ne i32 %shr, -1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_eq
> +; CHECK-NEXT: icmp eq i32 %a, 30
> +define i1 @exact_lshr_eq(i32 %a) {
> +  %shr = lshr exact i32 1073741824, %a
> +  %cmp = icmp eq i32 %shr, 1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_ne
> +; CHECK-NEXT: icmp ne i32 %a, 30
> +define i1 @exact_lshr_ne(i32 %a) {
> +  %shr = lshr exact i32 1073741824, %a
> +  %cmp = icmp ne i32 %shr, 1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_ne_noexactdiv
> +; CHECK-NEXT: lshr exact i32 80, %a
> +; CHECK-NEXT: icmp ne i32 %shr, 31
> +define i1 @exact_lshr_ne_noexactdiv(i32 %a) {
> +  %shr = lshr exact i32 80, %a
> +  %cmp = icmp ne i32 %shr, 31
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_noexact
> +; CHECK-NEXT: ashr i32 -2147483648, %a
> +; CHECK-NEXT: icmp ne i32 %shr, -1
> +define i1 @exact_ashr_noexact(i32 %a) {
> +  %shr = ashr i32 -2147483648, %a
> +  %cmp = icmp ne i32 %shr, -1
> +  ret i1 %cmp 
> +}
> +
> +; CHECK-LABEL: @exact_lshr_noexact
> +; CHECK-NEXT: lshr i32 1073741824, %a
> +; CHECK-NEXT: icmp ne i32 %shr, 1
> +define i1 @exact_lshr_noexact(i32 %a) {
> +  %shr = lshr i32 1073741824, %a
> +  %cmp = icmp ne i32 %shr, 1
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_ashr_no_neeq
> +; CHECK-NEXT: ashr exact i32 -30, %a
> +; CHECK-NEXT: icmp ult i32 %shr, -15
> +define i1 @exact_ashr_no_neeq(i32 %a) {
> +  %shr = ashr exact i32 -30, %a
> +  %cmp = icmp ult i32 %shr, -15
> +  ret i1 %cmp
> +}
> +
> +; CHECK-LABEL: @exact_lshr_no_neeq
> +; CHECK-NEXT: lshr exact i32 1073741824, %a
> +; CHECK-NEXT: icmp ugt i32 %shr, 1
> +define i1 @exact_lshr_no_neeq(i32 %a) {
> +  %shr = lshr exact i32 1073741824, %a
> +  %cmp = icmp ugt i32 %shr, 1
> +  ret i1 %cmp
> +}