[llvm] r305053 - [LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.

George Burgess IV via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 8 21:06:12 PDT 2017


Forgot to mention it in the commit message, but FWIW: I hacked
together some really sketchy asserts that would catch this, and put
them in copyIRFlags. I then did a stage2 build of clang with the
sketchy-assert build as the compiler.

The build had a few errors, but the vast majority of files
successfully compiled. I got 0 assertion errors, so it doesn't
immediately look like we have a similar bug (that goes through copyIRFlags,
at least) anywhere else.

On Thu, Jun 8, 2017 at 8:56 PM, George Burgess IV via llvm-commits
<llvm-commits at lists.llvm.org> wrote:
> Author: gbiv
> Date: Thu Jun  8 22:56:15 2017
> New Revision: 305053
>
> URL: http://llvm.org/viewvc/llvm-project?rev=305053&view=rev
> Log:
> [LoopVectorize] Don't preserve nsw/nuw flags on shrunken ops.
>
> If we're shrinking a binary operation, it may be the case that the new
> operation wraps where the old one didn't. If this happens, that wrapping
> should still be well-defined behavior. So, we can't always carry wrapping
> flags with us when we shrink operations.
>
> If we do, we get incorrect optimizations in cases like:
>
> void foo(const unsigned char *from, unsigned char *to, int n) {
>   for (int i = 0; i < n; i++)
>     to[i] = from[i] - 128;
> }
>
> which gets optimized to:
>
> void foo(const unsigned char *from, unsigned char *to, int n) {
>   for (int i = 0; i < n; i++)
>     to[i] = from[i] | 128;
> }
>
> Because:
> - InstCombine turned `sub i32 %from.i, 128` into
>   `add nuw nsw i32 %from.i, 128`.
> - LoopVectorize vectorized the add to be `add nuw nsw <16 x i8>` with a
>   vector full of `i8 128`s.
> - InstCombine took advantage of the fact that the newly-shrunken add
>   "couldn't wrap", and changed the `add` to an `or`.
>
> InstCombine seems happy to figure out whether we can add nuw/nsw on its
> own, so I just decided to drop the flags. There are already a number of
> places in LoopVectorize where we rely on InstCombine to clean up.
>
> Modified:
>     llvm/trunk/include/llvm/IR/Instruction.h
>     llvm/trunk/lib/IR/Instruction.cpp
>     llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
>     llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
>
> Modified: llvm/trunk/include/llvm/IR/Instruction.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/Instruction.h?rev=305053&r1=305052&r2=305053&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/IR/Instruction.h (original)
> +++ llvm/trunk/include/llvm/IR/Instruction.h Thu Jun  8 22:56:15 2017
> @@ -360,9 +360,9 @@ public:
>    /// Copy I's fast-math flags
>    void copyFastMathFlags(const Instruction *I);
>
> -  /// Convenience method to copy supported wrapping, exact, and fast-math flags
> -  /// from V to this instruction.
> -  void copyIRFlags(const Value *V);
> +  /// Convenience method to copy supported exact, fast-math, and (optionally)
> +  /// wrapping flags from V to this instruction.
> +  void copyIRFlags(const Value *V, bool IncludeWrapFlags = true);
>
>    /// Logical 'and' of any supported wrapping, exact, and fast-math flags of
>    /// V and this instruction.
>
> Modified: llvm/trunk/lib/IR/Instruction.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/IR/Instruction.cpp?rev=305053&r1=305052&r2=305053&view=diff
> ==============================================================================
> --- llvm/trunk/lib/IR/Instruction.cpp (original)
> +++ llvm/trunk/lib/IR/Instruction.cpp Thu Jun  8 22:56:15 2017
> @@ -216,10 +216,10 @@ void Instruction::copyFastMathFlags(cons
>    copyFastMathFlags(I->getFastMathFlags());
>  }
>
> -void Instruction::copyIRFlags(const Value *V) {
> +void Instruction::copyIRFlags(const Value *V, bool IncludeWrapFlags) {
>    // Copy the wrapping flags.
> -  if (auto *OB = dyn_cast<OverflowingBinaryOperator>(V)) {
> -    if (isa<OverflowingBinaryOperator>(this)) {
> +  if (IncludeWrapFlags && isa<OverflowingBinaryOperator>(this)) {
> +    if (auto *OB = dyn_cast<OverflowingBinaryOperator>(V)) {
>        setHasNoSignedWrap(OB->hasNoSignedWrap());
>        setHasNoUnsignedWrap(OB->hasNoUnsignedWrap());
>      }
>
> Modified: llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp?rev=305053&r1=305052&r2=305053&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp (original)
> +++ llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp Thu Jun  8 22:56:15 2017
> @@ -3814,7 +3814,11 @@ void InnerLoopVectorizer::truncateToMini
>        if (auto *BO = dyn_cast<BinaryOperator>(I)) {
>          NewI = B.CreateBinOp(BO->getOpcode(), ShrinkOperand(BO->getOperand(0)),
>                               ShrinkOperand(BO->getOperand(1)));
> -        cast<BinaryOperator>(NewI)->copyIRFlags(I);
> +
> +        // Any wrapping introduced by shrinking this operation shouldn't be
> +        // considered undefined behavior. So, we can't unconditionally copy
> +        // arithmetic wrapping flags to NewI.
> +        cast<BinaryOperator>(NewI)->copyIRFlags(I, /*IncludeWrapFlags=*/false);
>        } else if (auto *CI = dyn_cast<ICmpInst>(I)) {
>          NewI =
>              B.CreateICmp(CI->getPredicate(), ShrinkOperand(CI->getOperand(0)),
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll?rev=305053&r1=305052&r2=305053&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll Thu Jun  8 22:56:15 2017
> @@ -5,7 +5,7 @@ target triple = "aarch64"
>
>  ; CHECK-LABEL: @add_a(
>  ; CHECK: load <16 x i8>, <16 x i8>*
> -; CHECK: add nuw nsw <16 x i8>
> +; CHECK: add <16 x i8>
>  ; CHECK: store <16 x i8>
>  ; Function Attrs: nounwind
>  define void @add_a(i8* noalias nocapture readonly %p, i8* noalias nocapture %q, i32 %len) #0 {
> @@ -31,9 +31,37 @@ for.body:
>    br i1 %exitcond, label %for.cond.cleanup, label %for.body
>  }
>
> +; Ensure that we preserve nuw/nsw if we're not shrinking the values we're
> +; working with.
> +; CHECK-LABEL: @add_a1(
> +; CHECK: load <16 x i8>, <16 x i8>*
> +; CHECK: add nuw nsw <16 x i8>
> +; CHECK: store <16 x i8>
> +; Function Attrs: nounwind
> +define void @add_a1(i8* noalias nocapture readonly %p, i8* noalias nocapture %q, i32 %len) #0 {
> +entry:
> +  %cmp8 = icmp sgt i32 %len, 0
> +  br i1 %cmp8, label %for.body, label %for.cond.cleanup
> +
> +for.cond.cleanup:                                 ; preds = %for.body, %entry
> +  ret void
> +
> +for.body:                                         ; preds = %entry, %for.body
> +  %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
> +  %arrayidx = getelementptr inbounds i8, i8* %p, i64 %indvars.iv
> +  %0 = load i8, i8* %arrayidx
> +  %add = add nuw nsw i8 %0, 2
> +  %arrayidx3 = getelementptr inbounds i8, i8* %q, i64 %indvars.iv
> +  store i8 %add, i8* %arrayidx3
> +  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
> +  %lftr.wideiv = trunc i64 %indvars.iv.next to i32
> +  %exitcond = icmp eq i32 %lftr.wideiv, %len
> +  br i1 %exitcond, label %for.cond.cleanup, label %for.body
> +}
> +
>  ; CHECK-LABEL: @add_b(
>  ; CHECK: load <8 x i16>, <8 x i16>*
> -; CHECK: add nuw nsw <8 x i16>
> +; CHECK: add <8 x i16>
>  ; CHECK: store <8 x i16>
>  ; Function Attrs: nounwind
>  define void @add_b(i16* noalias nocapture readonly %p, i16* noalias nocapture %q, i32 %len) #0 {
> @@ -61,7 +89,7 @@ for.body:
>
>  ; CHECK-LABEL: @add_c(
>  ; CHECK: load <8 x i8>, <8 x i8>*
> -; CHECK: add nuw nsw <8 x i16>
> +; CHECK: add <8 x i16>
>  ; CHECK: store <8 x i16>
>  ; Function Attrs: nounwind
>  define void @add_c(i8* noalias nocapture readonly %p, i16* noalias nocapture %q, i32 %len) #0 {
> @@ -116,12 +144,12 @@ for.body:
>  ; CHECK-LABEL: @add_e(
>  ; CHECK: load <16 x i8>
>  ; CHECK: shl <16 x i8>
> -; CHECK: add nuw nsw <16 x i8>
> +; CHECK: add <16 x i8>
>  ; CHECK: or <16 x i8>
> -; CHECK: mul nuw nsw <16 x i8>
> +; CHECK: mul <16 x i8>
>  ; CHECK: and <16 x i8>
>  ; CHECK: xor <16 x i8>
> -; CHECK: mul nuw nsw <16 x i8>
> +; CHECK: mul <16 x i8>
>  ; CHECK: store <16 x i8>
>  define void @add_e(i8* noalias nocapture readonly %p, i8* noalias nocapture %q, i8 %arg1, i8 %arg2, i32 %len) #0 {
>  entry:
> @@ -162,12 +190,12 @@ for.body:
>  ; CHECK: load <8 x i16>
>  ; CHECK: trunc <8 x i16>
>  ; CHECK: shl <8 x i8>
> -; CHECK: add nsw <8 x i8>
> +; CHECK: add <8 x i8>
>  ; CHECK: or <8 x i8>
> -; CHECK: mul nuw nsw <8 x i8>
> +; CHECK: mul <8 x i8>
>  ; CHECK: and <8 x i8>
>  ; CHECK: xor <8 x i8>
> -; CHECK: mul nuw nsw <8 x i8>
> +; CHECK: mul <8 x i8>
>  ; CHECK: store <8 x i8>
>  define void @add_f(i16* noalias nocapture readonly %p, i8* noalias nocapture %q, i8 %arg1, i8 %arg2, i32 %len) #0 {
>  entry:
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

