[PATCH] D67677: [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 20 10:42:28 PDT 2019
spatel accepted this revision.
spatel added a comment.
This revision is now accepted and ready to land.
LGTM
================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp:176-180
+ Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
+ Type *ExtendedTy =
+ Ty->isVectorTy()
+ ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
+ : ExtendedScalarTy;
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > spatel wrote:
> > > > Is there a test showing that we need this ext+trunc complexity?
> > > >
> > > > See if I've botched this Alive proof somehow, but the simpler constant mask appears to work:
> > > > https://rise4fun.com/Alive/ArQC
> > > Hmm. The reason I've gone forward with ext/trunc is: https://rise4fun.com/Alive/o5l
> > > In your example, Alive does not complain because those are constants, and somehow the usual poison rules don't apply?
> > > Are we sure this is the correct behavior and not an Alive limitation?
> > I.e., I don't think I checked what happens if `ConstantExpr::getShl()` is called with such an out-of-bounds shift amount. Does it handle it correctly, or will UBSan complain, etc.?
> Tried it. No, we can't do this; it defeats the whole point of losslessly handling lanes that need no extra masking:
> ```
> diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
> index 3f466495c5e..8db01b4d4bd 100644
> --- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
> +++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
> @@ -171,21 +171,10 @@ dropRedundantMaskingOfLeftShiftInput(BinaryOperator *OuterShift,
> // But for a mask we need to get rid of old masking instruction.
> if (!Masked->hasOneUse())
> return nullptr; // Else we can't perform the fold.
> - // We should produce compute the mask in wider type, and truncate later!
> - // Get type twice as wide element-wise (same number of elements!).
> - Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
> - Type *ExtendedTy =
> - Ty->isVectorTy()
> - ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
> - : ExtendedScalarTy;
> - auto *ExtendedSumOfShAmts =
> - ConstantExpr::getZExt(SumOfShAmts, ExtendedTy);
> // And compute the mask as usual: ~(-1 << (SumOfShAmts))
> - auto *ExtendedAllOnes = ConstantExpr::getAllOnesValue(ExtendedTy);
> - auto *ExtendedInvertedMask =
> - ConstantExpr::getShl(ExtendedAllOnes, ExtendedSumOfShAmts);
> - auto *ExtendedMask = ConstantExpr::getNot(ExtendedInvertedMask);
> - NewMask = ConstantExpr::getTrunc(ExtendedMask, Ty);
> + auto *AllOnes = ConstantExpr::getAllOnesValue(Ty);
> + auto *InvertedMask = ConstantExpr::getShl(AllOnes, SumOfShAmts);
> + NewMask = ConstantExpr::getNot(InvertedMask);
> } else
> NewMask = nullptr; // No mask needed.
> // All good, we can do this fold.
> diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
> index 5445275ad1c..235e152d2fe 100644
> --- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
> +++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
> @@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T2]])
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T4]])
> ; CHECK-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
> -; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
> +; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: ret <8 x i32> [[T5]]
> ;
> %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
> diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
> index 6165b579661..0a7c0e5d030 100644
> --- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
> +++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
> @@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T2]])
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T4]])
> ; CHECK-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
> -; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
> +; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: ret <8 x i32> [[T5]]
> ;
> %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
>
> ```
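To make the failure mode concrete: `ConstantExpr::getShl()` constant-folds a shift by an amount equal to the bit width to undef, so the lanes whose sum of shift amounts reaches 32 (which should get the all-ones mask, i.e. need no extra masking) degrade from `-1` to `undef` in the CHECK lines above. Below is a minimal standalone sketch (my illustration, not code from the patch; it assumes only LLVM's `APInt`) of why doubling the width fixes this:

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>
using namespace llvm;

int main() {
  const unsigned BitWidth = 32;
  const unsigned SumOfShAmts = 32; // a lane whose shifts cover the full width

  // In the original i32 type, '-1 << 32' is an out-of-range shift, and the
  // constant expression folds to undef -- the lane's all-ones mask is lost.

  // In a type twice as wide, the shift amount is in range, and truncating
  // afterwards recovers the intended all-ones mask for that lane.
  APInt Wide = APInt::getAllOnesValue(2 * BitWidth).shl(SumOfShAmts);
  // Wide == 0xFFFFFFFF00000000; ~Wide == 0x00000000FFFFFFFF
  APInt Mask = (~Wide).trunc(BitWidth);
  assert(Mask.isAllOnesValue() && "lane correctly needs no extra masking");
  return 0;
}
```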
Ok, thanks for checking. Can we make the reasoning clearer in the code comment? Something like:
// The mask must be computed in a type twice as wide so that no bits are
// lost when the sum of shift amounts equals the bit width of the base type.
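For reference, the retained hunk with that reasoning folded into the comment might read as follows. This is only a sketch restating the code already visible in the diff above; `NewMask`, `Ty`, `BitWidth`, and `SumOfShAmts` are the surrounding function's locals:

```cpp
// The mask must be computed in a type twice as wide (element-wise): when
// SumOfShAmts equals BitWidth, shifting all-ones by it in the original type
// is an out-of-range shift that folds to undef, yet the intended mask for
// such a lane is all-ones. Widening keeps the shift in range; truncating
// afterwards recovers the low BitWidth bits.
Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
Type *ExtendedTy =
    Ty->isVectorTy()
        ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
        : ExtendedScalarTy;
auto *ExtendedSumOfShAmts = ConstantExpr::getZExt(SumOfShAmts, ExtendedTy);
// And compute the mask as usual: ~(-1 << (SumOfShAmts))
auto *ExtendedAllOnes = ConstantExpr::getAllOnesValue(ExtendedTy);
auto *ExtendedInvertedMask =
    ConstantExpr::getShl(ExtendedAllOnes, ExtendedSumOfShAmts);
auto *ExtendedMask = ConstantExpr::getNot(ExtendedInvertedMask);
NewMask = ConstantExpr::getTrunc(ExtendedMask, Ty);
```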
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D67677/new/
https://reviews.llvm.org/D67677