[PATCH] D67677: [InstCombine] dropRedundantMaskingOfLeftShiftInput(): pat. a/b with mask (PR42563)
Sanjay Patel via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Sep 20 10:42:28 PDT 2019
spatel accepted this revision.
spatel added a comment.
This revision is now accepted and ready to land.
LGTM
================
Comment at: llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp:176-180
+ Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
+ Type *ExtendedTy =
+ Ty->isVectorTy()
+ ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
+ : ExtendedScalarTy;
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > lebedev.ri wrote:
> > > spatel wrote:
> > > > Is there a test showing that we need this ext+trunc complexity?
> > > >
> > > > See if I've botched this Alive proof somehow, but the simpler constant mask appears to work:
> > > > https://rise4fun.com/Alive/ArQC
> > > Hmm. The reason I've gone forward with ext/trunc is: https://rise4fun.com/Alive/o5l
> > > In your example, Alive does not complain because those are constants, and somehow the usual poison rules don't apply?
> > > Are we sure this is the correct behavior and not an Alive limitation?
> > I.e., I don't think I checked what happens if `ConstantExpr::getShl()` is called with such an out-of-bounds shift amount. Does it handle it correctly, or will UBSan complain, etc.?
> Tried it. No, we can't do this; it defeats the whole point of losslessly handling lanes that need no extra masking:
> ```
> diff --git a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
> index 3f466495c5e..8db01b4d4bd 100644
> --- a/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
> +++ b/llvm/lib/Transforms/InstCombine/InstCombineShifts.cpp
> @@ -171,21 +171,10 @@ dropRedundantMaskingOfLeftShiftInput(BinaryOperator *OuterShift,
> // But for a mask we need to get rid of old masking instruction.
> if (!Masked->hasOneUse())
> return nullptr; // Else we can't perform the fold.
> - // We should produce compute the mask in wider type, and truncate later!
> - // Get type twice as wide element-wise (same number of elements!).
> - Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
> - Type *ExtendedTy =
> - Ty->isVectorTy()
> - ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
> - : ExtendedScalarTy;
> - auto *ExtendedSumOfShAmts =
> - ConstantExpr::getZExt(SumOfShAmts, ExtendedTy);
> // And compute the mask as usual: ~(-1 << (SumOfShAmts))
> - auto *ExtendedAllOnes = ConstantExpr::getAllOnesValue(ExtendedTy);
> - auto *ExtendedInvertedMask =
> - ConstantExpr::getShl(ExtendedAllOnes, ExtendedSumOfShAmts);
> - auto *ExtendedMask = ConstantExpr::getNot(ExtendedInvertedMask);
> - NewMask = ConstantExpr::getTrunc(ExtendedMask, Ty);
> + auto *AllOnes = ConstantExpr::getAllOnesValue(Ty);
> + auto *InvertedMask = ConstantExpr::getShl(AllOnes, SumOfShAmts);
> + NewMask = ConstantExpr::getNot(InvertedMask);
> } else
> NewMask = nullptr; // No mask needed.
> // All good, we can do this fold.
> diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
> index 5445275ad1c..235e152d2fe 100644
> --- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
> +++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-a.ll
> @@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T2]])
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T4]])
> ; CHECK-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
> -; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
> +; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: ret <8 x i32> [[T5]]
> ;
> %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
> diff --git a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
> index 6165b579661..0a7c0e5d030 100644
> --- a/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
> +++ b/llvm/test/Transforms/InstCombine/partally-redundant-left-shift-input-masking-variant-b.ll
> @@ -82,7 +82,7 @@ define <8 x i32> @t2_vec_nonsplat(<8 x i32> %x, <8 x i32> %nbits) {
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T2]])
> ; CHECK-NEXT: call void @use8xi32(<8 x i32> [[T4]])
> ; CHECK-NEXT: [[TMP1:%.*]] = shl <8 x i32> [[X:%.*]], [[T4]]
> -; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 -1, i32 -1, i32 -1, i32 undef>
> +; CHECK-NEXT: [[T5:%.*]] = and <8 x i32> [[TMP1]], <i32 undef, i32 0, i32 1, i32 2147483647, i32 undef, i32 undef, i32 undef, i32 undef>
> ; CHECK-NEXT: ret <8 x i32> [[T5]]
> ;
> %t0 = add <8 x i32> %nbits, <i32 -33, i32 -32, i32 -31, i32 -1, i32 0, i32 1, i32 31, i32 32>
>
> ```
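To make the failure mode concrete: `ConstantExpr::getShl()` constant-folds a shift by an amount equal to the bit width to undef, so the lanes whose sum of shift amounts reaches 32 (which should get the all-ones mask, i.e. need no extra masking) degrade from `-1` to `undef` in the CHECK lines above. Below is a minimal standalone sketch (my illustration, not code from the patch; it assumes only LLVM's `APInt`) of why doubling the width fixes this:

```cpp
#include "llvm/ADT/APInt.h"
#include <cassert>
using namespace llvm;

int main() {
  const unsigned BitWidth = 32;
  const unsigned SumOfShAmts = 32; // a lane whose shifts cover the full width

  // In the original i32 type, '-1 << 32' is an out-of-range shift, and the
  // constant expression folds to undef -- the lane's all-ones mask is lost.

  // In a type twice as wide, the shift amount is in range, and truncating
  // afterwards recovers the intended all-ones mask for that lane.
  APInt Wide = APInt::getAllOnesValue(2 * BitWidth).shl(SumOfShAmts);
  // Wide == 0xFFFFFFFF00000000; ~Wide == 0x00000000FFFFFFFF
  APInt Mask = (~Wide).trunc(BitWidth);
  assert(Mask.isAllOnesValue() && "lane correctly needs no extra masking");
  return 0;
}
```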
Ok, thanks for checking. Can we make the reasoning clearer in the code comment? Something like:
// The mask must be computed in a type twice as wide so that no bits are
// lost when the sum of shift amounts equals the bit width of the base type.
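For reference, the retained hunk with that reasoning folded into the comment might read as follows. This is only a sketch restating the code already visible in the diff above; `NewMask`, `Ty`, `BitWidth`, and `SumOfShAmts` are the surrounding function's locals:

```cpp
// The mask must be computed in a type twice as wide (element-wise): when
// SumOfShAmts equals BitWidth, shifting all-ones by it in the original type
// is an out-of-range shift that folds to undef, yet the intended mask for
// such a lane is all-ones. Widening keeps the shift in range; truncating
// afterwards recovers the low BitWidth bits.
Type *ExtendedScalarTy = Type::getIntNTy(Ty->getContext(), 2 * BitWidth);
Type *ExtendedTy =
    Ty->isVectorTy()
        ? VectorType::get(ExtendedScalarTy, Ty->getVectorNumElements())
        : ExtendedScalarTy;
auto *ExtendedSumOfShAmts = ConstantExpr::getZExt(SumOfShAmts, ExtendedTy);
// And compute the mask as usual: ~(-1 << (SumOfShAmts))
auto *ExtendedAllOnes = ConstantExpr::getAllOnesValue(ExtendedTy);
auto *ExtendedInvertedMask =
    ConstantExpr::getShl(ExtendedAllOnes, ExtendedSumOfShAmts);
auto *ExtendedMask = ConstantExpr::getNot(ExtendedInvertedMask);
NewMask = ConstantExpr::getTrunc(ExtendedMask, Ty);
```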
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D67677/new/
https://reviews.llvm.org/D67677