[PATCH] D19661: [X86] Also try to zero elts when lowering v32i8 shuffle with PSHUFB.
Sanjay Patel via llvm-commits
llvm-commits at lists.llvm.org
Thu May 5 14:13:53 PDT 2016
spatel added inline comments.
================
Comment at: lib/Target/X86/X86ISelLowering.cpp:11427-11445
@@ -11426,9 +11426,21 @@
- if (isSingleInputShuffleMask(Mask)) {
- // There are no generalized cross-lane shuffle operations available on i8
- // element types.
- if (is128BitLaneCrossingShuffleMask(MVT::v32i8, Mask))
- return lowerVectorShuffleAsLanePermuteAndBlend(DL, MVT::v32i8, V1, V2,
- Mask, DAG);
+ SmallBitVector Zeroable = computeZeroableShuffleElements(Mask, V1, V2);
+ bool SingleInputMask = true;
+ bool SingleInputAndZeroesMask = true;
+ for (int i = 0, Size = Mask.size(); i < Size; ++i) {
+ if (Mask[i] >= Size) {
+ SingleInputMask = false;
+ if (!Zeroable[i]) {
+ SingleInputAndZeroesMask = false;
+ break;
+ }
+ }
+ }
+ // There are no generalized cross-lane shuffle operations available on i8
+ // element types.
+ if (SingleInputMask && is128BitLaneCrossingShuffleMask(MVT::v32i8, Mask))
+ return lowerVectorShuffleAsLanePermuteAndBlend(DL, MVT::v32i8, V1, V2, Mask,
+ DAG);
+ if (SingleInputAndZeroesMask) {
SDValue PSHUFBMask[32];
----------------
ab wrote:
> I wanted to do that at first, but you need to look at the operands, and at that point you basically duplicated computeZeroableShuffleElements.
>
> An alternative I considered was to do some kind of computeZeroableShuffleMask(Mask, V1, V2, NewMask), which returns a mask with SM_SentinelZero and SM_SentinelUndef. Then, users can check that it isSingleInputShuffleMask and do their thing. WDYT?
That would be good; it would help slow down the code explosion in x86 vector lowering.
But I don't think we have to hold up this patch for that; a TODO comment should be ok.
LGTM, but I'll let Simon give the final word because he has a much better understanding of everything going on with shuffles.
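For reference, the computeZeroableShuffleMask idea quoted above could look roughly like the standalone sketch below. This is only an illustration under stated assumptions, not code from the patch: it uses std::vector instead of the LLVM types, it takes an already computed Zeroable vector instead of inspecting V1 and V2, and the sentinel constants and the isSingleInputMask helper are hypothetical stand-ins for SM_SentinelUndef, SM_SentinelZero and isSingleInputShuffleMask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed stand-ins for the backend's SM_SentinelUndef / SM_SentinelZero.
constexpr int SentinelUndef = -1;
constexpr int SentinelZero = -2;

// Hypothetical helper: rewrite Mask so that undef elements become
// SentinelUndef and elements known to be zero become SentinelZero.
// Zeroable is assumed to have been computed beforehand, one flag per element.
std::vector<int> computeZeroableShuffleMask(const std::vector<int> &Mask,
                                            const std::vector<bool> &Zeroable) {
  assert(Mask.size() == Zeroable.size() && "one zeroable flag per element");
  std::vector<int> NewMask(Mask.size());
  for (std::size_t i = 0; i < Mask.size(); ++i) {
    if (Mask[i] < 0)
      NewMask[i] = SentinelUndef;   // undef stays undef
    else if (Zeroable[i])
      NewMask[i] = SentinelZero;    // known zero, no longer references an input
    else
      NewMask[i] = Mask[i];         // ordinary element, kept as-is
  }
  return NewMask;
}

// Simplified single-input check: no element may index into the second input.
// Sentinels are negative, so they never count as a second-input reference.
bool isSingleInputMask(const std::vector<int> &Mask) {
  const int Size = static_cast<int>(Mask.size());
  for (int M : Mask)
    if (M >= Size)
      return false;
  return true;
}
```

With a helper like this, a caller would run the single-input check on the rewritten mask, so the "single input plus zeroes" case falls out of the existing check instead of needing a separate loop in each lowering routine.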
http://reviews.llvm.org/D19661