[PATCH] D13633: Type legalizer for masked gather/scatter intrinsics

Tue Nov 3 09:58:00 PST 2015

mbodart added inline comments.

================
Comment at: ../include/llvm/CodeGen/SelectionDAGNodes.h:2117-2119
@@ -2118,2 +2116,5 @@
            "Vector width mismatch between mask and data");
+    assert(getIndex().getValueType().getVectorNumElements() ==
+           getValueType(0).getVectorNumElements() &&
+           "Vector width mismatch between index and data");
   }
----------------
This new assertion is fine.

But why are we removing the assertion that the mask's element type is i1?

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1235
@@ +1234,3 @@
+
+SDValue DAGTypeLegalizer::PromoteIntOp_MSCATTER(MaskedScatterSDNode *N,
+                                                unsigned OpNo) {
----------------
OK

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeTypes.h:746-752
@@ -740,5 +745,9 @@
   /// input vector must have the same element type as NVT.
-  SDValue ModifyToType(SDValue InOp, EVT WidenVT);
-
+  /// When FillWithZeroes is "on" the vector will be widened with
+  /// zeroes. UseExistingVal means that the existing elements from InOp
+  /// should be used for the widening.
+  /// By default, the vector will be widened with undefined values.
+  SDValue ModifyToType(SDValue InOp, EVT NVT, bool FillWithZeroes = false,
+                       bool UseExistingVal = false);
 
   //===--------------------------------------------------------------------===//
----------------
I agree that this optimization is too X86 specific, and CodeGen should simply extend with undef values.

If you want to proceed with those X86-specific extensions, please do so in a separate change set.

That will simplify this change set, and render the other comments here moot for now.
If it turns out that replication is needed, we will want to define its behavior more crisply.

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2704-2715
@@ -2708,12 +2703,14 @@
 
-    unsigned NumConcat = WidenNumElts / MaskNumElts;
-    SmallVector<SDValue, 16> Ops(NumConcat);
-    SDValue ZeroVal = DAG.getConstant(0, dl, MaskVT);
-    Ops[0] = Mask;
-    for (unsigned i = 1; i != NumConcat; ++i)
-      Ops[i] = ZeroVal;
+  unsigned WidenNumElts = BoolVT.getVectorNumElements();
+  unsigned MaskNumElts = MaskVT.getVectorNumElements();
 
-    Mask = DAG.getNode(ISD::CONCAT_VECTORS, dl, BoolVT, Ops);
-  }
+  unsigned NumConcat = WidenNumElts / MaskNumElts;
+  SmallVector<SDValue, 16> Ops(NumConcat);
+  SDValue ZeroVal = DAG.getConstant(0, dl, MaskVT);
+  Ops[0] = Mask;
+  for (unsigned i = 1; i != NumConcat; ++i)
+    Ops[i] = ZeroVal;
+
+  Mask = DAG.getNode(ISD::CONCAT_VECTORS, dl, BoolVT, Ops);
 
   SDValue Res = DAG.getMaskedLoad(WidenVT, dl, N->getChain(), N->getBasePtr(),
----------------
That's fine.  But then can you please add a "FIX ME" comment here indicating
such a change is desired?

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2739
@@ +2738,3 @@
+                                    WidenVT.getVectorNumElements());
+  Mask = ModifyToType(Mask, WideMaskVT, true);
+
----------------
A simple source comment here to the effect "// Zero extend the mask" would help readers remember the meaning of "true".

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2754
@@ +2753,3 @@
+
+  // Legalized the chain result - switch anything that used the old chain to
+  // use the new one.
----------------
Another instance of "Legalized" => "Legalize".

================
Comment at: ../lib/Target/X86/X86ISelLowering.cpp:11732
@@ -11729,7 +11731,3 @@
 
-    if (IdxVal == OpVT.getVectorNumElements() / 2) {
-      // Zero upper bits of the Vec
-      Vec = DAG.getNode(X86ISD::VSHLI, dl, OpVT, Vec, ShiftBits);
-      Vec = DAG.getNode(X86ISD::VSRLI, dl, OpVT, Vec, ShiftBits);
-
+    if (IdxVal == NumElems / 2) {
       SDValue Vec2 = DAG.getNode(ISD::INSERT_SUBVECTOR, dl, OpVT, Undef,
----------------
Unrelated to your change, I'm finding the optimizations of MVT::i1 here confusing as there
are no checks for SubVecVT.getVectorNumElements().   The code appears to be making the
assumption that when IdxVal is 0, or half the full vector length, then SubVec is exactly half the
size of Vec.  Why is that not being checked?

It's also unclear to me how VSHLI/VSHRI operate on a vector of MVT::i1 elements.
When operating on a vector of i32 or i64 elements, I would have thought these
instructions perform a bit shift of each individual element (not cross element).
But it seems the code here is trying to shift the whole i1 vector left and right,
as if it is one big integer.  Maybe that's how VSHLI/VSHRI are defined to behave,
I don't know.  Can you clarify?

================
Comment at: ../lib/Target/X86/X86ISelLowering.cpp:19484
@@ +19483,3 @@
+                            bool UseExistingVal = false) {
+  if (InOp.isUndef())
+    return DAG.getUNDEF(NVT);
----------------
Another case where this new Undef creation should probably be moved below the InVT == NVT early return.

================
Comment at: ../test/CodeGen/X86/masked_gather_scatter.ll:170
@@ +169,3 @@
+; SKX-NEXT:    kmovw %k1, %k2
+; SKX-NEXT:    vpgatherdd (%rdi,%ymm0,4), %ymm1 {%k2}
+; SKX-NEXT:    vmovaps %zmm1, %zmm2
----------------
I don't understand how SKX can use 32-bit indices here (i.e., vpgatherdd instead of vpgatherqd), when targetting x86_64.  How does lowering of the @llvm.masked.gather.v8i32 calls know that only the low 32-bits of the <8 x i32*> pointer values are needed?
Is there some analysis of the insertelement;shufflevector/getelementptr instructions which feed to < 8 x i32*> pointers?


Repository:
  rL LLVM

http://reviews.llvm.org/D13633