[PATCH] D13633: Type legalizer for masked gather/scatter intrinsics

Mon Nov 2 05:07:37 PST 2015

delena marked 3 inline comments as done.

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp:1235
@@ +1234,3 @@
+
+SDValue DAGTypeLegalizer::PromoteIntOp_MSCATTER(MaskedScatterSDNode *N,
+                                                unsigned OpNo) {
----------------
mbodart wrote:
> This is the first time I've looked at vector type legalization, so there's something I don't quite understand.  A masked store (PromoteIntOp_MSTORE) is in some sense just a degenerative case of a masked scatter.  So intuitively it would seem PromoteIntOp_MSTORE would have a more simple implementation than PromoteIntOp_MSCATTER.  But here it seems the reverse is true.  Why does PromoteIntOp_MSTORE try to legalize both the mask and data simultaneously, while for scatter we just do the one operand?  Could PromoteIntOp_MSTORE be simplified?  If not, what is the essential difference?
I wanted to simplify MLOAD / MSTORE but it requires additional changes in X86ISelLowering.cpp. I can do this in a separate patch.

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeTypes.h:746-751
@@ -740,4 +745,8 @@
   /// input vector must have the same element type as NVT.
-  SDValue ModifyToType(SDValue InOp, EVT WidenVT);
-
+  /// When FillWithZeroes is "on" the vector will be widened with
+  /// zeroes. UseExistingVal means that the existing elements from InOp
+  /// should be used for the widening.
+  /// By defalut, the vector will be widened with undefined values.
+  SDValue ModifyToType(SDValue InOp, EVT WidenVT, bool FillWithZeroes = false,
+                       bool UseExistingVal = false);
 
----------------
mbodart wrote:
> The implementation of UseExistingVal gives it an ambiguous meaning.  If a widening operation is an even multiple of the original size, then the full original operand is replicated to fill out the new value.  Otherwise, only its first element is replicated.  Was that the intent?  If so, it seems like a difficult interface to use robustly.
> 
> An enum with possible values of Zext, Splat or Undef would make the widening interface
> more clear, assuming the Splat ambiguity is resolved to consistently replicate all original values.
> 
> But as a further question, when would a larger vector length not be a multiple of a smaller one?
> Aren't they all powers of 2?
> 
> The implementation of UseExistingVal gives it an ambiguous meaning. If a widening operation is an even multiple of the original size, then the full original operand is replicated to fill out the new value. Otherwise, only its first element is replicated. Was that the intent? If so, it seems like a difficult interface to use robustly.

When I extend vector of indices, I want to add existing values (replicated small vector or replicated first element - does not matter). The gather/scatter will be faster, as far as I know (I'll check it again). But now I think that may be this is too X86 specific?
May be I should fill indices with "undef" and then replace these "undefs" in target specific part?

> An enum with possible values of Zext, Splat or Undef would make the widening interface more clear
I thought about this, but I don't see too many enums in LLVM. All enums have global senсe, not per function.
> But as a further question, when would a larger vector length not be a multiple of a smaller one?
The original type may not be power of 2. It is type legalizer, it should be able to deal with any type.
< 3 x i64 > -> < 4 x i64 >

================
Comment at: ../lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp:2704-2714
@@ -2708,11 +2703,13 @@
 
-    unsigned NumConcat = WidenNumElts / MaskNumElts;
-    SmallVector<SDValue, 16> Ops(NumConcat);
-    SDValue ZeroVal = DAG.getConstant(0, dl, MaskVT);
-    Ops[0] = Mask;
-    for (unsigned i = 1; i != NumConcat; ++i)
-      Ops[i] = ZeroVal;
+  unsigned WidenNumElts = BoolVT.getVectorNumElements();
+  unsigned MaskNumElts = MaskVT.getVectorNumElements();
 
-    Mask = DAG.getNode(ISD::CONCAT_VECTORS, dl, BoolVT, Ops);
-  }
+  unsigned NumConcat = WidenNumElts / MaskNumElts;
+  SmallVector<SDValue, 16> Ops(NumConcat);
+  SDValue ZeroVal = DAG.getConstant(0, dl, MaskVT);
+  Ops[0] = Mask;
+  for (unsigned i = 1; i != NumConcat; ++i)
+    Ops[i] = ZeroVal;
+
+  Mask = DAG.getNode(ISD::CONCAT_VECTORS, dl, BoolVT, Ops);
 
----------------
mbodart wrote:
> Now that ModifyToType supports zero fill, why aren't we using it here (and in WidenVecOp_MSTORE)?
I'll simplify MSTORE in a separate patch.

================
Comment at: ../lib/Target/X86/X86ISelLowering.cpp:1595-1596
@@ -1594,2 +1594,4 @@
         setOperationAction(ISD::MSTORE,              VT, Legal);
+        setOperationAction(ISD::MGATHER,             VT, Legal);
+        setOperationAction(ISD::MSCATTER,            VT, Custom);
       }
----------------
mbodart wrote:
> Why would 512-bit gathers and scatters have different operation actions here?
The SCATTER is more problematic than GATHER. MSCATTER node returns only the Chain.
The VPSCATTER instruction in X86 zeroes mask operand and I should specify it as "return value".

================
Comment at: ../lib/Target/X86/X86ISelLowering.cpp:19510
@@ -19507,1 +19509,3 @@
 
+/// Modifies a vector input (widen or narrows) to a vector of NVT.  The
+/// input vector must have the same element type as NVT.
----------------
mbodart wrote:
> It's unfortunate that we have to duplicate much of the functionality of the generic type modifier in CodeGen's ModifyToType.
> 
> Is there a way to unify them?
> 
> If not, please add a source comment describing how this implementation differs from the one in CodeGen.
I simplified the X86 version. I call it ExtendToType. It works with legal types only and optimized for X86.

================
Comment at: ../test/CodeGen/X86/masked_gather_scatter.ll:2
@@ -2,1 +1,3 @@
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu  -mattr=+avx512f < %s | FileCheck %s --check-prefix=ALL --check-prefix=KNL
+; RUN: llc -mtriple=x86_64-unknown-linux-gnu  -mattr=+avx512vl -mattr=+avx512dq < %s | FileCheck %s --check-prefix=ALL --check-prefix=SKX
 
----------------
mbodart wrote:
> Have you tried your changes with gathers/scatters targetting 32-bit X86?
> 
> Poking around in the X86 test area, it seems all triples use x86_64 for gather/scatter tests.
I tried before, saw that it works. I added tests for 32-bit.


Repository:
  rL LLVM

http://reviews.llvm.org/D13633