[PATCH] D30011: [NVPTX] Unify vectorization of load/stores of aggregate arguments and return values.

Fri Feb 17 10:38:22 PST 2017

tra added inline comments.

================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:208
+  // Element is too large to vectorize.
+  if (EltSize >= AccessSize)
+    return 1;
----------------
jlebar wrote:
> Actually I'm not sure why we bail here if the sizes are equal.  Don't we want to vectorize {i32, i32}, even if AccessSize is 4 bytes?
If EltSize == AccessSize, then we can only do 1-element load/store and there's nothing to vectorize.
{i32,i32} needs AccessSize of at least 8 so we can fold it into ld/st.v2 .

================
Comment at: lib/Target/NVPTX/NVPTXISelLowering.cpp:2400

-        StoreVal = OutVals[i];
-        if (NeedExtend)
-          StoreVal = DAG.getNode(ISD::ZERO_EXTEND, dl, ExtendedVT, StoreVal);
-        Ops.push_back(StoreVal);
-
-        if (i + 1 < NumElts) {
-          StoreVal = OutVals[i + 1];
-          if (NeedExtend)
-            StoreVal = DAG.getNode(ISD::ZERO_EXTEND, dl, ExtendedVT, StoreVal);
-        } else {
-          StoreVal = DAG.getUNDEF(ExtendedVT);
-        }
-        Ops.push_back(StoreVal);
-
-        if (VecSize == 4) {
-          Opc = NVPTXISD::StoreRetvalV4;
-          if (i + 2 < NumElts) {
-            StoreVal = OutVals[i + 2];
-            if (NeedExtend)
-              StoreVal =
-                  DAG.getNode(ISD::ZERO_EXTEND, dl, ExtendedVT, StoreVal);
-          } else {
-            StoreVal = DAG.getUNDEF(ExtendedVT);
-          }
-          Ops.push_back(StoreVal);
-
-          if (i + 3 < NumElts) {
-            StoreVal = OutVals[i + 3];
-            if (NeedExtend)
-              StoreVal =
-                  DAG.getNode(ISD::ZERO_EXTEND, dl, ExtendedVT, StoreVal);
-          } else {
-            StoreVal = DAG.getUNDEF(ExtendedVT);
-          }
-          Ops.push_back(StoreVal);
-        }
+  SmallVector<SDValue, 6> LdStOps;
+  for (unsigned i = 0, e = Outs.size(); i != e; ++i) {
----------------
jlebar wrote:
> This looks oddly familiar from the parameter handling above.  We really can't factor it out without making this into more of a hairball?  If not, same comments apply down here.
Alas, I've found no good way to generalize this further. We have 4 cases that we need to vectorize:
* Load of function args
* Storing return value.
* Storing function args before a call
* Loading call result.

Loads are currently rather different, so there's not much to factor out between them.
Stores are indeed similar, but create different nodes that take different set of operands. Combining them would probably make things more confusing that it already is.

https://reviews.llvm.org/D30011