[llvm] [NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (PR #139292)

Fri May 9 10:24:30 PDT 2025

================
@@ -1502,6 +1542,16 @@ bool NVPTXDAGToDAGISel::tryStoreVector(SDNode *N) {
     N2 = N->getOperand(5);
     ToTypeWidth = TotalWidth / 4;
     break;
+  case NVPTXISD::StoreV8:
+    if (!Subtarget->has256BitMaskedLoadStore())
+      return false;
+    VecType = NVPTX::PTXLdStInstCode::V8;
+    Ops.append({N->getOperand(1), N->getOperand(2), N->getOperand(3),
+                N->getOperand(4), N->getOperand(5), N->getOperand(6),
+                N->getOperand(7), N->getOperand(8)});
+    N2 = N->getOperand(9);
----------------
AlexMaclean wrote:

Can we pull some of this logic out of the switch here as well. I think it would be cleanest if the switch only produced the number of elements (perhaps as a static helper function or lambda). And then we just used that number to derive all these other variables. 

https://github.com/llvm/llvm-project/pull/139292