[llvm] [NVPTX] Vectorize and lower 256-bit global loads/stores for sm_100+/ptx88+ (PR #139292)

Fri May 9 10:35:42 PDT 2025

================
@@ -1328,14 +1349,33 @@ bool NVPTXDAGToDAGISel::tryLDGLDU(SDNode *N) {
     Opcode = pickOpcodeForVT(
         EltVT.getSimpleVT().SimpleTy, NVPTX::INT_PTX_LDG_G_v4i8_ELE,
         NVPTX::INT_PTX_LDG_G_v4i16_ELE, NVPTX::INT_PTX_LDG_G_v4i32_ELE,
-        std::nullopt, NVPTX::INT_PTX_LDG_G_v4f32_ELE, std::nullopt);
+        NVPTX::INT_PTX_LDG_G_v4i64_ELE, NVPTX::INT_PTX_LDG_G_v4f32_ELE,
+        NVPTX::INT_PTX_LDG_G_v4f64_ELE);
     break;
   case NVPTXISD::LDUV4:
     Opcode = pickOpcodeForVT(
         EltVT.getSimpleVT().SimpleTy, NVPTX::INT_PTX_LDU_G_v4i8_ELE,
         NVPTX::INT_PTX_LDU_G_v4i16_ELE, NVPTX::INT_PTX_LDU_G_v4i32_ELE,
         std::nullopt, NVPTX::INT_PTX_LDU_G_v4f32_ELE, std::nullopt);
     break;
+  case NVPTXISD::LoadV8:
+    switch (EltVT.getSimpleVT().SimpleTy) {
+    case MVT::i32:
+      Opcode = NVPTX::INT_PTX_LDG_G_v8i32_ELE;
+      break;
----------------
dakersnar wrote:

At the time, it was because I didn't want to change the helper (`pickOpcodeForVT`) to take optionals for all of its arguments; currently you can only pass `std::nullopt` for the i64 and f64 slots. But that seems like a reasonable change.
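
For illustration, a helper that accepts an optional for every element type could look roughly like the sketch below. It is a standalone approximation, not the actual LLVM code: `SimpleVT`, `pickOpcodeForVTSketch`, `pickLoadV8Opcode`, and the opcode constant are hypothetical stand-ins for `MVT::SimpleValueType`, the real helper, and the real `NVPTX::` opcodes. Only the i32 case is filled in for the v8 load because that is the only case visible in the hunk above.

```cpp
#include <optional>

// Hypothetical stand-in for MVT::SimpleValueType (not the LLVM enum).
enum class SimpleVT { i8, i16, i32, i64, f32, f64, Other };

// Sketch of a helper in which every opcode slot is optional, so a caller
// can mark any element type as unsupported by passing std::nullopt.
std::optional<unsigned> pickOpcodeForVTSketch(
    SimpleVT VT, std::optional<unsigned> OpcodeI8,
    std::optional<unsigned> OpcodeI16, std::optional<unsigned> OpcodeI32,
    std::optional<unsigned> OpcodeI64, std::optional<unsigned> OpcodeF32,
    std::optional<unsigned> OpcodeF64) {
  switch (VT) {
  case SimpleVT::i8:
    return OpcodeI8;
  case SimpleVT::i16:
    return OpcodeI16;
  case SimpleVT::i32:
    return OpcodeI32;
  case SimpleVT::i64:
    return OpcodeI64;
  case SimpleVT::f32:
    return OpcodeF32;
  case SimpleVT::f64:
    return OpcodeF64;
  default:
    // Any other type has no matching opcode.
    return std::nullopt;
  }
}

// Hypothetical opcode number standing in for NVPTX::INT_PTX_LDG_G_v8i32_ELE.
constexpr unsigned kLdgV8I32 = 100;

// With all slots optional, the v8 load selection can reuse the helper
// instead of a hand-written switch. Slots not shown in the hunk stay empty.
std::optional<unsigned> pickLoadV8Opcode(SimpleVT VT) {
  return pickOpcodeForVTSketch(VT, /*OpcodeI8=*/std::nullopt,
                               /*OpcodeI16=*/std::nullopt,
                               /*OpcodeI32=*/kLdgV8I32,
                               /*OpcodeI64=*/std::nullopt,
                               /*OpcodeF32=*/std::nullopt,
                               /*OpcodeF64=*/std::nullopt);
}
```

In this shape the caller would treat an empty result as "no opcode for this element type" and bail out of selection, the same way a missing i64 or f64 entry is handled today.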

https://github.com/llvm/llvm-project/pull/139292