[llvm] [NVPTX] Generalize and extend upsizing when lowering 8/16-bit-element vector loads/stores (PR #119622)

Wed Dec 11 15:48:42 PST 2024

================
@@ -1400,11 +1400,12 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
 
   EVT EltVT = N->getValueType(0);
 
-  // v8x16 is a special case. PTX doesn't have ld.v8.16
-  // instruction. Instead, we split the vector into v2x16 chunks and
-  // load them with ld.v4.b32.
-  if (Isv2x16VT(EltVT)) {
-    assert(N->getOpcode() == NVPTXISD::LoadV4 && "Unexpected load opcode.");
+  // Vectors of 8-and-16-bit elements above a certain size are special cases.
+  // PTX doesn't have anything larger than ld.v4 for those element types.
+  // In Type Legalization, rather than splitting those vectors into multiple
+  // loads, we split the vector into v2x16/v4i8 chunks. Now, we lower to PTX as
+  // vector loads of b32.
----------------
Artem-B wrote:

This could use some editing. The key limitation is that PTX syntax only supports v2/v4. We can't use vectorized loads/stores with the actual element type for i8/i16 as that would need v8/v16 variants that do not exist. In order to load/store such vectors efficiently, we perform the loads/stores as vectors of i32.

https://github.com/llvm/llvm-project/pull/119622