[llvm] [NVPTX] Generalize and extend upsizing when lowering 8/16-bit-element vector loads/stores (PR #119622)

Thu Dec 12 10:10:19 PST 2024

================
@@ -1400,11 +1400,12 @@ bool NVPTXDAGToDAGISel::tryLoadVector(SDNode *N) {
 
   EVT EltVT = N->getValueType(0);
 
-  // v8x16 is a special case. PTX doesn't have ld.v8.16
-  // instruction. Instead, we split the vector into v2x16 chunks and
-  // load them with ld.v4.b32.
-  if (Isv2x16VT(EltVT)) {
-    assert(N->getOpcode() == NVPTXISD::LoadV4 && "Unexpected load opcode.");
+  // Vectors of 8-and-16-bit elements above a certain size are special cases.
+  // PTX doesn't have anything larger than ld.v4 for those element types.
+  // In Type Legalization, rather than splitting those vectors into multiple
+  // loads, we split the vector into v2x16/v4i8 chunks. Now, we lower to PTX as
+  // vector loads of b32.
----------------
dakersnar wrote:

```
  // Despite vectors like v8i8, v16i8, v8i16 being within the bit-limit for
  // total load/store size, PTX syntax only supports v2/v4. Thus, we can't use
  // vectorized loads/stores with the actual element type for i8/i16 as that
  // would require v8/v16 variants that do not exist.
  // In order to load/store such vectors efficiently, in Type Legalization,
  // we split the vector into word-sized chunks (v2x16/v4i8). Now, we lower to
  // PTX as vectors of b32.
```

Does this sound good?
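
To make the chunk arithmetic in that comment concrete, here is a minimal standalone sketch. `b32ChunkCount` is a hypothetical helper written for illustration, not something in the NVPTX backend, and it assumes that only vectors with more than four i8/i16 elements need repacking:

```cpp
#include <optional>

// Hypothetical helper, not part of LLVM. For a vector of i8/i16 elements that
// has too many elements for PTX's v2/v4 forms, return how many 32-bit words it
// occupies when that count itself fits a v2/v4 vector of b32.
std::optional<unsigned> b32ChunkCount(unsigned EltBits, unsigned NumElts) {
  // Only 8- and 16-bit elements are considered here.
  if (EltBits != 8 && EltBits != 16)
    return std::nullopt;
  // Assumption: up to four elements can use v2/v4 of the element type
  // directly, so no repacking is needed.
  if (NumElts <= 4)
    return std::nullopt;
  unsigned TotalBits = EltBits * NumElts;
  // The vector must split evenly into word-sized chunks (v2x16 or v4i8).
  if (TotalBits % 32 != 0)
    return std::nullopt;
  unsigned Words = TotalBits / 32;
  // PTX syntax only offers v2 and v4 vector forms.
  if (Words != 2 && Words != 4)
    return std::nullopt;
  return Words;
}
```

Under these assumptions v8i8 maps to two b32 words, while v16i8 and v8i16 each map to four, which is why they can be expressed as v2/v4 vectors of b32.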

https://github.com/llvm/llvm-project/pull/119622