[llvm] [NVPTX] Copy kernel arguments as byte array (PR #110356)

Wed Oct 2 14:30:47 PDT 2024

================
@@ -623,13 +623,33 @@ void NVPTXLowerArgs::handleByValParam(const NVPTXTargetMachine &TM,
     Value *ArgInParam = new AddrSpaceCastInst(
         Arg, PointerType::get(Arg->getContext(), ADDRESS_SPACE_PARAM),
         Arg->getName(), FirstInst);
+    // Create an opaque type of same size as StructType but without padding
+    // holes as this could have been a union.
+    const auto StructBytes = *AllocA->getAllocationSize(DL);
+    SmallVector<Type *, 5> ChunkTypes;
+    if (StructBytes >= 16) {
+        Type *IntType = Type::getInt64Ty(Func->getContext());
+        Type *ChunkType = VectorType::get(IntType, 2, false);
+        Type *OpaqueType = StructBytes < 32 ? ChunkType :
+                           ArrayType::get(ChunkType, StructBytes / 16);
+        ChunkTypes.push_back(OpaqueType);
+    }
+    for (const auto ChunkBytes: {8, 4, 2, 1}) {
+      if (StructBytes & ChunkBytes) {
+          Type *ChunkType = Type::getIntNTy(Func->getContext(), 8 * ChunkBytes);
+          ChunkTypes.push_back(ChunkType);
+      }
+    }
+    Type * OpaqueType = ChunkTypes.size() == 1 ? ChunkTypes[0] :
+                        StructType::create(ChunkTypes);
----------------
Artem-B wrote:

Do you still have the IR that didn't get vectorized? If it should be vectorizable, that's the problem we should fix, rather than complicating things here.

I also see a regression in load/store vectorization in some other code I'm looking at, so I suspect we may have a problem lurking there.
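
For context, the chunking done in the hunk above can be reproduced outside LLVM as a rough sketch. It only mirrors the size arithmetic: one 16-byte chunk per full 16 bytes, then one integer chunk per set bit in the remaining low four bits. The names `chunkSizes`, `totalBytes`, and `checkCoversAllBytes` are hypothetical helpers, not part of the patch, and the sketch does not model any alignment or tail padding that the DataLayout may add to the resulting type.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <numeric>
#include <vector>

// Hypothetical helper (not in the patch): byte sizes of the chunks the hunk
// would emit for a by-value argument that is `structBytes` bytes long.
std::vector<std::uint64_t> chunkSizes(std::uint64_t structBytes) {
  std::vector<std::uint64_t> chunks;
  // One 16-byte chunk (<2 x i64> in the patch) per full 16 bytes.
  for (std::uint64_t i = 0; i < structBytes / 16; ++i)
    chunks.push_back(16);
  // Remaining 0..15 bytes: one integer chunk per set bit, largest first.
  for (std::uint64_t chunkBytes : {8, 4, 2, 1})
    if (structBytes & chunkBytes)
      chunks.push_back(chunkBytes);
  return chunks;
}

// Sum of all chunk sizes.
std::uint64_t totalBytes(const std::vector<std::uint64_t> &chunks) {
  return std::accumulate(chunks.begin(), chunks.end(), std::uint64_t{0});
}

// Sanity check: the chunks cover exactly `structBytes` bytes.
void checkCoversAllBytes(std::uint64_t structBytes) {
  assert(totalBytes(chunkSizes(structBytes)) == structBytes);
}
```

Under these assumptions, a 23-byte argument splits into chunks of 16, 4, 2 and 1 bytes, and a 40-byte argument into 16, 16 and 8.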

https://github.com/llvm/llvm-project/pull/110356