[Mlir-commits] [mlir] [AMDGPU] Adding AMDGPU dialect wrapper for ROCDL transpose loads. (PR #145395)
Alan Li
llvmlistbot at llvm.org
Mon Jun 23 19:02:18 PDT 2025
================
@@ -531,13 +531,30 @@ LogicalResult TransposeLoadOp::verify() {
return emitOpError("source memory address space must be Workgroup");
// TODO: support 6-bit element type vectors.
- auto transferType = dyn_cast<VectorType>(getDst().getType());
+ auto transferType = dyn_cast<VectorType>(getType());
if (!transferType)
return emitOpError("destination type must be a vector type");
size_t transferSize =
transferType.getNumElements() * transferType.getElementTypeBitWidth();
- if (transferSize != 64)
- return emitOpError("Transferring type size must be 64 bits");
+ size_t elementTypeSize = srcType.getElementType().getIntOrFloatBitWidth();
+
+ // ElementSize -> LoadSize
+ const std::map<size_t, size_t> KValidLoadSizeMap = {
+ {4, 64},
+ {32, 96}, // 6-bit element loads use casted vector<3xi32>
+ {8, 64},
+ {16, 64},
+ };
----------------
lialan wrote:
Awesome suggestion. Since this is a map rather than a set, I switched to SmallDenseMap, which also avoids heap allocation!
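
For readers without the final patch at hand, here is a minimal sketch of the element-size to load-size check using a heap-free table. It is not the code from the PR: it swaps in a constexpr std::array with a linear scan in place of llvm::SmallDenseMap so that it compiles without LLVM headers, and the names kValidLoadSizes, expectedLoadSize, and isValidTransfer are hypothetical. The table entries are copied from the hunk above.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <utility>

// Source element bit width -> required total load size in bits, copied from
// the diff hunk above. A constexpr std::array lives in static storage, so
// building and querying it never touches the heap.
// Assumption: std::array stands in for llvm::SmallDenseMap here.
constexpr std::array<std::pair<std::size_t, std::size_t>, 4> kValidLoadSizes = {{
    {4, 64},
    {32, 96}, // 6-bit element loads use casted vector<3xi32>
    {8, 64},
    {16, 64},
}};

// Hypothetical helper: returns the required load size for an element width,
// or std::nullopt if the element width is not supported.
constexpr std::optional<std::size_t> expectedLoadSize(std::size_t elementBits) {
  for (const auto &entry : kValidLoadSizes)
    if (entry.first == elementBits)
      return entry.second;
  return std::nullopt;
}

// Hypothetical helper: true when the destination vector's total size in bits
// matches the size required for the given source element width.
constexpr bool isValidTransfer(std::size_t elementBits,
                               std::size_t transferBits) {
  const auto expected = expectedLoadSize(elementBits);
  return expected.has_value() && *expected == transferBits;
}
```

With only four entries, a linear scan over a fixed array is likely as cheap as any hashed lookup; in-tree code would presumably keep SmallDenseMap for consistency with the rest of LLVM.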
https://github.com/llvm/llvm-project/pull/145395