[Mlir-commits] [mlir] [MLIR][AMDGPU] Add amdgpu.global_transpose_load op for RDNA4 global memory transpose loads (PR #195287)
llvmlistbot at llvm.org
Fri May 1 09:32:11 PDT 2026
github-actions[bot] wrote:
:warning: C/C++ code formatter, clang-format found issues in your code. :warning:
<details>
<summary>
You can test this locally with the following command:
</summary>
``````````bash
git-clang-format --diff origin/main HEAD --extensions cpp -- mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp --diff_from_common_commit
``````````
:warning:
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing `origin/main` to the base branch/commit you want to compare against.
:warning:
</details>
<details>
<summary>
View the diff from clang-format here.
</summary>
``````````diff
diff --git a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
index 5844a845b..ef7dfa54c 100644
--- a/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
+++ b/mlir/lib/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.cpp
@@ -4468,8 +4468,8 @@ void mlir::populateAMDGPUToROCDLConversionPatterns(LLVMTypeConverter &converter,
PackedScaledTruncOpLowering, PackedTrunc2xFp8OpLowering,
PackedStochRoundFp8OpLowering, GatherToLDSOpLowering,
GlobalLoadAsyncToLDSOpLowering, TransposeLoadOpLowering,
- GlobalTransposeLoadOpLowering,
- AMDGPUPermlaneLowering, AMDGPUMakeDmaBaseLowering<MakeDmaBaseOp>,
+ GlobalTransposeLoadOpLowering, AMDGPUPermlaneLowering,
+ AMDGPUMakeDmaBaseLowering<MakeDmaBaseOp>,
AMDGPUMakeDmaBaseLowering<MakeGatherDmaBaseOp>,
AMDGPULowerDescriptor<MakeDmaDescriptorOp>,
AMDGPULowerDescriptor<MakeGatherDmaDescriptorOp>,
diff --git a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
index 7d9bccd89..d50448a6d 100644
--- a/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
+++ b/mlir/lib/Dialect/AMDGPU/IR/AMDGPUOps.cpp
@@ -1091,13 +1091,12 @@ LogicalResult GlobalTransposeLoadOp::verify() {
auto resultType = cast<VectorType>(getType());
size_t numElements = resultType.getNumElements();
- size_t elementTypeSize =
- resultType.getElementType().getIntOrFloatBitWidth();
+ size_t elementTypeSize = resultType.getElementType().getIntOrFloatBitWidth();
// ElementSize -> NumElements (matches ISA-documented global_load_tr variants)
const llvm::SmallDenseMap<size_t, size_t> kValidLoadSizeMap = {
- {8, 8}, // global_load_tr_b64
- {16, 8}, // global_load_tr_b128
+ {8, 8}, // global_load_tr_b64
+ {16, 8}, // global_load_tr_b128
};
auto validNumElems = kValidLoadSizeMap.find(elementTypeSize);
``````````
</details>
https://github.com/llvm/llvm-project/pull/195287