[Mlir-commits] [mlir] [mlir][amdgpu] Lower amdgpu.make_dma_base (PR #169817)
Krzysztof Drewniak
llvmlistbot at llvm.org
Tue Dec 2 08:29:04 PST 2025
================
@@ -1251,35 +1250,39 @@ def AMDGPU_MakeDmaBaseOp :
For example:
```mlir
- %base = amdgpu.make_dma_base %src[%idx0], %dst[%idx1] : memref<8xi32>, memref<8xi32, #gpu.address_space<workgroup>> -> !amdgpu.tdm_base<i32>
+ %base = amdgpu.make_dma_base %lds[%idx0, %idx1], %global[%idx2, %idx3] : memref<64x64xi32, #gpu.address_space<workgroup>>, memref<64x64xi32> -> !amdgpu.tdm_base<i32>
%descriptor = amdgpu.make_dma_descriptor %base globalSize [2, 2] globalStride [2, 1] sharedSize [2, 2] : !amdgpu.tdm_base<i32> -> !amdgpu.tdm_descriptor
amdgpu.tensor_load_to_lds %descriptor : !amdgpu.tdm_descriptor
```
to
```mlir
- // pseudocode
- %base_0 = llvm.mlir.undef : !llvm.struct<(ptr, ptr)>
- %base_1 = llvm.insertvalue %global_addr, %base_0[0] : !llvm.struct<(ptr, ptr)>
- %base_2 = llvm.insertvalue %lds_addr, %base_1[1] : !llvm.struct(ptr, ptr)>
- // type(%base_2) = !llvm.struct<(ptr, ptr) roughly corresponds to amdgpu.tdm_base<i32>
-
- // The base will be used when contructing dgroup0
- // when lowering amdgpu.make_dma_descriptor
- %dgroup0_0 = llvm.mlir.undef : !llvm.struct<(....)>
- %dgroup0_1 = llvm.insertvalue %base2, %dgroup0_0 : ....
-
- // When lowering amdgpu.tensor_load_to_lds
- rocdl.tensor.load.to.lds %dgroup0, %dgroup1, %dgroup2, %dgroup3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
+ // pseudo-code
+ %global_base = llvm.extractvalue %global_memref[1]
+ %global_address = llvm.get_element_ptr ...
+
+ %lds_base = llvm.extractvalue %lds_memref[1]
+ %lds_address = llvm.get_element_ptr ...
+
+ // Definition of %base
+ %undef = llvm.mlir.undef : vector<4xi32>
+ %v0 = llvm.insertelement %15, %undef[0] : vector<4xi32>
+ %v1 = llvm.insertelement %lds_address, %v0[1] : vector<4xi32>
+ %v2 = llvm.insertelement %global_address_low, %v1[2] : vector<4xi32>
+ %base = llvm.insertelement %global_address_high, %v2[3] : vector<4xi32>
+
+ rocdl.tensor.load.to.lds %base, %dgroup1, %dgroup2, %dgroup3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
```
These tensor DMA operations were introduced in gfx1250.
}];
let assemblyFormat = [{
- $src `[` $src_indices `]` `,` $dst `[` $dst_indices `]` attr-dict `:` type($src) `,` type($dst) `->` type(results)
+ $lds `[` $lds_indices `]` `,` $global `[` $global_indices `]` attr-dict `:` type($lds) `,` type($global) `->` type(results)
----------------
krzysz00 wrote:
I'd swap `$global` and `$lds` here too.
(This is mainly because the documentation I've seen usually describes this from a global => LDS perspective, so it'd be good to stay consistent.)
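
For illustration, a rough sketch of what the swapped format could look like, assuming the operand names from the patch (`$global`, `$lds`, `$global_indices`, `$lds_indices`) stay as they are and only their order changes. This is untested and not taken from the patch:

```tablegen
// Hypothetical reordering: global operand first, LDS operand second.
// Operand names are copied from the patch; only the order is changed.
let assemblyFormat = [{
  $global `[` $global_indices `]` `,` $lds `[` $lds_indices `]` attr-dict `:` type($global) `,` type($lds) `->` type(results)
}];
```

With that order, the example in the op description would list the global memref and its indices first, followed by the workgroup (LDS) memref, and the type list after the colon would follow the same order.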
https://github.com/llvm/llvm-project/pull/169817