[llvm] [NVPTX] Add cta_group support to TMA G2S intrinsics (PR #143178)
Durgadoss R via llvm-commits
llvm-commits at lists.llvm.org
Mon Jun 9 07:36:13 PDT 2025
================
@@ -1034,18 +1034,22 @@ source tensor is preserved at the destination. The dimension of the
tensor data ranges from 1d to 5d with the coordinates specified
by the ``i32 %d0 ... i32 %d4`` arguments.
-* The last two arguments to these intrinsics are boolean flags
- indicating support for cache_hint and/or multicast modifiers.
- These flag arguments must be compile-time constants. The backend
- looks through these flags and lowers the intrinsics appropriately.
+* The last three arguments to these intrinsics are boolean flags
+ indicating support for multicast, cache_hint and cta_group::2
+ modifiers. These flag arguments must be compile-time constants.
+ The backend looks through these flags and lowers the intrinsics
+ appropriately.
----------------
durga4github wrote:
I originally used a boolean because cta_group::1 is the default (i.e. the same as not specifying any cta_group).
In the latest revision I have made this explicit: `cta_group` is now an i32 parameter restricted to the range [0, 3).
`0` is the default and emits no modifier; `1` and `2` generate the corresponding cta_group variants.
Any value outside this range is rejected by the Verifier (through the Range attribute in the td file).
With this change, the intrinsic declaration in the `.ll` file explicitly carries the `range` attribute, as shown below:
```
declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.1d(ptr addrspace(7) writeonly, ptr addrspace(3), ptr readonly, i32, i16, i64, i1 immarg, i1 immarg, i32 immarg range(i32 0, 3))
```
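As a rough illustration of the value mapping described above, here is a minimal C++ sketch. It is not the backend's lowering code: `ctaGroupModifier` is a hypothetical helper, and the modifier spellings `.cta_group::1` and `.cta_group::2` are assumed from the PTX naming used in this thread. It only models which suffix each immediate selects and which values fall outside [0, 3).

```cpp
#include <string>

// Hypothetical helper for illustration only (not part of LLVM).
// Maps the cta_group immediate to the PTX modifier text it is
// described as selecting. Returns nullptr for values outside [0, 3),
// which the Verifier would reject before lowering.
const char *ctaGroupModifier(int ctaGroup) {
  switch (ctaGroup) {
  case 0:
    return "";               // default: no modifier is emitted
  case 1:
    return ".cta_group::1";  // assumed PTX spelling
  case 2:
    return ".cta_group::2";  // assumed PTX spelling
  default:
    return nullptr;          // out of range
  }
}
```

For example, `ctaGroupModifier(2)` yields `.cta_group::2`, while `ctaGroupModifier(3)` yields `nullptr`, mirroring the out-of-range error case.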
Please let me know if the latest revision looks better.
https://github.com/llvm/llvm-project/pull/143178