[llvm] [NVPTX] Add cta_group support to TMA G2S intrinsics (PR #143178)

Mon Jun 9 07:36:13 PDT 2025

================
@@ -1034,18 +1034,22 @@ source tensor is preserved at the destination. The dimension of the
 tensor data ranges from 1d to 5d with the coordinates specified
 by the ``i32 %d0 ... i32 %d4`` arguments.
 
-* The last two arguments to these intrinsics are boolean flags
-  indicating support for cache_hint and/or multicast modifiers.
-  These flag arguments must be compile-time constants. The backend
-  looks through these flags and lowers the intrinsics appropriately.
+* The last three arguments to these intrinsics are boolean flags
+  indicating support for multicast, cache_hint and cta_group::2
+  modifiers. These flag arguments must be compile-time constants.
+  The backend looks through these flags and lowers the intrinsics
+  appropriately.
----------------
durga4github wrote:

I had used a boolean because `cta_group::1` is the default (i.e., it is equivalent to not specifying any cta_group).

I have made it explicit in the latest revision by making `cta_group` an i32 parameter constrained to the range [0, 3).
`0` is the default value and emits no modifier. Values of `1` and `2` generate the corresponding cta_group variants.

Any value outside this range is rejected with an error from the Verifier (through the Range attribute in the td file).
With this, the intrinsic declaration in the `.ll` file carries the `range` explicitly, as shown below:
```
declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.1d(ptr addrspace(7) writeonly, ptr addrspace(3), ptr readonly, i32, i16, i64, i1 immarg, i1 immarg, i32 immarg range(i32 0, 3))
```
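
For illustration, call sites for the three `cta_group` values might look like the sketch below. It assumes the operand order from the declaration above; the function name `@cta_group_example` and the value names are hypothetical, and both `i1` flags are set to `false` so the `i16` and `i64` operands they guard are not used.

```llvm
; Declaration copied from above.
declare void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.1d(ptr addrspace(7) writeonly, ptr addrspace(3), ptr readonly, i32, i16, i64, i1 immarg, i1 immarg, i32 immarg range(i32 0, 3))

; Hypothetical caller; names are for illustration only.
define void @cta_group_example(ptr addrspace(7) %dst, ptr addrspace(3) %mbar, ptr %tmap, i32 %d0, i16 %mask, i64 %hint) {
  ; cta_group = 0: default, no cta_group modifier
  call void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.1d(ptr addrspace(7) %dst, ptr addrspace(3) %mbar, ptr %tmap, i32 %d0, i16 %mask, i64 %hint, i1 false, i1 false, i32 0)
  ; cta_group = 1: requests the cta_group::1 variant
  call void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.1d(ptr addrspace(7) %dst, ptr addrspace(3) %mbar, ptr %tmap, i32 %d0, i16 %mask, i64 %hint, i1 false, i1 false, i32 1)
  ; cta_group = 2: requests the cta_group::2 variant
  call void @llvm.nvvm.cp.async.bulk.tensor.g2s.tile.1d(ptr addrspace(7) %dst, ptr addrspace(3) %mbar, ptr %tmap, i32 %d0, i16 %mask, i64 %hint, i1 false, i1 false, i32 2)
  ret void
}
```

A call passing `i32 3` (or any other value outside [0, 3)) as the last operand would be expected to fail verification, per the range constraint described above.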

Please let me know if the latest revision looks better.


https://github.com/llvm/llvm-project/pull/143178