[llvm] [NVPTX] Add TMA bulk tensor reduction intrinsics (PR #116854)

Tue Nov 26 05:09:11 PST 2024

================
@@ -4177,31 +4177,40 @@ bool NVPTXScopes::empty() const { return Scopes.size() == 0; }
                : NVPTX::CP_ASYNC_BULK_TENSOR_PREFETCH_##dim##_##mode)
 
 static unsigned GetCpAsyncBulkTensorS2GOpcode(size_t Dim, bool IsShared32,
-                                              bool IsCacheHint, bool IsIm2Col) {
+                                              bool IsCacheHint, bool IsIm2Col,
+                                              bool IsReduce = false) {
   if (IsIm2Col) {
     switch (Dim) {
     case 3:
-      return GET_CP_ASYNC_BULK_TENSOR_OPCODE_S2G(3D, IM2COL);
+      return IsReduce ? GET_CP_ASYNC_BULK_TENSOR_OPCODE_CH(RED, 3D, IM2COL)
+                      : GET_CP_ASYNC_BULK_TENSOR_OPCODE_CH(S2G, 3D, IM2COL);
----------------
durga4github wrote:

Handled in the latest revision; resolving this.

https://github.com/llvm/llvm-project/pull/116854