[llvm] [NVPTX] Add TMA bulk tensor reduction intrinsics (PR #116854)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Fri Nov 22 11:21:49 PST 2024


================
@@ -4177,31 +4177,40 @@ bool NVPTXScopes::empty() const { return Scopes.size() == 0; }
                : NVPTX::CP_ASYNC_BULK_TENSOR_PREFETCH_##dim##_##mode)
 
 static unsigned GetCpAsyncBulkTensorS2GOpcode(size_t Dim, bool IsShared32,
-                                              bool IsCacheHint, bool IsIm2Col) {
+                                              bool IsCacheHint, bool IsIm2Col,
+                                              bool IsReduce = false) {
   if (IsIm2Col) {
     switch (Dim) {
     case 3:
-      return GET_CP_ASYNC_BULK_TENSOR_OPCODE_S2G(3D, IM2COL);
+      return IsReduce ? GET_CP_ASYNC_BULK_TENSOR_OPCODE_CH(RED, 3D, IM2COL)
+                      : GET_CP_ASYNC_BULK_TENSOR_OPCODE_CH(S2G, 3D, IM2COL);
----------------
Artem-B wrote:

This looks like another place where an extra macro level would help, one that selects the op based on `IsReduce`.
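
For illustration only, here is a minimal sketch of what such an extra macro level could look like. It is not the code from the PR: the `NVPTX` enum below is a hypothetical stand-in for the generated opcodes, the macro names `GET_OPCODE` and `GET_OPCODE_S2G_OR_RED` and the helper `getIm2ColOpcode` are made up for this example, and the `IsShared32`/`IsCacheHint` handling done by the real `GET_CP_ASYNC_BULK_TENSOR_OPCODE_CH` macro is left out to keep the sketch short.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the tablegen-generated NVPTX opcodes.
namespace NVPTX {
enum Opcode : unsigned {
  CP_ASYNC_BULK_TENSOR_S2G_3D_IM2COL = 1,
  CP_ASYNC_BULK_TENSOR_S2G_4D_IM2COL,
  CP_ASYNC_BULK_TENSOR_S2G_5D_IM2COL,
  CP_ASYNC_BULK_TENSOR_RED_3D_IM2COL,
  CP_ASYNC_BULK_TENSOR_RED_4D_IM2COL,
  CP_ASYNC_BULK_TENSOR_RED_5D_IM2COL,
};
} // namespace NVPTX

// Base level: paste op, dim and mode into an opcode name.
#define GET_OPCODE(op, dim, mode)                                              \
  (NVPTX::CP_ASYNC_BULK_TENSOR_##op##_##dim##_##mode)

// Extra level (assumed name): pick RED or S2G based on a runtime flag, so
// call sites no longer spell out the ternary themselves.
#define GET_OPCODE_S2G_OR_RED(is_reduce, dim, mode)                            \
  ((is_reduce) ? GET_OPCODE(RED, dim, mode) : GET_OPCODE(S2G, dim, mode))

// Simplified, hypothetical version of the im2col branch of the selector.
static unsigned getIm2ColOpcode(std::size_t Dim, bool IsReduce) {
  switch (Dim) {
  case 3:
    return GET_OPCODE_S2G_OR_RED(IsReduce, 3D, IM2COL);
  case 4:
    return GET_OPCODE_S2G_OR_RED(IsReduce, 4D, IM2COL);
  case 5:
    return GET_OPCODE_S2G_OR_RED(IsReduce, 5D, IM2COL);
  default:
    assert(false && "unsupported dimension for im2col mode");
    return 0;
  }
}
```

With a wrapper along these lines, each `case` would shrink back to a single `return`, and the `IsReduce` choice would live in one place.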

https://github.com/llvm/llvm-project/pull/116854

