[llvm] [NVPTX] Add TMA bulk tensor reduction intrinsics (PR #116854)
Artem Belevich via llvm-commits
llvm-commits at lists.llvm.org
Fri Nov 22 11:21:49 PST 2024
================
@@ -4177,31 +4177,40 @@ bool NVPTXScopes::empty() const { return Scopes.size() == 0; }
: NVPTX::CP_ASYNC_BULK_TENSOR_PREFETCH_##dim##_##mode)
static unsigned GetCpAsyncBulkTensorS2GOpcode(size_t Dim, bool IsShared32,
- bool IsCacheHint, bool IsIm2Col) {
+ bool IsCacheHint, bool IsIm2Col,
+ bool IsReduce = false) {
if (IsIm2Col) {
switch (Dim) {
case 3:
- return GET_CP_ASYNC_BULK_TENSOR_OPCODE_S2G(3D, IM2COL);
+ return IsReduce ? GET_CP_ASYNC_BULK_TENSOR_OPCODE_CH(RED, 3D, IM2COL)
+ : GET_CP_ASYNC_BULK_TENSOR_OPCODE_CH(S2G, 3D, IM2COL);
----------------
Artem-B wrote:
This looks like another place where an extra macro level would help: one that selects the opcode based on `IsReduce`, so each `case` does not have to spell out the ternary.
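
A minimal sketch of what such an extra macro level could look like, offered as one possible reading of the suggestion and not as the code that landed in the PR. The enum, its values, and every `DEMO_`-prefixed name are hypothetical stand-ins for the TableGen-generated `NVPTX::` opcodes and the existing `GET_CP_ASYNC_BULK_TENSOR_OPCODE_*` macros.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the generated NVPTX opcode enumerators.
enum DemoOpcode : unsigned {
  DEMO_INVALID = 0,
  DEMO_TENSOR_S2G_3D_IM2COL,
  DEMO_TENSOR_RED_3D_IM2COL,
  DEMO_TENSOR_S2G_4D_IM2COL,
  DEMO_TENSOR_RED_4D_IM2COL,
  DEMO_TENSOR_S2G_5D_IM2COL,
  DEMO_TENSOR_RED_5D_IM2COL,
};

// First level: build a single opcode name by token pasting.
#define DEMO_GET_OPCODE(op, dim, mode) DEMO_TENSOR_##op##_##dim##_##mode

// Extra level: pick the RED or S2G flavor from a runtime flag, so the
// ternary is written once here instead of once per case.
#define DEMO_GET_OPCODE_S2G_OR_RED(dim, mode, is_reduce)                       \
  ((is_reduce) ? DEMO_GET_OPCODE(RED, dim, mode)                               \
               : DEMO_GET_OPCODE(S2G, dim, mode))

// Simplified selector covering only the im2col path (assumption: dims 3-5).
unsigned demoGetIm2ColOpcode(std::size_t Dim, bool IsReduce) {
  switch (Dim) {
  case 3:
    return DEMO_GET_OPCODE_S2G_OR_RED(3D, IM2COL, IsReduce);
  case 4:
    return DEMO_GET_OPCODE_S2G_OR_RED(4D, IM2COL, IsReduce);
  case 5:
    return DEMO_GET_OPCODE_S2G_OR_RED(5D, IM2COL, IsReduce);
  default:
    return DEMO_INVALID; // the real code would likely report an error here
  }
}
```

With this shape, each `case` stays a single line, and the `IsReduce` decision lives in exactly one macro.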
https://github.com/llvm/llvm-project/pull/116854