[Mlir-commits] [clang] [llvm] [mlir] [openmp] [OpenMP][offload] Cross-team reductions with variable number of teams (PR #195102)
llvmlistbot at llvm.org
Thu Apr 30 08:03:09 PDT 2026
llvmorg-github-actions[bot] wrote:
@llvm/pr-subscribers-mlir-openmp
@llvm/pr-subscribers-mlir
Author: Robert Imschweiler (ro-i)
<details>
<summary>Changes</summary>
This is the first patch in an upcoming series that reworks OpenMP cross-team reductions.
This patch tries to be as minimal as possible and includes the following changes:
1) Don't process a large number of teams in chunks. Allocate a suitably sized global buffer for the team values and launch all teams at once. The last team to finish uses a strided loop to reduce the team values from the global buffer.
2) Inline the new functions to reduce register usage, eliminate spills, and remove the long switch tables that codegen produced for the indirect callbacks passed to the parallel/xteam reduction.*
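The scheme in item 1 can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the code in `openmp/device/src/Reduction.cpp`: the names `XTeamState`, `stridedReduce`, `finishTeam`, and `runReduction` are hypothetical, the reduction is a plain sum of `double` values, teams are simulated sequentially, and a "teams done" counter is assumed as the way the last team is detected.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of a cross-team sum reduction.
// None of these names come from the patch; they are illustrative only.
struct XTeamState {
  std::vector<double> TeamValues;     // one slot per team, sized at "launch"
  std::atomic<unsigned> TeamsDone{0}; // number of teams that have published
  double Result = 0.0;                // written by the last team only
  explicit XTeamState(unsigned NumTeams) : TeamValues(NumTeams, 0.0) {}
};

// Models the strided loop run by the last team: "thread" Tid visits slots
// Tid, Tid + NumThreads, Tid + 2 * NumThreads, ... of the global buffer.
double stridedReduce(const std::vector<double> &Buf, unsigned NumThreads) {
  assert(NumThreads > 0 && "a team needs at least one thread");
  std::vector<double> Partial(NumThreads, 0.0);
  for (unsigned Tid = 0; Tid < NumThreads; ++Tid)
    for (std::size_t I = Tid; I < Buf.size(); I += NumThreads)
      Partial[Tid] += Buf[I];
  double Sum = 0.0;
  for (double P : Partial) // stands in for the intra-team reduction
    Sum += P;
  return Sum;
}

// Called once per team after its team-local reduction. Returns true only
// for the team that finishes last and therefore produces the final result.
bool finishTeam(XTeamState &S, unsigned TeamId, double TeamValue,
                unsigned NumThreads) {
  S.TeamValues[TeamId] = TeamValue;
  // acq_rel: publish our slot and, if we are last, observe all other slots.
  unsigned Prev = S.TeamsDone.fetch_add(1, std::memory_order_acq_rel);
  if (Prev + 1 != S.TeamValues.size())
    return false;
  S.Result = stridedReduce(S.TeamValues, NumThreads);
  return true;
}

// Sequentially simulates NumTeams teams; team T contributes the value T + 1.
double runReduction(unsigned NumTeams, unsigned NumThreads) {
  XTeamState S(NumTeams); // buffer length matches the actual team count
  for (unsigned T = 0; T < NumTeams; ++T)
    finishTeam(S, T, static_cast<double>(T + 1), NumThreads);
  return S.Result;
}
```

Because the buffer has exactly one slot per team, the team count does not need to be known at compile time, and no team has to wait for a chunk of the buffer to be recycled.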
Compared with the previous state, speedups are often as high as 5x-10x. I did not observe any performance regressions. The results can be reproduced using my benchmark suite https://github.com/ro-i/xteam-test (6854b7abc8848702b5a2d9ce2ea02849b5dc590b). Set the compiler paths in `local.mk` and run something like `make trunk_dev trunk && ./run_bench.sh -rRq -a -n3 trunk_208 trunk_dev_208 trunk_10400 trunk_dev_10400` to compare performance for reductions and reduction simulations (which try to avoid the reduction-specific codegen), using either 208 teams or 10400 teams with 512 threads per team.
*For example, for a dot reduction using the `double` type, we previously had something like
`LDS Usage: 540B #SGPRs/VGPRs: 106/45 #SGPR/VGPR Spills: 34/0 Tripcount: 177777777`, which has now become
`LDS Usage: 280B #SGPRs/VGPRs: 48/30 #SGPR/VGPR Spills: 0/0 Tripcount: 177777777`.
This patch uses ideas from Johannes Doerfert, ideas from the AOMP cross-team reduction implementation, and was assisted by Claude.
---
Patch is 1.37 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/195102.diff
166 Files Affected:
- (modified) clang/include/clang/Basic/LangOptions.def (+1-1)
- (modified) clang/include/clang/Options/Options.td (+5-1)
- (modified) clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp (+9-5)
- (modified) clang/lib/Driver/ToolChains/Clang.cpp (+12)
- (modified) clang/lib/Frontend/CompilerInvocation.cpp (+1-1)
- (modified) clang/test/Driver/openmp-offload-gpu.c (+1)
- (modified) clang/test/OpenMP/bug60602.cpp (+2-2)
- (modified) clang/test/OpenMP/distribute_codegen.cpp (+10-10)
- (modified) clang/test/OpenMP/distribute_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_parallel_for_codegen.cpp (+28-28)
- (modified) clang/test/OpenMP/distribute_parallel_for_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_parallel_for_if_codegen.cpp (+8-8)
- (modified) clang/test/OpenMP/distribute_parallel_for_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_parallel_for_num_threads_codegen.cpp (+24-24)
- (modified) clang/test/OpenMP/distribute_parallel_for_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_parallel_for_proc_bind_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/distribute_parallel_for_simd_codegen.cpp (+28-28)
- (modified) clang/test/OpenMP/distribute_parallel_for_simd_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_parallel_for_simd_if_codegen.cpp (+32-32)
- (modified) clang/test/OpenMP/distribute_parallel_for_simd_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_parallel_for_simd_num_threads_codegen.cpp (+24-24)
- (modified) clang/test/OpenMP/distribute_parallel_for_simd_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_parallel_for_simd_proc_bind_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/distribute_private_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/distribute_simd_codegen.cpp (+20-20)
- (modified) clang/test/OpenMP/distribute_simd_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_simd_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/distribute_simd_private_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/distribute_simd_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/map_struct_ordering.cpp (+1-1)
- (modified) clang/test/OpenMP/reduction_implicit_map.cpp (+6-6)
- (modified) clang/test/OpenMP/target_codegen_global_capture.cpp (+6-6)
- (modified) clang/test/OpenMP/target_default_codegen.cpp (+16-16)
- (modified) clang/test/OpenMP/target_defaultmap_codegen_03.cpp (+8-8)
- (modified) clang/test/OpenMP/target_dyn_groupprivate_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/target_firstprivate_codegen.cpp (+24-24)
- (modified) clang/test/OpenMP/target_has_device_addr_codegen.cpp (+15-15)
- (modified) clang/test/OpenMP/target_has_device_addr_codegen_01.cpp (+2-2)
- (modified) clang/test/OpenMP/target_is_device_ptr_codegen.cpp (+44-44)
- (modified) clang/test/OpenMP/target_map_array_of_structs_with_nested_mapper_codegen.cpp (+1-1)
- (modified) clang/test/OpenMP/target_map_array_section_no_length_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_map_array_section_of_structs_with_nested_mapper_codegen.cpp (+1-1)
- (modified) clang/test/OpenMP/target_map_codegen_03.cpp (+2-2)
- (modified) clang/test/OpenMP/target_map_codegen_hold.cpp (+12-12)
- (modified) clang/test/OpenMP/target_map_deref_array_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/target_map_member_expr_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/target_offload_mandatory_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/target_ompx_dyn_cgroup_mem_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/target_parallel_codegen.cpp (+14-14)
- (modified) clang/test/OpenMP/target_parallel_for_codegen.cpp (+28-28)
- (modified) clang/test/OpenMP/target_parallel_for_simd_codegen.cpp (+28-28)
- (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen-1.cpp (+12-12)
- (modified) clang/test/OpenMP/target_parallel_generic_loop_codegen-2.cpp (+2-2)
- (modified) clang/test/OpenMP/target_parallel_generic_loop_uses_allocators_codegen.cpp (+1-1)
- (modified) clang/test/OpenMP/target_parallel_if_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/target_parallel_num_threads_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/target_parallel_num_threads_strict_codegen.cpp (+8-8)
- (modified) clang/test/OpenMP/target_task_affinity_codegen.cpp (+2-2)
- (modified) clang/test/OpenMP/target_teams_codegen.cpp (+26-26)
- (modified) clang/test/OpenMP/target_teams_distribute_codegen.cpp (+14-14)
- (modified) clang/test/OpenMP/target_teams_distribute_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/target_teams_distribute_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/target_teams_distribute_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_if_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_proc_bind_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_schedule_codegen.cpp (+60-60)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_if_codegen.cpp (+24-24)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_proc_bind_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_parallel_for_simd_schedule_codegen.cpp (+60-60)
- (modified) clang/test/OpenMP/target_teams_distribute_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_reduction_codegen.cpp (+40-40)
- (modified) clang/test/OpenMP/target_teams_distribute_simd_codegen.cpp (+28-28)
- (modified) clang/test/OpenMP/target_teams_distribute_simd_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/target_teams_distribute_simd_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/target_teams_distribute_simd_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_simd_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_simd_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_distribute_simd_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_generic_loop_codegen-1.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_generic_loop_codegen.cpp (+1-27)
- (modified) clang/test/OpenMP/target_teams_generic_loop_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/target_teams_generic_loop_if_codegen.cpp (+5-5)
- (modified) clang/test/OpenMP/target_teams_generic_loop_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_generic_loop_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/target_teams_generic_loop_uses_allocators_codegen.cpp (+1-1)
- (modified) clang/test/OpenMP/target_teams_map_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/target_teams_num_teams_codegen.cpp (+12-12)
- (renamed) clang/test/OpenMP/target_teams_reduction_codegen.cpp (+27-1372)
- (modified) clang/test/OpenMP/target_teams_thread_limit_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_codegen.cpp (+20-20)
- (modified) clang/test/OpenMP/teams_distribute_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_distribute_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/teams_distribute_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/teams_distribute_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_copyin_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_if_codegen.cpp (+8-8)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_num_threads_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_proc_bind_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_schedule_codegen.cpp (+60-60)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_if_codegen.cpp (+32-32)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_num_threads_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_proc_bind_codegen.cpp (+3-3)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_parallel_for_simd_schedule_codegen.cpp (+60-60)
- (modified) clang/test/OpenMP/teams_distribute_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_simd_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/teams_distribute_simd_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/teams_distribute_simd_dist_schedule_codegen.cpp (+18-18)
- (modified) clang/test/OpenMP/teams_distribute_simd_firstprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_simd_lastprivate_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_simd_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_distribute_simd_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_firstprivate_codegen.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_generic_loop_codegen-1.cpp (+12-12)
- (modified) clang/test/OpenMP/teams_generic_loop_collapse_codegen.cpp (+6-6)
- (modified) clang/test/OpenMP/teams_generic_loop_private_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_generic_loop_reduction_codegen.cpp (+4-4)
- (modified) clang/test/OpenMP/teams_private_codegen.cpp (+10-10)
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPConstants.h (+1-1)
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPIRBuilder.h (+10-1)
- (modified) llvm/include/llvm/Frontend/OpenMP/OMPKinds.def (+5-5)
- (modified) llvm/lib/Frontend/OpenMP/OMPIRBuilder.cpp (+7-23)
- (modified) llvm/lib/Transforms/IPO/OpenMPOpt.cpp (+1-1)
- (modified) llvm/test/Transforms/OpenMP/add_attributes.ll (+4-4)
- (modified) mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp (+6-4)
- (modified) mlir/test/Target/LLVMIR/allocatable_gpu_reduction_teams.mlir (+4-12)
- (modified) mlir/test/Target/LLVMIR/omptarget-multi-reduction.mlir (+2-2)
- (modified) mlir/test/Target/LLVMIR/omptarget-teams-distribute-reduction-array-descriptor.mlir (+6-8)
- (modified) mlir/test/Target/LLVMIR/omptarget-teams-distribute-reduction.mlir (+1-1)
- (modified) mlir/test/Target/LLVMIR/omptarget-teams-reduction.mlir (+1-1)
- (modified) offload/include/Shared/Environment.h (+5-2)
- (modified) offload/plugins-nextgen/common/include/PluginInterface.h (+8-5)
- (modified) offload/plugins-nextgen/common/src/PluginInterface.cpp (+32-10)
- (modified) openmp/device/include/Interface.h (-6)
- (modified) openmp/device/src/Reduction.cpp (+142-156)
``````````diff
diff --git a/clang/include/clang/Basic/LangOptions.def b/clang/include/clang/Basic/LangOptions.def
index 596bce9e897f7..77ea76775ae2f 100644
--- a/clang/include/clang/Basic/LangOptions.def
+++ b/clang/include/clang/Basic/LangOptions.def
@@ -231,7 +231,7 @@ LANGOPT(OpenMPCUDAMode , 1, 0, NotCompatible, "Generate code for OpenMP pragm
LANGOPT(OpenMPIRBuilder , 1, 0, NotCompatible, "Use the experimental OpenMP-IR-Builder codegen path.")
LANGOPT(OpenMPCUDANumSMs , 32, 0, NotCompatible, "Number of SMs for CUDA devices.")
LANGOPT(OpenMPCUDABlocksPerSM , 32, 0, NotCompatible, "Number of blocks per SM for CUDA devices.")
-LANGOPT(OpenMPCUDAReductionBufNum , 32, 1024, NotCompatible, "Number of the reduction records in the intermediate reduction buffer used for the teams reductions.")
+LANGOPT(OpenMPCUDAReductionBufNum , 32, 0, NotCompatible, "Deprecated and ignored: the teams reduction buffer is sized at kernel launch to match the actual number of teams. Retained for backwards compatibility with -fopenmp-cuda-teams-reduction-recs-num=.")
LANGOPT(OpenMPTargetDebug , 32, 0, NotCompatible, "Enable debugging in the OpenMP offloading device RTL")
LANGOPT(OpenMPOptimisticCollapse , 1, 0, NotCompatible, "Use at most 32 bits to represent the collapsed loop nest counter.")
LANGOPT(OpenMPThreadSubscription , 1, 0, NotCompatible, "Assume work-shared loops do not have more iterations than participating threads.")
diff --git a/clang/include/clang/Options/Options.td b/clang/include/clang/Options/Options.td
index c16c41ad4057d..7183bff79c1da 100644
--- a/clang/include/clang/Options/Options.td
+++ b/clang/include/clang/Options/Options.td
@@ -4091,7 +4091,11 @@ def fopenmp_cuda_number_of_sm_EQ : Joined<["-"], "fopenmp-cuda-number-of-sm=">,
def fopenmp_cuda_blocks_per_sm_EQ : Joined<["-"], "fopenmp-cuda-blocks-per-sm=">, Group<f_Group>,
Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
def fopenmp_cuda_teams_reduction_recs_num_EQ : Joined<["-"], "fopenmp-cuda-teams-reduction-recs-num=">, Group<f_Group>,
- Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>;
+ Flags<[NoArgumentUnused, HelpHidden]>, Visibility<[ClangOption, CC1Option]>,
+ HelpText<"Deprecated and ignored. The teams reduction buffer is sized "
+ "automatically at kernel launch to match the actual number of "
+ "teams; this flag is accepted for backwards compatibility only "
+ "and emits a deprecation warning when used.">;
//===----------------------------------------------------------------------===//
// Shared cc1 + fc1 OpenMP Target Options
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index 943c2ac9f8491..b18522b4cc491 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -788,8 +788,12 @@ void CGOpenMPRuntimeGPU::emitKernelDeinit(CodeGenFunction &CGF,
? 0
: DL.getTypeAllocSize(LLVMReductionsBufferTy).getFixedValue();
CGBuilderTy &Bld = CGF.Builder;
+ // The teams-reduction buffer is sized at kernel launch by the offload
+ // plugin to match the actual number of teams, so we always pass 0 as the
+ // buffer length (signal for dynamic sizing) regardless of any value
+ // supplied via the deprecated -fopenmp-cuda-teams-reduction-recs-num flag.
OMPBuilder.createTargetDeinit(Bld, ReductionDataSize,
- C.getLangOpts().OpenMPCUDAReductionBufNum);
+ /*TeamsReductionBufferLength=*/0);
TeamsReductions.clear();
}
@@ -1698,8 +1702,6 @@ void CGOpenMPRuntimeGPU::emitReduction(
bool ParallelReduction = isOpenMPParallelDirective(Options.ReductionKind);
bool TeamsReduction = isOpenMPTeamsDirective(Options.ReductionKind);
- ASTContext &C = CGM.getContext();
-
if (Options.SimpleReduction) {
assert(!TeamsReduction && !ParallelReduction &&
"Invalid reduction selection in emitReduction.");
@@ -1790,12 +1792,14 @@ void CGOpenMPRuntimeGPU::emitReduction(
Idx++;
}
+ // ReductionBufNum is unused by the current teams-reduction runtime; the
+ // buffer length is resolved at kernel launch by the offload plugin. Ignore
+ // the deprecated -fopenmp-cuda-teams-reduction-recs-num value here.
llvm::OpenMPIRBuilder::InsertPointTy AfterIP =
cantFail(OMPBuilder.createReductionsGPU(
OmpLoc, AllocaIP, CodeGenIP, ReductionInfos, /*IsByRef=*/{}, false,
TeamsReduction, llvm::OpenMPIRBuilder::ReductionGenCBKind::Clang,
- CGF.getTarget().getGridValue(),
- C.getLangOpts().OpenMPCUDAReductionBufNum, RTLoc));
+ CGF.getTarget().getGridValue(), /*ReductionBufNum=*/0, RTLoc));
CGF.Builder.restoreIP(AfterIP);
}
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index cfa3031431498..756b4e2862038 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -6872,6 +6872,18 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
CmdArgs.push_back("-fno-openmp-extensions");
Args.AddAllArgs(CmdArgs, options::OPT_fopenmp_cuda_number_of_sm_EQ);
Args.AddAllArgs(CmdArgs, options::OPT_fopenmp_cuda_blocks_per_sm_EQ);
+ // '-fopenmp-cuda-teams-reduction-recs-num=' is deprecated and has no
+ // effect: the teams reduction buffer is sized at kernel launch by the
+ // offload plugin to match the actual number of teams. Honoring a
+ // smaller user-supplied value would silently truncate the buffer for
+ // larger launches. The flag is still parsed (and forwarded to cc1)
+ // for backwards compatibility but is ignored by codegen.
+ if (Arg *A = Args.getLastArg(
+ options::OPT_fopenmp_cuda_teams_reduction_recs_num_EQ))
+ D.Diag(diag::warn_drv_deprecated_custom)
+ << A->getAsString(Args)
+ << "the value is ignored; the teams reduction buffer is sized "
+ "automatically at kernel launch";
Args.AddAllArgs(CmdArgs,
options::OPT_fopenmp_cuda_teams_reduction_recs_num_EQ);
if (Args.hasFlag(options::OPT_fopenmp_optimistic_collapse,
diff --git a/clang/lib/Frontend/CompilerInvocation.cpp b/clang/lib/Frontend/CompilerInvocation.cpp
index c6e8644905964..bf037550fc9e4 100644
--- a/clang/lib/Frontend/CompilerInvocation.cpp
+++ b/clang/lib/Frontend/CompilerInvocation.cpp
@@ -3891,7 +3891,7 @@ void CompilerInvocationBase::GenerateLangArgs(const LangOptions &Opts,
GenerateArg(Consumer, OPT_fopenmp_cuda_blocks_per_sm_EQ,
Twine(Opts.OpenMPCUDABlocksPerSM));
- if (Opts.OpenMPCUDAReductionBufNum != 1024)
+ if (Opts.OpenMPCUDAReductionBufNum)
GenerateArg(Consumer, OPT_fopenmp_cuda_teams_reduction_recs_num_EQ,
Twine(Opts.OpenMPCUDAReductionBufNum));
diff --git a/clang/test/Driver/openmp-offload-gpu.c b/clang/test/Driver/openmp-offload-gpu.c
index bf42ec7572b68..f6490452c73ef 100644
--- a/clang/test/Driver/openmp-offload-gpu.c
+++ b/clang/test/Driver/openmp-offload-gpu.c
@@ -193,6 +193,7 @@
// RUN: %clang -### -nogpulib -nogpuinc -fopenmp=libomp -fopenmp-targets=nvptx64-nvidia-cuda -Xopenmp-target -march=sm_60 %s -fopenmp-cuda-teams-reduction-recs-num=2048 2>&1 \
// RUN: | FileCheck -check-prefix=CUDA_RED_RECS %s
+// CUDA_RED_RECS: warning: argument '-fopenmp-cuda-teams-reduction-recs-num=2048' is deprecated, the value is ignored; the teams reduction buffer is sized automatically at kernel launch
// CUDA_RED_RECS: "-cc1"{{.*}}"-triple" "nvptx64-nvidia-cuda"
// CUDA_RED_RECS-SAME: "-fopenmp-cuda-teams-reduction-recs-num=2048"
diff --git a/clang/test/OpenMP/bug60602.cpp b/clang/test/OpenMP/bug60602.cpp
index e9174d7be3a12..8235a5a7d83d1 100644
--- a/clang/test/OpenMP/bug60602.cpp
+++ b/clang/test/OpenMP/bug60602.cpp
@@ -119,7 +119,7 @@ int kernel_within_loop(int *a, int *b, int N, int num_iters) {
// CHECK-NEXT: [[TMP35:%.*]] = getelementptr inbounds [6 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK-NEXT: [[TMP36:%.*]] = getelementptr inbounds [6 x i64], ptr [[DOTOFFLOAD_SIZES]], i32 0, i32 0
// CHECK-NEXT: [[TMP37:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK-NEXT: store i32 4, ptr [[TMP37]], align 4
+// CHECK-NEXT: store i32 5, ptr [[TMP37]], align 4
// CHECK-NEXT: [[TMP38:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK-NEXT: store i32 6, ptr [[TMP38]], align 4
// CHECK-NEXT: [[TMP39:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -223,7 +223,7 @@ int kernel_within_loop(int *a, int *b, int N, int num_iters) {
// CHECK-NEXT: [[ADD:%.*]] = add i32 [[TMP89]], 1
// CHECK-NEXT: [[TMP90:%.*]] = zext i32 [[ADD]] to i64
// CHECK-NEXT: [[TMP91:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 0
-// CHECK-NEXT: store i32 4, ptr [[TMP91]], align 4
+// CHECK-NEXT: store i32 5, ptr [[TMP91]], align 4
// CHECK-NEXT: [[TMP92:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 1
// CHECK-NEXT: store i32 6, ptr [[TMP92]], align 4
// CHECK-NEXT: [[TMP93:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS14]], i32 0, i32 2
diff --git a/clang/test/OpenMP/distribute_codegen.cpp b/clang/test/OpenMP/distribute_codegen.cpp
index 62b7ad8b979a2..afd18e91911dd 100644
--- a/clang/test/OpenMP/distribute_codegen.cpp
+++ b/clang/test/OpenMP/distribute_codegen.cpp
@@ -169,7 +169,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK1-NEXT: [[TMP19:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP20:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP21:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK1-NEXT: store i32 4, ptr [[TMP21]], align 4
+// CHECK1-NEXT: store i32 5, ptr [[TMP21]], align 4
// CHECK1-NEXT: [[TMP22:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK1-NEXT: store i32 5, ptr [[TMP22]], align 4
// CHECK1-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -368,7 +368,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK1-NEXT: [[TMP19:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP20:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP21:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK1-NEXT: store i32 4, ptr [[TMP21]], align 4
+// CHECK1-NEXT: store i32 5, ptr [[TMP21]], align 4
// CHECK1-NEXT: [[TMP22:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK1-NEXT: store i32 5, ptr [[TMP22]], align 4
// CHECK1-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -567,7 +567,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK1-NEXT: [[TMP19:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP20:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP21:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK1-NEXT: store i32 4, ptr [[TMP21]], align 4
+// CHECK1-NEXT: store i32 5, ptr [[TMP21]], align 4
// CHECK1-NEXT: [[TMP22:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK1-NEXT: store i32 5, ptr [[TMP22]], align 4
// CHECK1-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -774,7 +774,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK1-NEXT: [[ADD4:%.*]] = add nsw i32 [[TMP12]], 1
// CHECK1-NEXT: [[TMP13:%.*]] = zext i32 [[ADD4]] to i64
// CHECK1-NEXT: [[TMP14:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK1-NEXT: store i32 4, ptr [[TMP14]], align 4
+// CHECK1-NEXT: store i32 5, ptr [[TMP14]], align 4
// CHECK1-NEXT: [[TMP15:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK1-NEXT: store i32 2, ptr [[TMP15]], align 4
// CHECK1-NEXT: [[TMP16:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -949,7 +949,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK1-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP9:%.*]] = getelementptr inbounds [2 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK1-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK1-NEXT: store i32 4, ptr [[TMP10]], align 4
+// CHECK1-NEXT: store i32 5, ptr [[TMP10]], align 4
// CHECK1-NEXT: [[TMP11:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK1-NEXT: store i32 2, ptr [[TMP11]], align 4
// CHECK1-NEXT: [[TMP12:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -1130,7 +1130,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK3-NEXT: [[TMP19:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP20:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP21:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK3-NEXT: store i32 4, ptr [[TMP21]], align 4
+// CHECK3-NEXT: store i32 5, ptr [[TMP21]], align 4
// CHECK3-NEXT: [[TMP22:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK3-NEXT: store i32 5, ptr [[TMP22]], align 4
// CHECK3-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -1325,7 +1325,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK3-NEXT: [[TMP19:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP20:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP21:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK3-NEXT: store i32 4, ptr [[TMP21]], align 4
+// CHECK3-NEXT: store i32 5, ptr [[TMP21]], align 4
// CHECK3-NEXT: [[TMP22:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK3-NEXT: store i32 5, ptr [[TMP22]], align 4
// CHECK3-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -1520,7 +1520,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK3-NEXT: [[TMP19:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP20:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP21:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK3-NEXT: store i32 4, ptr [[TMP21]], align 4
+// CHECK3-NEXT: store i32 5, ptr [[TMP21]], align 4
// CHECK3-NEXT: [[TMP22:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK3-NEXT: store i32 5, ptr [[TMP22]], align 4
// CHECK3-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -1723,7 +1723,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK3-NEXT: [[ADD4:%.*]] = add nsw i32 [[TMP12]], 1
// CHECK3-NEXT: [[TMP13:%.*]] = zext i32 [[ADD4]] to i64
// CHECK3-NEXT: [[TMP14:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK3-NEXT: store i32 4, ptr [[TMP14]], align 4
+// CHECK3-NEXT: store i32 5, ptr [[TMP14]], align 4
// CHECK3-NEXT: [[TMP15:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK3-NEXT: store i32 2, ptr [[TMP15]], align 4
// CHECK3-NEXT: [[TMP16:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -1898,7 +1898,7 @@ int fint(void) { return ftemplate<int>(); }
// CHECK3-NEXT: [[TMP8:%.*]] = getelementptr inbounds [2 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP9:%.*]] = getelementptr inbounds [2 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK3-NEXT: [[TMP10:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK3-NEXT: store i32 4, ptr [[TMP10]], align 4
+// CHECK3-NEXT: store i32 5, ptr [[TMP10]], align 4
// CHECK3-NEXT: [[TMP11:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK3-NEXT: store i32 2, ptr [[TMP11]], align 4
// CHECK3-NEXT: [[TMP12:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
diff --git a/clang/test/OpenMP/distribute_firstprivate_codegen.cpp b/clang/test/OpenMP/distribute_firstprivate_codegen.cpp
index 019961381c0fc..d95623a597cbc 100644
--- a/clang/test/OpenMP/distribute_firstprivate_codegen.cpp
+++ b/clang/test/OpenMP/distribute_firstprivate_codegen.cpp
@@ -551,7 +551,7 @@ int main() {
// CHECK9-NEXT: [[TMP26:%.*]] = getelementptr inbounds [6 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK9-NEXT: [[TMP27:%.*]] = getelementptr inbounds [6 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK9-NEXT: [[TMP28:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK9-NEXT: store i32 4, ptr [[TMP28]], align 4
+// CHECK9-NEXT: store i32 5, ptr [[TMP28]], align 4
// CHECK9-NEXT: [[TMP29:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK9-NEXT: store i32 6, ptr [[TMP29]], align 4
// CHECK9-NEXT: [[TMP30:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -854,7 +854,7 @@ int main() {
// CHECK9-NEXT: [[TMP21:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK9-NEXT: [[TMP22:%.*]] = getelementptr inbounds [5 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK9-NEXT: [[TMP23:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK9-NEXT: store i32 4, ptr [[TMP23]], align 4
+// CHECK9-NEXT: store i32 5, ptr [[TMP23]], align 4
// CHECK9-NEXT: [[TMP24:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 1
// CHECK9-NEXT: store i32 5, ptr [[TMP24]], align 4
// CHECK9-NEXT: [[TMP25:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 2
@@ -1230,7 +1230,7 @@ int main() {
// CHECK11-NEXT: [[TMP26:%.*]] = getelementptr inbounds [6 x ptr], ptr [[DOTOFFLOAD_BASEPTRS]], i32 0, i32 0
// CHECK11-NEXT: [[TMP27:%.*]] = getelementptr inbounds [6 x ptr], ptr [[DOTOFFLOAD_PTRS]], i32 0, i32 0
// CHECK11-NEXT: [[TMP28:%.*]] = getelementptr inbounds nuw [[STRUCT___TGT_KERNEL_ARGUMENTS]], ptr [[KERNEL_ARGS]], i32 0, i32 0
-// CHECK11-NEXT: store i32 4, ptr [[TMP28]], align 4
+// CHECK11-NEXT: store i32 5, ptr [[TMP28]], align 4
// ...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/195102