[llvm-branch-commits] [openmp] [OpenMP][offload] Inline target reductions (PR #196061)

Wed May 6 08:22:08 PDT 2026

================
@@ -22,15 +22,19 @@ using namespace ompx;
 
 namespace {
 
-void gpu_regular_warp_reduce(void *reduce_data, ShuffleReductFnTy shflFct) {
+[[clang::always_inline]]
+static void gpu_regular_warp_reduce(void *reduce_data,
----------------
ro-i wrote:

> [...] I generally do think that needing to add always inline represents a problem with our codegen or inliner heuristics.

The issue is that we don't want actual indirect calls to the helper functions that are passed to the reduction runtime as function pointers.
As far as I understand it:
We don't restrict the specialization done by the Attributor in OpenMPOpt. (Introduced by https://github.com/llvm/llvm-project/commit/9c08e76f3e5f2f3e8cb1e3c9fd45827395c712cc.)
Each helper function, such as `cpyFct`, `shflFct`, etc., exists once for every reduction in a translation unit. They are not shared because the data types could differ, and they aren't cached or anything like that. Then, in the *not-inlined* case, the code that uses these functions gets all of their bodies inlined and basically switches over the function pointers to decide which of the inlined snippets to execute.
All of this is solved by inlining the reduction functions themselves: we then have one instance per reduction, so there is a 1:1 mapping between the helper functions and the functions that use them, and no need to switch over the helper function pointers.
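
A minimal host-side sketch of the two shapes described above, not the actual DeviceRTL code. All names here (`HelperFn`, `add_int`, `add_double`, `reduce_shared`, `reduce_inlined`, and the `sum_*` wrappers) are hypothetical, and `reduce_shared` is hand-written to mimic what I assume the specialized, not-inlined function ends up looking like. It uses the GNU spelling `__attribute__((always_inline))` in place of `[[clang::always_inline]]` so that it also builds with GCC.

```cpp
// Hypothetical stand-in for the runtime's helper function pointer types.
using HelperFn = void (*)(void *lhs, const void *rhs);

// One helper per reduction, as the compiler would emit them per data type.
static void add_int(void *lhs, const void *rhs) {
  *static_cast<int *>(lhs) += *static_cast<const int *>(rhs);
}
static void add_double(void *lhs, const void *rhs) {
  *static_cast<double *>(lhs) += *static_cast<const double *>(rhs);
}

// Not-inlined case (assumed shape): a single shared body that contains every
// helper's inlined code and selects one by comparing the function pointer.
__attribute__((noinline)) static void
reduce_shared(void *lhs, const void *rhs, HelperFn fn) {
  if (fn == add_int)
    *static_cast<int *>(lhs) += *static_cast<const int *>(rhs);
  else if (fn == add_double)
    *static_cast<double *>(lhs) += *static_cast<const double *>(rhs);
  else
    fn(lhs, rhs); // unknown helper: fall back to a real indirect call
}

// Inlined case: every call site gets its own copy, so `fn` is a known
// constant there and the call can be resolved directly.
__attribute__((always_inline)) static inline void
reduce_inlined(void *lhs, const void *rhs, HelperFn fn) {
  fn(lhs, rhs);
}

// Two independent "reductions" in the same translation unit.
int sum_int_shared(int a, int b) {
  reduce_shared(&a, &b, add_int);
  return a;
}
int sum_int_inlined(int a, int b) {
  reduce_inlined(&a, &b, add_int);
  return a;
}
double sum_double_inlined(double a, double b) {
  reduce_inlined(&a, &b, add_double);
  return a;
}
```

Both variants compute the same result; the difference is only that `reduce_shared` carries the pointer comparison chain for every reduction in the translation unit, while each inlined copy of `reduce_inlined` sees exactly one helper.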

https://github.com/llvm/llvm-project/pull/196061