[llvm] [OpenMP] Fix num_iters in __kmpc_*_loop DeviceRTL functions (PR #133435)

Sergio Afonso via llvm-commits llvm-commits at lists.llvm.org
Fri Mar 28 05:31:08 PDT 2025


https://github.com/skatrak created https://github.com/llvm/llvm-project/pull/133435

This patch removes the addition of 1 to the number of iterations when calling the following DeviceRTL functions:
- `__kmpc_distribute_for_static_loop*`
- `__kmpc_distribute_static_loop*`
- `__kmpc_for_static_loop*`

Calls to these functions are currently produced only by the OMPIRBuilder from flang, which already passes the correct number of iterations. Because the runtime then adds 1 to the received `num_iters` argument, worksharing can produce incorrect results. This affects flang OpenMP offloading of `do`, `distribute` and `distribute parallel do` constructs.

Requiring the application to pass `tripcount - 1` as the argument would also be surprising, so rather than updating flang I think it makes more sense to update the runtime.
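
To illustrate the off-by-one, here is a minimal host-side sketch. It is not the DeviceRTL implementation: `simulate_static_loop`, `record_iteration` and `run_loop` are hypothetical names, and the round-robin schedule is an assumption chosen for brevity. The only property it models is that the loop body runs once for every index in `[0, num_iters)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a statically scheduled worksharing loop.
// It only mimics the contract that the callback runs once for every
// iteration index in [0, num_iters); it is not the real runtime code.
using LoopBody = void (*)(std::int32_t, void *);

void simulate_static_loop(LoopBody fn, void *arg, std::int32_t num_iters,
                          std::int32_t num_threads) {
  assert(num_threads > 0 && "need at least one simulated thread");
  // Visit the simulated threads one after another; thread `tid` takes
  // iterations tid, tid + num_threads, tid + 2 * num_threads, ...
  for (std::int32_t tid = 0; tid < num_threads; ++tid)
    for (std::int32_t iv = tid; iv < num_iters; iv += num_threads)
      fn(iv, arg);
}

// Loop body that records how often each iteration index was executed.
void record_iteration(std::int32_t iv, void *arg) {
  auto *hits = static_cast<std::vector<int> *>(arg);
  if (static_cast<std::size_t>(iv) >= hits->size())
    hits->resize(static_cast<std::size_t>(iv) + 1, 0);
  ++(*hits)[static_cast<std::size_t>(iv)];
}

// Returns the per-index execution counts for a loop of `num_iters` iterations.
std::vector<int> run_loop(std::int32_t num_iters, std::int32_t num_threads) {
  std::vector<int> hits;
  simulate_static_loop(record_iteration, &hits, num_iters, num_threads);
  return hits;
}
```

With a trip count of 10, `run_loop(10, 4)` touches indices 0 through 9 exactly once. If the callee adds 1 and effectively runs `run_loop(11, 4)`, index 10 is executed as well, which is one iteration more than the caller asked for.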

From a3b8aeade6245b9ffd5c65cfac487b13111dd3e3 Mon Sep 17 00:00:00 2001
From: Sergio Afonso <safonsof at amd.com>
Date: Fri, 28 Mar 2025 11:45:35 +0000
Subject: [PATCH] [OpenMP] Fix num_iters in __kmpc_*_loop DeviceRTL functions

This patch removes the addition of 1 to the number of iterations when calling
the following DeviceRTL functions:
- `__kmpc_distribute_for_static_loop*`
- `__kmpc_distribute_static_loop*`
- `__kmpc_for_static_loop*`

Calls to these functions are currently produced only by the OMPIRBuilder from
flang, which already passes the correct number of iterations. Because the
runtime then adds 1 to the received `num_iters` argument, worksharing can
produce incorrect results. This affects flang OpenMP offloading of `do`,
`distribute` and `distribute parallel do` constructs.

Requiring the application to pass `tripcount - 1` as the argument would also
be surprising, so rather than updating flang I think it makes more sense
to update the runtime.
---
 offload/DeviceRTL/src/Workshare.cpp | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/offload/DeviceRTL/src/Workshare.cpp b/offload/DeviceRTL/src/Workshare.cpp
index 861b9ca371ccd..a8759307b42bd 100644
--- a/offload/DeviceRTL/src/Workshare.cpp
+++ b/offload/DeviceRTL/src/Workshare.cpp
@@ -911,19 +911,19 @@ template <typename Ty> class StaticLoopChunker {
           IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters,       \
           TY num_threads, TY block_chunk, TY thread_chunk) {                   \
     ompx::StaticLoopChunker<TY>::DistributeFor(                                \
-        loc, fn, arg, num_iters + 1, num_threads, block_chunk, thread_chunk);  \
+        loc, fn, arg, num_iters, num_threads, block_chunk, thread_chunk);      \
   }                                                                            \
   [[gnu::flatten, clang::always_inline]] void                                  \
       __kmpc_distribute_static_loop##BW(IdentTy *loc, void (*fn)(TY, void *),  \
                                         void *arg, TY num_iters,               \
                                         TY block_chunk) {                      \
-    ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters + 1,       \
+    ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters,           \
                                             block_chunk);                      \
   }                                                                            \
   [[gnu::flatten, clang::always_inline]] void __kmpc_for_static_loop##BW(      \
       IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters,           \
       TY num_threads, TY thread_chunk) {                                       \
-    ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters + 1, num_threads, \
+    ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters, num_threads,     \
                                      thread_chunk);                            \
   }
 


