[llvm] [OpenMP] Fix num_iters in __kmpc_*_loop DeviceRTL functions (PR #133435)
via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 28 05:31:52 PDT 2025
llvmbot wrote:
@llvm/pr-subscribers-offload
Author: Sergio Afonso (skatrak)
<details>
<summary>Changes</summary>
This patch removes the addition of 1 to the number of iterations when calling the following DeviceRTL functions:
- `__kmpc_distribute_for_static_loop*`
- `__kmpc_distribute_static_loop*`
- `__kmpc_for_static_loop*`
Calls to these functions are currently only produced by the OMPIRBuilder from flang, which already passes the correct number of iterations. Because the runtime then adds 1 to the received `num_iters` argument, worksharing can produce incorrect results. This affects flang OpenMP offloading of the `do`, `distribute` and `distribute parallel do` constructs.
Requiring callers to pass `tripcount - 1` as the argument would be surprising as well, so rather than updating flang I think it makes more sense to update the runtime.
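To illustrate the off-by-one, here is a minimal host-side sketch, not the DeviceRTL code. It assumes the chunker treats its argument as the total iteration count and calls the body for indices `0` to `count - 1`. The names `static_loop`, `record`, `run_old` and `run_new` are hypothetical, and the real `StaticLoopChunker` additionally splits the work across teams and threads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sequential stand-in for a static loop chunker:
// calls body(i, arg) for every i in [0, count).
template <typename Ty>
void static_loop(void (*body)(Ty, void *), void *arg, Ty count) {
  for (Ty i = 0; i < count; ++i)
    body(i, arg);
}

// Loop body: records each iteration index it is called with.
static void record(int32_t iv, void *arg) {
  static_cast<std::vector<int32_t> *>(arg)->push_back(iv);
}

// Behaviour before the patch (assumed model): the entry point adds 1
// to a value that is already the trip count.
std::vector<int32_t> run_old(int32_t num_iters) {
  std::vector<int32_t> seen;
  static_loop<int32_t>(record, &seen, num_iters + 1);
  return seen;
}

// Behaviour after the patch: the trip count is forwarded unchanged.
// Assumes num_iters is non-negative.
std::vector<int32_t> run_new(int32_t num_iters) {
  std::vector<int32_t> seen;
  static_loop<int32_t>(record, &seen, num_iters);
  assert(seen.size() == static_cast<std::size_t>(num_iters));
  return seen;
}
```

Under these assumptions, `run_old(4)` invokes the body five times, including the out-of-range index 4, while `run_new(4)` invokes it exactly four times.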
---
Full diff: https://github.com/llvm/llvm-project/pull/133435.diff
1 Files Affected:
- (modified) offload/DeviceRTL/src/Workshare.cpp (+3-3)
``````````diff
diff --git a/offload/DeviceRTL/src/Workshare.cpp b/offload/DeviceRTL/src/Workshare.cpp
index 861b9ca371ccd..a8759307b42bd 100644
--- a/offload/DeviceRTL/src/Workshare.cpp
+++ b/offload/DeviceRTL/src/Workshare.cpp
@@ -911,19 +911,19 @@ template <typename Ty> class StaticLoopChunker {
IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters, \
TY num_threads, TY block_chunk, TY thread_chunk) { \
ompx::StaticLoopChunker<TY>::DistributeFor( \
- loc, fn, arg, num_iters + 1, num_threads, block_chunk, thread_chunk); \
+ loc, fn, arg, num_iters, num_threads, block_chunk, thread_chunk); \
} \
[[gnu::flatten, clang::always_inline]] void \
__kmpc_distribute_static_loop##BW(IdentTy *loc, void (*fn)(TY, void *), \
void *arg, TY num_iters, \
TY block_chunk) { \
- ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters + 1, \
+ ompx::StaticLoopChunker<TY>::Distribute(loc, fn, arg, num_iters, \
block_chunk); \
} \
[[gnu::flatten, clang::always_inline]] void __kmpc_for_static_loop##BW( \
IdentTy *loc, void (*fn)(TY, void *), void *arg, TY num_iters, \
TY num_threads, TY thread_chunk) { \
- ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters + 1, num_threads, \
+ ompx::StaticLoopChunker<TY>::For(loc, fn, arg, num_iters, num_threads, \
thread_chunk); \
}
``````````
</details>
https://github.com/llvm/llvm-project/pull/133435