[Openmp-commits] [openmp] [OpenMP][OMPT] Use global thread id for `codeptr_ra` in `end_critical` (PR #169826)
Jan André Reuter via Openmp-commits
openmp-commits at lists.llvm.org
Thu Nov 27 08:03:49 PST 2025
https://github.com/Thyre created https://github.com/llvm/llvm-project/pull/169826
When a critical construct has finished, it will trigger a critical-released event. If a tool is attached, and the `mutex_released` callback was registered, the tool with receive an event containing the `codeptr_ra`, the return address of the callback invocation.
All the way back in 82e94a593433f36734e2d34898d353a2ecb65b8b, this `codeptr_ra` was implemented by calling `__ompt_load_return_address` with a fixed global thread id of `0`. However, this approach results in a race-condition, and can yield incorrect results to the tool.
`__ompt_load_return_address(0)` points to the current return address of the thread 0 in `__kmp_threads`. This thread may already execute some other construct. A tool might therefore receive the return address of e.g. some `libomp` internals, or other parts of the user code. Additionally, a call to `__ompt_load_return_address` resets the `th.ompt_thread_info.return_address` to `NULL`, therefore also affecting the return address of thread 0. Another dispatched event, e.g. parallel-begin might therefore not transfer any `codeptr_ra`.
To fix this, replace the fixed thread id by the `global_tid`, which is stored just before dispatching the `mutex_released` callback.
-----
Context beyond the commit message:
<details>
<summary>Click to open</summary>
We ran into this issue while investigating a race-condition on our own in Score-P. After fixing our issue with nested parallelism and critical constructs, we've noticed that the resulting call trees were off.
A code like this:
```c
#include <unistd.h>
#include <omp.h>
int main( void )
{
#pragma omp parallel num_threads( 4 )
{
for( int i = 0; i < 1000000; ++i)
{
#pragma omp critical
{}
#pragma omp for nowait
for(int j = 0; j < 4; ++j)
{
#pragma omp critical
{}
}
}
}
}
```
was causing a broken call tree, e.g.:
```
a.out
+ main
| + !$omp parallel @test.c:6
| | + !$omp critical @test.c:10
| | | + !$omp critical sblock @test.c:10
| | + !$omp for/do @test.c:13
| | | + !$omp critical @test.c:16
| | | | + !$omp critical sblock @test.c:16
| | | + !$omp critical @0x7e6ef4022408
| | | | + !$omp critical sblock @0x7e6ef4022408
| | + !$omp critical @0x7e6ef4022408
| | | + !$omp critical sblock @0x7e6ef4022408
| | + !$omp implicit barrier @test.c:6
```
Where critical regions were not resolved by `addr2line`, or `for`/`parallel` constructs were missing the code position entirely. Looking at individual memory addresses closely in a debug build, I noticed that multiple threads were writing to `th.ompt_thread_info.return_address`, which looked incorrect to me.
It would be great to have a test for this, but I haven't been able to trigger this consistently, especially not with a light-weight tool. If anyone has an idea for this, please give suggestions.
</details>
>From 4693659e3a911360b3798a9bc50e94dad8fe7f71 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Jan=20Andr=C3=A9=20Reuter?= <j.reuter at fz-juelich.de>
Date: Thu, 27 Nov 2025 16:42:05 +0100
Subject: [PATCH] [OMPT] Use global thread id for `codeptr_ra` in
`end_critical`
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When a critical construct has finished, it will trigger a critical-released
event. If a tool is attached, and the `mutex_released` callback was
registered, the tool with receive an event containing the `codeptr_ra`, the
return address of the callback invocation.
All the way back in 82e94a593433f36734e2d34898d353a2ecb65b8b, this
`codeptr_ra` was implemented by calling `__ompt_load_return_address`
with a fixed global thread id of `0`. However, this approach results in a
race-condition, and can yield incorrect results to the tool.
`__ompt_load_return_address(0)` points to the current return address of
the thread 0 in `__kmp_threads`. This thread may already execute some
other construct. A tool might therefore receive the return address of
e.g. some `libomp` internals, or other parts of the user code.
Additionally, a call to `__ompt_load_return_address` resets the
`th.ompt_thread_info.return_address` to `NULL`, therefore also affecting
the return address of thread 0. Another dispatched event, e.g.
parallel-begin might therefore not transfer any `codeptr_ra`.
To fix this, replace the fixed thread id by the `global_tid`, which is
stored just before dispatching the `mutex_released` callback.
Signed-off-by: Jan André Reuter <j.reuter at fz-juelich.de>
---
openmp/runtime/src/kmp_csupport.cpp | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/openmp/runtime/src/kmp_csupport.cpp b/openmp/runtime/src/kmp_csupport.cpp
index 3ca32ba583fe2..a92fc46374c27 100644
--- a/openmp/runtime/src/kmp_csupport.cpp
+++ b/openmp/runtime/src/kmp_csupport.cpp
@@ -1780,7 +1780,7 @@ void __kmpc_end_critical(ident_t *loc, kmp_int32 global_tid,
if (ompt_enabled.ompt_callback_mutex_released) {
ompt_callbacks.ompt_callback(ompt_callback_mutex_released)(
ompt_mutex_critical, (ompt_wait_id_t)(uintptr_t)lck,
- OMPT_LOAD_RETURN_ADDRESS(0));
+ OMPT_LOAD_RETURN_ADDRESS(global_tid));
}
#endif
More information about the Openmp-commits
mailing list