[Openmp-commits] [openmp] [libomp] Add reproducer for steal-after-finish race in proxy task OOO… (PR #187267)
Qian Cheng via Openmp-commits
openmp-commits at lists.llvm.org
Wed Mar 18 06:11:31 PDT 2026
https://github.com/Qian-Cheng-nju created https://github.com/llvm/llvm-project/pull/187267
Hi, I think there is a race condition in the proxy task OOO completion path and I'd like to share a reproducer.
`__kmpc_proxy_task_completed_ooo` enqueues the proxy's bottom-half into a worker deque, then decrements `td_incomplete_child_tasks` to 0. If all threads see ICC=0 and decrement `tt_unfinished_threads` to 0 before anyone picks up the proxy, the primary deactivates the task team while the proxy is still queued. This can lead to resource leaks and, on task team reuse, use-after-free from stale deque entries.
The steal loop in `execute_tasks_template` only checks `th_task_team == NULL` to exit, but only the primary clears that pointer — so workers don't notice the deactivation in time.
The race window is very narrow, so the reproducer includes a delay patch (`-DLIBOMP_REPRO_DELAY`) that injects three `usleep` calls to widen it (no logic changes):
1. Finished workers delay 10ms before each steal attempt — gives the primary time to deactivate before workers steal the proxy.
2. All threads delay 500μs after the inner loop finds no tasks — gives the external pthread time to enqueue the proxy and decrement ICC.
3. After deactivation, scan all deques and abort if any has tasks — this is the detector.
With the patch, the bug triggers 5/5 runs.
Happy to discuss further or provide additional details. Thanks for taking a look!
>From e6207336ebb12eb8badb96b8f2f715d99d874764 Mon Sep 17 00:00:00 2001
From: Qian-Cheng-nju <Qian-Cheng-nju at users.noreply.github.com>
Date: Wed, 18 Mar 2026 10:30:53 +0000
Subject: [PATCH] [libomp] Add reproducer for steal-after-finish race in proxy
task OOO path
---
.../test/tasking/bug_steal_after_finish.c | 117 ++++++++++++++++++
.../patches/steal-after-finish-delay.patch | 57 +++++++++
2 files changed, 174 insertions(+)
create mode 100644 openmp/runtime/test/tasking/bug_steal_after_finish.c
create mode 100644 openmp/runtime/test/tasking/patches/steal-after-finish-delay.patch
diff --git a/openmp/runtime/test/tasking/bug_steal_after_finish.c b/openmp/runtime/test/tasking/bug_steal_after_finish.c
new file mode 100644
index 0000000000000..ad5cf41eb7939
--- /dev/null
+++ b/openmp/runtime/test/tasking/bug_steal_after_finish.c
@@ -0,0 +1,117 @@
+/**
+ * Reproduction: steal-after-finish race via proxy task OOO completion
+ *
+ * The race: __kmpc_proxy_task_completed_ooo re-enqueues a proxy task into a
+ * deque and THEN decrements td_incomplete_child_tasks (ICC) to 0. If all
+ * threads mark finished (unfinished→0) before picking up the proxy, the
+ * primary deactivates the task team with a task still in a deque.
+ *
+ * Key: worker deques must be pre-allocated (otherwise __kmpc_give_task
+ * falls back to the primary's deque, and the primary picks it up).
+ *
+ * Requires libomp built with -DLIBOMP_REPRO_DELAY (delays in
+ * execute_tasks_template + barrier spin loop to widen race window).
+ */
+#include <omp.h>
+#include <pthread.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdatomic.h>
+#include <unistd.h>
+#include <signal.h>
+#include <string.h>
+
+#define NUM_THREADS 4
+#define NUM_TRIALS 500
+
+static omp_event_handle_t g_event;
+static atomic_int g_event_ready;
+
+static void *fulfiller_fn(void *arg) {
+ while (!atomic_load_explicit(&g_event_ready, memory_order_acquire))
+ ;
+ /* Wait for barrier entry + first inner loop so fulfiller fires
+ during the 500us delay window in execute_tasks_template. */
+ usleep(1000);
+ omp_fulfill_event(g_event);
+ return NULL;
+}
+
+static void crash_handler(int sig) {
+ const char *msg = "\n*** BUG REPRODUCED ***\n"
+ " Steal-after-finish: orphaned proxy task in deactivated task team\n\n";
+ write(STDERR_FILENO, msg, strlen(msg));
+ _exit(1);
+}
+
+int main(int argc, char *argv[]) {
+ int trials = NUM_TRIALS;
+ if (argc > 1)
+ trials = atoi(argv[1]);
+
+ signal(SIGSEGV, crash_handler);
+ signal(SIGBUS, crash_handler);
+ signal(SIGABRT, crash_handler);
+
+ omp_set_num_threads(NUM_THREADS);
+
+ printf("steal-after-finish race reproducer\n");
+ printf(" Threads: %d, Trials: %d\n", NUM_THREADS, trials);
+ printf(" Requires: libomp built with -DLIBOMP_REPRO_DELAY\n\n");
+
+ for (int trial = 0; trial < trials; trial++) {
+ atomic_store_explicit(&g_event_ready, 0, memory_order_relaxed);
+
+ pthread_t fulfiller;
+ pthread_create(&fulfiller, NULL, fulfiller_fn, NULL);
+
+ #pragma omp parallel num_threads(NUM_THREADS)
+ {
+ int tid = omp_get_thread_num();
+
+ /* Phase 1: Every thread creates a dummy task to force deque
+ allocation for all threads IN THE SAME task team that the
+ implicit barrier will use. NO explicit barrier here — an
+ explicit barrier toggles th_task_state, causing the implicit
+ barrier to use a DIFFERENT task team whose deques are NOT
+ allocated, so the proxy ends up in thread 0's own deque. */
+ volatile int dummy = 0;
+ #pragma omp task firstprivate(dummy)
+ {
+ dummy = 1; /* trivial work */
+ }
+
+ /* Phase 2: Thread 0 creates the detachable task.
+ Same task team as the dummy tasks → all worker deques exist →
+ __kmpc_give_task can enqueue proxy to a WORKER's deque. */
+ if (tid == 0) {
+ omp_event_handle_t evt;
+ #pragma omp task detach(evt)
+ {
+ g_event = evt;
+ atomic_store_explicit(&g_event_ready, 1,
+ memory_order_release);
+ /* Body returns WITHOUT fulfilling → proxy task.
+ External pthread fulfills via OOO path:
+ 1. Re-enqueues proxy to a worker's deque
+ 2. Decrements ICC → 0
+ With delays in libomp:
+ - 500us in execute_tasks_template (all threads)
+ - 1000us in barrier spin loop (workers only)
+ Primary deactivates while proxy sits in worker's deque. */
+ }
+ }
+ /* Implicit barrier at end of parallel region.
+ The detachable task is executed during this barrier's
+ task-execution phase. */
+ }
+
+ pthread_join(fulfiller, NULL);
+
+ if ((trial + 1) % 50 == 0)
+ printf(" [%d/%d] trials...\n", trial + 1, trials);
+ }
+
+ printf("\nCompleted %d trials.\n", trials);
+ return 0;
+}
diff --git a/openmp/runtime/test/tasking/patches/steal-after-finish-delay.patch b/openmp/runtime/test/tasking/patches/steal-after-finish-delay.patch
new file mode 100644
index 0000000000000..e61d0ccb4f09d
--- /dev/null
+++ b/openmp/runtime/test/tasking/patches/steal-after-finish-delay.patch
@@ -0,0 +1,57 @@
+diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
+--- a/openmp/runtime/src/kmp_tasking.cpp
++++ b/openmp/runtime/src/kmp_tasking.cpp
+@@ -3184,6 +3184,14 @@
+ while (1) { // Outer loop keeps trying to find tasks in case of single thread
+ // getting tasks from target constructs
+ while (1) { // Inner loop to find a task and execute it
++#ifdef LIBOMP_REPRO_DELAY
++ // BUG REPRODUCTION: Workers that already marked finished delay before
++ // checking deques. This prevents them from stealing the proxy task
++ // that was re-enqueued by the OOO path, giving the primary time to
++ // deactivate the task team with the proxy still in a deque.
++ if (final_spin && *thread_finished && !KMP_MASTER_TID(tid))
++ usleep(10000); // 10ms: worker already finished, prevent stealing
++#endif
+ #if ENABLE_LIBOMPTARGET
+ // Give an opportunity to the offload runtime to make progress
+ if (UNLIKELY(kmp_target_sync_cb))
+@@ -3366,6 +3374,13 @@
+ }
+ }
+
++#ifdef LIBOMP_REPRO_DELAY
++ // BUG REPRODUCTION: Widen the race window between "no tasks found"
++ // and the ICC check below. Gives the external proxy-task fulfiller
++ // time to re-enqueue the proxy and decrement ICC to 0.
++ if (final_spin)
++ usleep(500);
++#endif
++
+ // The task source has been exhausted. If in final spin loop of barrier,
+ // check if termination condition is satisfied. The work queue may be empty
+ // but there might be proxy tasks still executing.
+@@ -4077,6 +4092,22 @@
+ TCW_SYNC_4(task_team->tt.tt_active, FALSE);
+ KMP_MB();
+
+ TCW_PTR(this_thr->th.th_task_team, NULL);
++
++#ifdef LIBOMP_REPRO_DELAY
++ // SAFETY PROPERTY CHECK: after deactivation, no deque should have tasks.
++ if (task_team->tt.tt_threads_data) {
++ for (int i = 0; i < task_team->tt.tt_nproc; i++) {
++ int ntasks = TCR_4(task_team->tt.tt_threads_data[i].td.td_deque_ntasks);
++ if (ntasks > 0) {
++ fprintf(stderr,
++ "\n*** BUG REPRODUCED: orphaned task after deactivation ***\n"
++ " deque[%d] has %d task(s) in deactivated task team %p\n"
++ " This is the steal-after-finish race.\n\n",
++ i, ntasks, (void *)task_team);
++ abort();
++ }
++ }
++ }
++#endif
+ }
+ }
More information about the Openmp-commits
mailing list