[all-commits] [llvm/llvm-project] 2fdf19: [OpenMP] Fix crash with task stealing and task dep...

Julian Brown via All-commits all-commits at lists.llvm.org
Fri Feb 14 01:56:42 PST 2025


  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 2fdf191e244b62409fd73fa9bb717466d6e683b5
      https://github.com/llvm/llvm-project/commit/2fdf191e244b62409fd73fa9bb717466d6e683b5
  Author: Julian Brown <julian.brown at amd.com>
  Date:   2025-02-14 (Fri, 14 Feb 2025)

  Changed paths:
    M openmp/runtime/src/kmp_taskdeps.cpp
    M openmp/runtime/src/kmp_taskdeps.h

  Log Message:
  -----------
  [OpenMP] Fix crash with task stealing and task dependencies (#126049)

This patch series demonstrates and fixes a bug that causes crashes with
OpenMP 'taskwait' directives in heavily multi-threaded scenarios.

TLDR: The early return from __kmpc_omp_taskwait_deps_51 missed the
synchronization mechanism in place for the late return.

Additional debug assertions check for the implied invariants of the code.

@jpeyton52 found the timing hole as this sequence of events:
>
> 1. THREAD 1: A regular task with dependences is created, call it T1
> 2. THREAD 1: Call into `__kmpc_omp_taskwait_deps_51()` and create a stack
based depnode (`NULL` task), call it T2 (stack)
> 3. THREAD 2: Steals task T1 and executes it getting to
`__kmp_release_deps()` region.
> 4. THREAD 1: During processing of dependences for T2 (stack) (within
`__kmp_check_deps()` region),  a link is created T1 -> T2. This increases
T2's (stack) `nrefs` count.
> 5. THREAD 2: Iterates through the successors list: decrement the T2's
(stack) npredecessor count. BUT HASN'T YET `__kmp_node_deref()`-ed it.
> 6. THREAD 1: Now when finished with `__kmp_check_deps()`, it returns false
because npredecessor count is 0, but T2's (stack) `nrefs`  count is 2 because
THREAD 2 still references it!
> 7. THREAD 1: Because `__kmp_check_deps()` returns false, early exit.
>    _Now the stack based depnode is invalid, but THREAD 2 still references it._
>
> We've reached improper stack referencing behavior. Varied results/crashes/
asserts can occur if THREAD 1 comes back and recreates the exact same depnode
in the exact same stack address during the same time THREAD 2 calls
`__kmp_node_deref()`.



To unsubscribe from these emails, change your notification settings at https://github.com/llvm/llvm-project/settings/notifications


More information about the All-commits mailing list