[clang] [llvm] [openmp] [OpenMP] OpenMP 6.0 taskgraph support (WIP) (PR #188765)
Julian Brown via cfe-commits
cfe-commits at lists.llvm.org
Thu Mar 26 08:05:49 PDT 2026
https://github.com/jtb20 created https://github.com/llvm/llvm-project/pull/188765
This patch series outlines a new approach to implementing OpenMP 6.0 "taskgraph" support in LLVM. Key differences from the previously-posted "record and replay" implementation are as follows.
- The task/taskdata structures and the dependencies between them are duplicated whilst recording a taskgraph, keeping the existing runtime dependency handling unaffected by the taskgraph implementation -- e.g. during runtime execution, it is valid for output dependencies to be dropped as soon as the producing task completes. This separation is intended to eliminate a class of race conditions, in which tasks that complete at unpredictable times might or might not be marked as depending on a subsequent task.
- For two reasons, a new set of entry points is used for recording tasks within a taskgraph. The first reason is to pass extra information from the compiler relevant to the taskgraph-recording case -- e.g. the __kmpc_taskgraph_task entry point has extra arguments relating to shared data. The second reason is to reduce the potential overhead of the taskgraph implementation on the rest of the runtime. (1)
- The dependencies between tasks in a taskgraph are processed by static analysis: the high-level process is akin to turning data dependencies between tasks into control-flow dependencies. This is done by building a set of successors and predecessors for each recorded task, then decomposing the resulting DAG into parallel and sequential regions. In the case (presumed relatively unlikely in real-world code) that the graph is irreducible, a further set of analyses and transformations is done, and the parallel-sequential decomposition is run again. (2)
The output of this process is a set of nested kmp_taskgraph_region structures -- parallel or sequential (with some number of children), or nodes representing a single task. The two phases alternate until we obtain a single, top-level region.
- Replaying a taskgraph processed in this way on the CPU involves another set of linked structures, of type kmp_taskgraph_exec_descr. These form a kind of trace of a traversal over the kmp_taskgraph_region structure, so that a pointer to a kmp_taskgraph_exec_descr is somewhat equivalent to a "program counter". (3)
- Recorded taskgraphs are now located directly using a handle passed in from the user's compiled program, rather than by searching a linked list or hashtable for taskgraph records keyed by an index.
(1) A third intention is to capture OpenMP semantics at a slightly higher level: in particular when we come to add offload target tasks to this implementation, those will also use new API entry points to hopefully allow dependencies to be handled entirely on the GPU, rather than by being wrapped in a host task.
(2) This process will take some time, but I have made some effort to make it efficient. E.g. unnecessary allocations and deallocations are kept to a minimum by recycling kmp_taskgraph_region_dep_t structures (which are always the same size), or by allocating kmp_depnode_t all together in a single block (in __kmp_build_taskgraph).
(3) The intention is that GPU/offload execution will take the nested kmp_taskgraph_region structure (potentially containing intermixed target tasks and host tasks) and map it in some way appropriate for a GPU graph-execution API or (a suitably extended) liboffload GPU backend.
There is also an implementation of the "replayable" clause, but not (yet) the "saved" modifier.
Some care has been taken around the handling of implicit taskgroups: in particular, a taskgraph can contain two back-to-back taskloops, each of which has a reduction:
#pragma omp taskgraph
{
  #pragma omp taskloop reduction(+: var1)
  { var1 += ...; }
  #pragma omp taskloop reduction(+: var2)
  { var2 += var1; }
}
This seems legal, but means that we still need to create and destroy taskgroup structures if we have reductions on a taskloop within a replayed taskgraph so that the first reduction result can be used safely in the second loop. (Or perhaps we could retain/reuse the taskgroup structures: that's not done yet.)
The patch series goes some way towards full thread safety, but there are still problems to be addressed. In principle we could perhaps have a kmp_taskgraph_exec_descr set per-thread, pointing back to a shared kmp_taskgraph_record/kmp_taskgraph_region structure, but in practice we'd probably need to duplicate the underlying task/taskdata structures too. The global shared state is all gone now.
(This patch series builds on top of previous work, which is also included in the commit series. The bulk of the new work is in the top four patches.)
>From d84827fd479a53871f32efa30d963daff27a42d3 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Tue, 16 Sep 2025 04:16:15 -0500
Subject: [PATCH 01/28] [OpenMP] Make loop index unsigned in
__kmpc_omp_task_with_deps/__kmp_omp_task
NFC.
Co-authored-by: Adrian Munera <adrian.munera at bsc.es>
---
openmp/runtime/src/kmp_taskdeps.cpp | 2 +-
openmp/runtime/src/kmp_tasking.cpp | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/openmp/runtime/src/kmp_taskdeps.cpp b/openmp/runtime/src/kmp_taskdeps.cpp
index abbca752f0587..743d8ed093c61 100644
--- a/openmp/runtime/src/kmp_taskdeps.cpp
+++ b/openmp/runtime/src/kmp_taskdeps.cpp
@@ -714,7 +714,7 @@ kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,
__kmp_free(old_record);
- for (kmp_int i = old_size; i < new_size; i++) {
+ for (kmp_uint i = old_size; i < new_size; i++) {
kmp_int32 *successorsList = (kmp_int32 *)__kmp_allocate(
__kmp_successors_size * sizeof(kmp_int32));
new_record[i].task = nullptr;
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index ae2d617c3ea40..be1b06c6a86b8 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1828,7 +1828,7 @@ kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
__kmp_free(old_record);
- for (kmp_int i = old_size; i < new_size; i++) {
+ for (kmp_uint i = old_size; i < new_size; i++) {
kmp_int32 *successorsList = (kmp_int32 *)__kmp_allocate(
__kmp_successors_size * sizeof(kmp_int32));
new_record[i].task = nullptr;
>From 3fbedbccf35f2eee35710d078c3f1c1021b104cf Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Mon, 15 Sep 2025 05:13:20 -0500
Subject: [PATCH 02/28] [OpenMP] Use ID not index to identify taskgraphs in
libomp runtime
In preparation for the following patches, this patch changes the key
used to identify taskgraphs from a monotonic index into an ID (stored
in a linear table).
Co-authored-by: Adrian Munera <adrian.munera at bsc.es>
---
openmp/runtime/src/kmp.h | 5 ++-
openmp/runtime/src/kmp_global.cpp | 2 +-
openmp/runtime/src/kmp_tasking.cpp | 72 ++++++++++++++++++++----------
3 files changed, 54 insertions(+), 25 deletions(-)
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 19deaef75415d..e2db4a69ba15c 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2682,7 +2682,7 @@ typedef struct kmp_tdg_info {
extern int __kmp_tdg_dot;
extern kmp_int32 __kmp_max_tdgs;
extern kmp_tdg_info_t **__kmp_global_tdgs;
-extern kmp_int32 __kmp_curr_tdg_idx;
+extern kmp_int32 __kmp_curr_tdg_id;
extern kmp_int32 __kmp_successors_size;
extern std::atomic<kmp_int32> __kmp_tdg_task_id;
extern kmp_int32 __kmp_num_tdg;
@@ -4398,6 +4398,9 @@ KMP_EXPORT kmp_int32 __kmpc_start_record_task(ident_t *loc, kmp_int32 gtid,
kmp_int32 tdg_id);
KMP_EXPORT void __kmpc_end_record_task(ident_t *loc, kmp_int32 gtid,
kmp_int32 input_flags, kmp_int32 tdg_id);
+KMP_EXPORT void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
+ kmp_int32 input_flags, kmp_uint32 tdg_id,
+ void (*entry)(void *), void *args);
#endif
/* Interface to fast scalable reduce methods routines */
diff --git a/openmp/runtime/src/kmp_global.cpp b/openmp/runtime/src/kmp_global.cpp
index c6fdcf824af92..9d3de25c8dcac 100644
--- a/openmp/runtime/src/kmp_global.cpp
+++ b/openmp/runtime/src/kmp_global.cpp
@@ -558,7 +558,7 @@ int *__kmp_nesting_nth_level;
int __kmp_tdg_dot = 0;
kmp_int32 __kmp_max_tdgs = 100;
kmp_tdg_info_t **__kmp_global_tdgs = NULL;
-kmp_int32 __kmp_curr_tdg_idx =
+kmp_int32 __kmp_curr_tdg_id =
0; // Id of the current TDG being recorded or executed
kmp_int32 __kmp_num_tdg = 0;
kmp_int32 __kmp_successors_size = 10; // Initial succesor size list for
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index be1b06c6a86b8..4e93cbd45c1bb 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1431,11 +1431,11 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
}
#if OMPX_TASKGRAPH
- kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_idx);
+ kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status) &&
(task_entry != (kmp_routine_entry_t)__kmp_taskloop_task)) {
taskdata->is_taskgraph = 1;
- taskdata->tdg = __kmp_global_tdgs[__kmp_curr_tdg_idx];
+ taskdata->tdg = tdg;
taskdata->td_task_id = KMP_GEN_TASK_ID();
taskdata->td_tdg_task_id = KMP_ATOMIC_INC(&__kmp_tdg_task_id);
}
@@ -2377,9 +2377,9 @@ without help of the runtime library.
*/
void *__kmpc_task_reduction_init(int gtid, int num, void *data) {
#if OMPX_TASKGRAPH
- kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_idx);
+ kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
- kmp_tdg_info_t *this_tdg = __kmp_global_tdgs[__kmp_curr_tdg_idx];
+ kmp_tdg_info_t *this_tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
this_tdg->rec_taskred_data =
__kmp_allocate(sizeof(kmp_task_red_input_t) * num);
this_tdg->rec_num_taskred = num;
@@ -2404,14 +2404,11 @@ has two parameters, pointer to object to be initialized and pointer to omp_orig
*/
void *__kmpc_taskred_init(int gtid, int num, void *data) {
#if OMPX_TASKGRAPH
- kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_idx);
+ kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
- kmp_tdg_info_t *this_tdg = __kmp_global_tdgs[__kmp_curr_tdg_idx];
- this_tdg->rec_taskred_data =
- __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
- this_tdg->rec_num_taskred = num;
- KMP_MEMCPY(this_tdg->rec_taskred_data, data,
- sizeof(kmp_task_red_input_t) * num);
+ tdg->rec_taskred_data = __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
+ tdg->rec_num_taskred = num;
+ KMP_MEMCPY(tdg->rec_taskred_data, data, sizeof(kmp_task_red_input_t) * num);
}
#endif
return __kmp_task_reduction_init(gtid, num, (kmp_taskred_input_t *)data);
@@ -2463,7 +2460,7 @@ void *__kmpc_task_reduction_get_th_data(int gtid, void *tskgrp, void *data) {
#if OMPX_TASKGRAPH
if ((thread->th.th_current_task->is_taskgraph) &&
(!__kmp_tdg_is_recording(
- __kmp_global_tdgs[__kmp_curr_tdg_idx]->tdg_status))) {
+ __kmp_find_tdg(__kmp_curr_tdg_id)->tdg_status))) {
tg = thread->th.th_current_task->td_taskgroup;
KMP_ASSERT(tg != NULL);
KMP_ASSERT(tg->reduce_data != NULL);
@@ -5244,6 +5241,24 @@ bool __kmpc_omp_has_task_team(kmp_int32 gtid) {
}
#if OMPX_TASKGRAPH
+// __kmpc_taskgraph: record or replay taskgraph
+// loc_ref: Location of TDG, not used yet
+// gtid: Global Thread ID of the encountering thread
+// input_flags: Flags associated with the TDG
+// tdg_id: ID of the TDG to record, for now, incremental integer
+// entry: Pointer to the entry function
+// args: Pointer to the function arguments
+void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid, kmp_int32 input_flags,
+ kmp_uint32 tdg_id, void (*entry)(void *), void *args) {
+ kmp_int32 res = __kmpc_start_record_task(loc_ref, gtid, input_flags, tdg_id);
+ // When res = 1, we either start recording or only execute tasks
+ // without recording. Need to execute entry function in both cases.
+ if (res)
+ entry(args);
+
+ __kmpc_end_record_task(loc_ref, gtid, input_flags, tdg_id);
+}
+
// __kmp_find_tdg: identify a TDG through its ID
// tdg_id: ID of the TDG
// returns: If a TDG corresponding to this ID is found and not
@@ -5257,9 +5272,14 @@ static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id) {
__kmp_global_tdgs = (kmp_tdg_info_t **)__kmp_allocate(
sizeof(kmp_tdg_info_t *) * __kmp_max_tdgs);
- if ((__kmp_global_tdgs[tdg_id]) &&
- (__kmp_global_tdgs[tdg_id]->tdg_status != KMP_TDG_NONE))
- res = __kmp_global_tdgs[tdg_id];
+ for (kmp_int32 i = 0; i < __kmp_num_tdg; ++i) {
+ if ((__kmp_global_tdgs[i]) && (__kmp_global_tdgs[i]->tdg_id == tdg_id) &&
+ (__kmp_global_tdgs[i]->tdg_status != KMP_TDG_NONE)) {
+ res = __kmp_global_tdgs[i];
+ __kmp_curr_tdg_id = tdg_id;
+ break;
+ }
+ }
return res;
}
@@ -5268,7 +5288,8 @@ static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id) {
// gtid: Global Thread ID
void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg, kmp_int32 gtid) {
kmp_int32 tdg_id = tdg->tdg_id;
- KA_TRACE(10, ("__kmp_print_tdg_dot(enter): T#%d tdg_id=%d \n", gtid, tdg_id));
+ KA_TRACE(10, ("__kmp_print_tdg_dot(enter): T#%d tdg_id=%d \n",
+ __kmp_get_gtid(), tdg_id));
char file_name[20];
sprintf(file_name, "tdg_%d.dot", tdg_id);
@@ -5294,7 +5315,8 @@ void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg, kmp_int32 gtid) {
}
}
fprintf(tdg_file, "}");
- KA_TRACE(10, ("__kmp_print_tdg_dot(exit): T#%d tdg_id=%d \n", gtid, tdg_id));
+ KA_TRACE(10, ("__kmp_print_tdg_dot(exit): T#%d tdg_id=%d \n",
+ __kmp_get_gtid(), tdg_id));
}
// __kmp_exec_tdg: launch the execution of a previous
@@ -5359,7 +5381,7 @@ static inline void __kmp_start_record(kmp_int32 gtid,
kmp_int32 tdg_id) {
kmp_tdg_info_t *tdg =
(kmp_tdg_info_t *)__kmp_allocate(sizeof(kmp_tdg_info_t));
- __kmp_global_tdgs[__kmp_curr_tdg_idx] = tdg;
+ __kmp_global_tdgs[__kmp_num_tdg - 1] = tdg;
// Initializing the TDG structure
tdg->tdg_id = tdg_id;
tdg->map_size = INIT_MAPSIZE;
@@ -5384,7 +5406,7 @@ static inline void __kmp_start_record(kmp_int32 gtid,
KMP_ATOMIC_ST_RLX(&this_record_map[i].npredecessors_counter, 0);
}
- __kmp_global_tdgs[__kmp_curr_tdg_idx]->record_map = this_record_map;
+ tdg->record_map = this_record_map;
}
// __kmpc_start_record_task: Wrapper around __kmp_start_record to mark
@@ -5418,10 +5440,14 @@ kmp_int32 __kmpc_start_record_task(ident_t *loc_ref, kmp_int32 gtid,
__kmp_exec_tdg(gtid, tdg);
res = 0;
} else {
- __kmp_curr_tdg_idx = tdg_id;
- KMP_DEBUG_ASSERT(__kmp_curr_tdg_idx < __kmp_max_tdgs);
- __kmp_start_record(gtid, flags, tdg_id);
- __kmp_num_tdg++;
+ if (__kmp_num_tdg < __kmp_max_tdgs) {
+ __kmp_curr_tdg_id = tdg_id;
+ __kmp_num_tdg++;
+ KMP_DEBUG_ASSERT(__kmp_num_tdg <= __kmp_max_tdgs);
+ __kmp_start_record(gtid, flags, tdg_id);
+ }
+ // if no TDG found, need to execute the task
+ // even not recording
res = 1;
}
KA_TRACE(10, ("__kmpc_start_record_task(exit): T#%d TDG %d starts to %s\n",
>From 1b632eec95cd6411ca206b3012e51779674e5884 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Mon, 15 Sep 2025 05:55:01 -0500
Subject: [PATCH 03/28] [OpenMP] Rename
ompx_taskgraph->omp_taskgraph_experimental
This patch renames the option to enable taskgraph support in the
runtime from OMPX_TASKGRAPH to OMP_TASKGRAPH_EXPERIMENTAL, to reflect
the feature's official status in OpenMP 6.0, but also the feature's
current work-in-progress nature.
---
openmp/runtime/CMakeLists.txt | 6 +-
openmp/runtime/src/kmp.h | 10 ++--
openmp/runtime/src/kmp_config.h.cmake | 4 +-
openmp/runtime/src/kmp_global.cpp | 2 +-
openmp/runtime/src/kmp_settings.cpp | 4 +-
openmp/runtime/src/kmp_taskdeps.cpp | 14 ++---
openmp/runtime/src/kmp_taskdeps.h | 4 +-
openmp/runtime/src/kmp_tasking.cpp | 55 ++++++++++---------
openmp/runtime/test/CMakeLists.txt | 2 +-
openmp/runtime/test/lit.cfg | 4 +-
openmp/runtime/test/lit.site.cfg.in | 2 +-
.../test/tasking/omp_record_replay.cpp | 2 +-
.../test/tasking/omp_record_replay_deps.cpp | 2 +-
.../omp_record_replay_deps_multi_succ.cpp | 2 +-
.../tasking/omp_record_replay_multiTDGs.cpp | 2 +-
.../tasking/omp_record_replay_print_dot.cpp | 2 +-
.../tasking/omp_record_replay_taskloop.cpp | 2 +-
17 files changed, 60 insertions(+), 59 deletions(-)
diff --git a/openmp/runtime/CMakeLists.txt b/openmp/runtime/CMakeLists.txt
index 39a969731661a..3a5332a42cde9 100644
--- a/openmp/runtime/CMakeLists.txt
+++ b/openmp/runtime/CMakeLists.txt
@@ -373,9 +373,9 @@ if(LIBOMP_OMPD_SUPPORT AND ((NOT LIBOMP_OMPT_SUPPORT) OR (NOT "${CMAKE_SYSTEM_NA
set(LIBOMP_OMPD_SUPPORT FALSE)
endif()
-# OMPX Taskgraph support
-# Whether to build with OMPX Taskgraph (e.g. task record & replay)
-set(LIBOMP_OMPX_TASKGRAPH FALSE CACHE BOOL "OMPX-taskgraph (task record & replay)?")
+# OMP Taskgraph support
+# Whether to build with OMP Taskgraph (e.g. task record & replay)
+set(LIBOMP_TASKGRAPH_EXPERIMENTAL FALSE CACHE BOOL "Experimental OMP taskgraph (task record & replay)")
# Error check hwloc support after config-ix has run
if(LIBOMP_USE_HWLOC AND (NOT LIBOMP_HAVE_HWLOC))
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index e2db4a69ba15c..b17dbeadc7bdd 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2631,7 +2631,7 @@ typedef struct {
} ed;
} kmp_event_t;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
// Initial number of allocated nodes while recording
#define INIT_MAPSIZE 50
@@ -2691,7 +2691,7 @@ extern kmp_int32 __kmp_num_tdg;
typedef struct kmp_tasking_flags { /* Total struct must be exactly 32 bits */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
/* Same fields as in the #else branch, but in reverse order */
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
unsigned reserved31 : 4;
unsigned onced : 1;
#else
@@ -2752,7 +2752,7 @@ typedef struct kmp_tasking_flags { /* Total struct must be exactly 32 bits */
unsigned native : 1; /* 1==gcc-compiled task, 0==intel */
unsigned target : 1;
unsigned hidden_helper : 1; /* 1 == hidden helper task */
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
unsigned onced : 1; /* 1==ran once already, 0==never ran, record & replay purposes */
unsigned reserved31 : 4; /* reserved for library use */
#else
@@ -2807,7 +2807,7 @@ struct kmp_taskdata { /* aligned during dynamic allocation */
#if OMPT_SUPPORT
ompt_task_info_t ompt_task_info;
#endif
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
bool is_taskgraph = 0; // whether the task is within a TDG
kmp_tdg_info_t *tdg; // used to associate task with a TDG
kmp_int32 td_tdg_task_id; // local task id in its TDG
@@ -4385,7 +4385,7 @@ KMP_EXPORT void __kmpc_init_nest_lock_with_hint(ident_t *loc, kmp_int32 gtid,
void **user_lock,
uintptr_t hint);
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
// Taskgraph's Record & Replay mechanism
// __kmp_tdg_is_recording: check whether a given TDG is recording
// status: the tdg's current status
diff --git a/openmp/runtime/src/kmp_config.h.cmake b/openmp/runtime/src/kmp_config.h.cmake
index 40f1087fd7f27..1f966008c60a5 100644
--- a/openmp/runtime/src/kmp_config.h.cmake
+++ b/openmp/runtime/src/kmp_config.h.cmake
@@ -46,8 +46,8 @@
#define OMPT_SUPPORT LIBOMP_OMPT_SUPPORT
#cmakedefine01 LIBOMP_OMPD_SUPPORT
#define OMPD_SUPPORT LIBOMP_OMPD_SUPPORT
-#cmakedefine01 LIBOMP_OMPX_TASKGRAPH
-#define OMPX_TASKGRAPH LIBOMP_OMPX_TASKGRAPH
+#cmakedefine01 LIBOMP_TASKGRAPH_EXPERIMENTAL
+#define OMP_TASKGRAPH_EXPERIMENTAL LIBOMP_TASKGRAPH_EXPERIMENTAL
#cmakedefine01 LIBOMP_PROFILING_SUPPORT
#define OMP_PROFILING_SUPPORT LIBOMP_PROFILING_SUPPORT
#cmakedefine01 LIBOMP_OMPT_OPTIONAL
diff --git a/openmp/runtime/src/kmp_global.cpp b/openmp/runtime/src/kmp_global.cpp
index 9d3de25c8dcac..c5c9a32fd0812 100644
--- a/openmp/runtime/src/kmp_global.cpp
+++ b/openmp/runtime/src/kmp_global.cpp
@@ -553,7 +553,7 @@ int __kmp_nesting_mode = 0;
int __kmp_nesting_mode_nlevels = 1;
int *__kmp_nesting_nth_level;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
// TDG record & replay
int __kmp_tdg_dot = 0;
kmp_int32 __kmp_max_tdgs = 100;
diff --git a/openmp/runtime/src/kmp_settings.cpp b/openmp/runtime/src/kmp_settings.cpp
index b6e7e9cadfe60..66ef6f8097dce 100644
--- a/openmp/runtime/src/kmp_settings.cpp
+++ b/openmp/runtime/src/kmp_settings.cpp
@@ -1266,7 +1266,7 @@ static void __kmp_stg_parse_num_threads(char const *name, char const *value,
K_DIAG(1, ("__kmp_dflt_team_nth == %d\n", __kmp_dflt_team_nth));
} // __kmp_stg_parse_num_threads
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
static void __kmp_stg_parse_max_tdgs(char const *name, char const *value,
void *data) {
__kmp_stg_parse_int(name, value, 0, INT_MAX, &__kmp_max_tdgs);
@@ -5742,7 +5742,7 @@ static kmp_setting_t __kmp_stg_table[] = {
{"LIBOMP_NUM_HIDDEN_HELPER_THREADS",
__kmp_stg_parse_num_hidden_helper_threads,
__kmp_stg_print_num_hidden_helper_threads, NULL, 0, 0},
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
{"KMP_MAX_TDGS", __kmp_stg_parse_max_tdgs, __kmp_std_print_max_tdgs, NULL,
0, 0},
{"KMP_TDG_DOT", __kmp_stg_parse_tdg_dot, __kmp_stg_print_tdg_dot, NULL, 0,
diff --git a/openmp/runtime/src/kmp_taskdeps.cpp b/openmp/runtime/src/kmp_taskdeps.cpp
index 743d8ed093c61..b1a0848fc722f 100644
--- a/openmp/runtime/src/kmp_taskdeps.cpp
+++ b/openmp/runtime/src/kmp_taskdeps.cpp
@@ -222,7 +222,7 @@ static kmp_depnode_list_t *__kmp_add_node(kmp_info_t *thread,
static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,
kmp_depnode_t *sink,
kmp_task_t *sink_task) {
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_taskdata_t *task_source = KMP_TASK_TO_TASKDATA(source->dn.task);
kmp_taskdata_t *task_sink = KMP_TASK_TO_TASKDATA(sink_task);
if (source->dn.task && sink_task) {
@@ -311,7 +311,7 @@ __kmp_depnode_link_successor(kmp_int32 gtid, kmp_info_t *thread,
// link node as successor of list elements
for (kmp_depnode_list_t *p = plist; p; p = p->next) {
kmp_depnode_t *dep = p->node;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_tdg_status tdg_status = KMP_TDG_NONE;
if (task) {
kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(task);
@@ -325,7 +325,7 @@ __kmp_depnode_link_successor(kmp_int32 gtid, kmp_info_t *thread,
KMP_ACQUIRE_DEPNODE(gtid, dep);
if (dep->dn.task) {
if (!dep->dn.successors || dep->dn.successors->node != node) {
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (!(__kmp_tdg_is_recording(tdg_status)) && task)
#endif
__kmp_track_dependence(gtid, dep, node, task);
@@ -352,7 +352,7 @@ static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,
if (!sink)
return 0;
kmp_int32 npredecessors = 0;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_tdg_status tdg_status = KMP_TDG_NONE;
kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(task);
if (task) {
@@ -367,7 +367,7 @@ static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,
KMP_ACQUIRE_DEPNODE(gtid, sink);
if (sink->dn.task) {
if (!sink->dn.successors || sink->dn.successors->node != source) {
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (!(__kmp_tdg_is_recording(tdg_status)) && task)
#endif
__kmp_track_dependence(gtid, sink, source, task);
@@ -376,7 +376,7 @@ static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,
"%p\n",
gtid, KMP_TASK_TO_TASKDATA(sink->dn.task),
KMP_TASK_TO_TASKDATA(task)));
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (__kmp_tdg_is_recording(tdg_status)) {
kmp_taskdata_t *tdd = KMP_TASK_TO_TASKDATA(sink->dn.task);
if (tdd->is_taskgraph) {
@@ -694,7 +694,7 @@ kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,
kmp_info_t *thread = __kmp_threads[gtid];
kmp_taskdata_t *current_task = thread->th.th_current_task;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
// record TDG with deps
if (new_taskdata->is_taskgraph &&
__kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
diff --git a/openmp/runtime/src/kmp_taskdeps.h b/openmp/runtime/src/kmp_taskdeps.h
index f6bfb39218a21..0792baf67f162 100644
--- a/openmp/runtime/src/kmp_taskdeps.h
+++ b/openmp/runtime/src/kmp_taskdeps.h
@@ -96,7 +96,7 @@ extern void __kmpc_give_task(kmp_task_t *ptask, kmp_int32 start);
static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (task->is_taskgraph && !(__kmp_tdg_is_recording(task->tdg->tdg_status))) {
kmp_node_info_t *TaskInfo = &(task->tdg->record_map[task->td_tdg_task_id]);
@@ -140,7 +140,7 @@ static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {
gtid, task));
KMP_ACQUIRE_DEPNODE(gtid, node);
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (!task->is_taskgraph ||
(task->is_taskgraph && !__kmp_tdg_is_recording(task->tdg->tdg_status)))
#endif
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 4e93cbd45c1bb..660b5f9e3373f 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -37,7 +37,7 @@ static void __kmp_alloc_task_deque(kmp_info_t *thread,
static int __kmp_realloc_task_threads_data(kmp_info_t *thread,
kmp_task_team_t *task_team);
static void __kmp_bottom_half_finish_proxy(kmp_int32 gtid, kmp_task_t *ptask);
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id);
int __kmp_taskloop_task(int gtid, void *ptask);
#endif
@@ -70,7 +70,7 @@ static bool __kmp_task_is_allowed(int gtid, const kmp_int32 is_constrained,
}
// Check mutexinoutset dependencies, acquire locks
kmp_depnode_t *node = tasknew->td_depnode;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (!tasknew->is_taskgraph && UNLIKELY(node && (node->dn.mtx_num_locks > 0))) {
#else
if (UNLIKELY(node && (node->dn.mtx_num_locks > 0))) {
@@ -665,7 +665,7 @@ static void __kmp_free_task(kmp_int32 gtid, kmp_taskdata_t *taskdata,
task->data2.priority = 0;
taskdata->td_flags.freed = 1;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
// do not free tasks in taskgraph
if (!taskdata->is_taskgraph) {
#endif
@@ -675,7 +675,7 @@ static void __kmp_free_task(kmp_int32 gtid, kmp_taskdata_t *taskdata,
#else /* ! USE_FAST_MEMORY */
__kmp_thread_free(thread, taskdata);
#endif
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
} else {
taskdata->td_flags.complete = 0;
taskdata->td_flags.started = 0;
@@ -779,7 +779,7 @@ static bool __kmp_track_children_task(kmp_taskdata_t *taskdata) {
flags.detachable == TASK_DETACHABLE || flags.hidden_helper;
ret = ret ||
KMP_ATOMIC_LD_ACQ(&taskdata->td_parent->td_incomplete_child_tasks) > 0;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (taskdata->td_taskgroup && taskdata->is_taskgraph)
ret = ret || KMP_ATOMIC_LD_ACQ(&taskdata->td_taskgroup->count) > 0;
#endif
@@ -802,7 +802,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
kmp_info_t *thread = __kmp_threads[gtid];
kmp_task_team_t *task_team =
thread->th.th_task_team; // might be NULL for serial teams...
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
// to avoid seg fault when we need to access taskdata->td_flags after free when using vanilla taskloop
bool is_taskgraph;
#endif
@@ -815,7 +815,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
is_taskgraph = taskdata->is_taskgraph;
#endif
@@ -923,7 +923,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
if (completed) {
taskdata->td_flags.complete = 1; // mark the task as completed
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
taskdata->td_flags.onced = 1; // mark the task as ran once already
#endif
@@ -942,7 +942,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
#endif
KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks);
KMP_DEBUG_ASSERT(children >= 0);
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (taskdata->td_taskgroup && !taskdata->is_taskgraph)
#else
if (taskdata->td_taskgroup)
@@ -985,7 +985,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
// KMP_DEBUG_ASSERT( resumed_task->td_flags.executing == 0 );
resumed_task->td_flags.executing = 1; // resume previous task
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (is_taskgraph && __kmp_track_children_task(taskdata) &&
taskdata->td_taskgroup) {
// TDG: we only release taskgroup barrier here because
@@ -1113,7 +1113,7 @@ void __kmp_init_implicit_task(ident_t *loc_ref, kmp_info_t *this_thr,
task->td_flags.executing = 1;
task->td_flags.complete = 0;
task->td_flags.freed = 0;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
task->td_flags.onced = 0;
#endif
@@ -1159,7 +1159,7 @@ void __kmp_finish_implicit_task(kmp_info_t *thread) {
if (task->td_dephash) {
int children;
task->td_flags.complete = 1;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
task->td_flags.onced = 1;
#endif
children = KMP_ATOMIC_LD_ACQ(&task->td_incomplete_child_tasks);
@@ -1390,7 +1390,7 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
taskdata->td_flags.executing = 0;
taskdata->td_flags.complete = 0;
taskdata->td_flags.freed = 0;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
taskdata->td_flags.onced = 0;
taskdata->is_taskgraph = 0;
taskdata->tdg = nullptr;
@@ -1430,7 +1430,7 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
}
}
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status) &&
(task_entry != (kmp_routine_entry_t)__kmp_taskloop_task)) {
@@ -1807,7 +1807,7 @@ kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
bool serialize_immediate) {
kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (new_taskdata->is_taskgraph &&
__kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
kmp_tdg_info_t *tdg = new_taskdata->tdg;
@@ -2376,7 +2376,7 @@ the reduction either does not use omp_orig object, or the omp_orig is accessible
without help of the runtime library.
*/
void *__kmpc_task_reduction_init(int gtid, int num, void *data) {
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
kmp_tdg_info_t *this_tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
@@ -2403,7 +2403,7 @@ Note: this entry supposes the optional compiler-generated initializer routine
has two parameters, pointer to object to be initialized and pointer to omp_orig
*/
void *__kmpc_taskred_init(int gtid, int num, void *data) {
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
tdg->rec_taskred_data = __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
@@ -2457,7 +2457,7 @@ void *__kmpc_task_reduction_get_th_data(int gtid, void *tskgrp, void *data) {
kmp_int32 num;
kmp_int32 tid = thread->th.th_info.ds.ds_tid;
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if ((thread->th.th_current_task->is_taskgraph) &&
(!__kmp_tdg_is_recording(
__kmp_find_tdg(__kmp_curr_tdg_id)->tdg_status))) {
@@ -4235,7 +4235,7 @@ static void __kmp_first_top_half_finish_proxy(kmp_taskdata_t *taskdata) {
KMP_DEBUG_ASSERT(taskdata->td_flags.freed == 0);
taskdata->td_flags.complete = 1; // mark the task as completed
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
taskdata->td_flags.onced = 1;
#endif
@@ -4440,8 +4440,9 @@ void __kmp_fulfill_event(kmp_event_t *event) {
// indicating whether we need to update task->td_task_id
// returns: a pointer to the allocated kmp_task_t structure (task).
kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src
-#if OMPX_TASKGRAPH
- , int taskloop_recur
+#if OMP_TASKGRAPH_EXPERIMENTAL
+ ,
+ int taskloop_recur
#endif
) {
kmp_task_t *task;
@@ -4471,7 +4472,7 @@ kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src
task = KMP_TASKDATA_TO_TASK(taskdata);
// Initialize new task (only specific fields not affected by memcpy)
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
if (taskdata->is_taskgraph && !taskloop_recur &&
__kmp_tdg_is_recording(taskdata_src->tdg->tdg_status))
taskdata->td_tdg_task_id = KMP_ATOMIC_INC(&__kmp_tdg_task_id);
@@ -4704,7 +4705,7 @@ void __kmp_taskloop_linear(ident_t *loc, int gtid, kmp_task_t *task,
}
}
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
next_task = __kmp_task_dup_alloc(thread, task, /* taskloop_recur */ 0);
#else
next_task = __kmp_task_dup_alloc(thread, task); // allocate new task
@@ -4906,7 +4907,7 @@ void __kmp_taskloop_recur(ident_t *loc, int gtid, kmp_task_t *task,
lb1 = ub0 + st;
// create pattern task for 2nd half of the loop
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
next_task = __kmp_task_dup_alloc(thread, task,
/* taskloop_recur */ 1);
#else
@@ -4944,7 +4945,7 @@ void __kmp_taskloop_recur(ident_t *loc, int gtid, kmp_task_t *task,
p->codeptr_ra = codeptr_ra;
#endif
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_taskdata_t *new_task_data = KMP_TASK_TO_TASKDATA(new_task);
new_task_data->tdg = taskdata->tdg;
new_task_data->is_taskgraph = 0;
@@ -4989,7 +4990,7 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
__kmpc_taskgroup(loc, gtid);
}
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
KMP_ATOMIC_DEC(&__kmp_tdg_task_id);
#endif
// =========================================================================
@@ -5240,7 +5241,7 @@ bool __kmpc_omp_has_task_team(kmp_int32 gtid) {
return taskdata->td_task_team != NULL;
}
-#if OMPX_TASKGRAPH
+#if OMP_TASKGRAPH_EXPERIMENTAL
// __kmpc_taskgraph: record or replay taskgraph
// loc_ref: Location of TDG, not used yet
// gtid: Global Thread ID of the encountering thread
diff --git a/openmp/runtime/test/CMakeLists.txt b/openmp/runtime/test/CMakeLists.txt
index cbcd0c155f062..5ef3e27f08767 100644
--- a/openmp/runtime/test/CMakeLists.txt
+++ b/openmp/runtime/test/CMakeLists.txt
@@ -30,7 +30,7 @@ update_test_compiler_features()
pythonize_bool(LIBOMP_USE_HWLOC)
pythonize_bool(LIBOMP_OMPT_SUPPORT)
pythonize_bool(LIBOMP_OMPT_OPTIONAL)
-pythonize_bool(LIBOMP_OMPX_TASKGRAPH)
+pythonize_bool(LIBOMP_TASKGRAPH_EXPERIMENTAL)
pythonize_bool(LIBOMP_HAVE_LIBM)
pythonize_bool(LIBOMP_HAVE_LIBATOMIC)
pythonize_bool(OPENMP_TEST_COMPILER_HAS_OMIT_FRAME_POINTER_FLAGS)
diff --git a/openmp/runtime/test/lit.cfg b/openmp/runtime/test/lit.cfg
index 6d93789f2d44e..d5dfce01e2982 100644
--- a/openmp/runtime/test/lit.cfg
+++ b/openmp/runtime/test/lit.cfg
@@ -126,8 +126,8 @@ if config.has_ompt:
# for callback.h
config.test_flags += " -I " + config.test_source_root + "/ompt"
-if config.has_ompx_taskgraph:
- config.available_features.add("ompx_taskgraph")
+if config.has_omp_taskgraph_experimental:
+ config.available_features.add("omp_taskgraph_experimental")
if config.operating_system == 'AIX':
config.available_features.add("aix")
diff --git a/openmp/runtime/test/lit.site.cfg.in b/openmp/runtime/test/lit.site.cfg.in
index 7a51545b86fe1..03d709dbd9d4d 100644
--- a/openmp/runtime/test/lit.site.cfg.in
+++ b/openmp/runtime/test/lit.site.cfg.in
@@ -17,7 +17,7 @@ config.target_triple = "@LLVM_TARGET_TRIPLE@"
config.hwloc_library_dir = "@LIBOMP_HWLOC_LIBRARY_DIR@"
config.using_hwloc = @LIBOMP_USE_HWLOC@
config.has_ompt = @LIBOMP_OMPT_SUPPORT@ and @LIBOMP_OMPT_OPTIONAL@
-config.has_ompx_taskgraph = @LIBOMP_OMPX_TASKGRAPH@
+config.has_omp_taskgraph_experimental = @LIBOMP_TASKGRAPH_EXPERIMENTAL@
config.has_libm = @LIBOMP_HAVE_LIBM@
config.has_libatomic = @LIBOMP_HAVE_LIBATOMIC@
config.has_omit_frame_pointer_flag = @OPENMP_TEST_COMPILER_HAS_OMIT_FRAME_POINTER_FLAGS@
diff --git a/openmp/runtime/test/tasking/omp_record_replay.cpp b/openmp/runtime/test/tasking/omp_record_replay.cpp
index 69ad98003a0d6..4fea22e081da9 100644
--- a/openmp/runtime/test/tasking/omp_record_replay.cpp
+++ b/openmp/runtime/test/tasking/omp_record_replay.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: ompx_taskgraph
+// REQUIRES: omp_taskgraph_experimental
// RUN: %libomp-cxx-compile-and-run
#include <iostream>
#include <cassert>
diff --git a/openmp/runtime/test/tasking/omp_record_replay_deps.cpp b/openmp/runtime/test/tasking/omp_record_replay_deps.cpp
index 9b6b370b30efc..4c06ae3f7b273 100644
--- a/openmp/runtime/test/tasking/omp_record_replay_deps.cpp
+++ b/openmp/runtime/test/tasking/omp_record_replay_deps.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: ompx_taskgraph
+// REQUIRES: omp_taskgraph_experimental
// RUN: %libomp-cxx-compile-and-run
#include <iostream>
#include <cassert>
diff --git a/openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp b/openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
index 906fab335f510..6bcd3dee56030 100644
--- a/openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
+++ b/openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: ompx_taskgraph
+// REQUIRES: omp_taskgraph_experimental
// RUN: %libomp-cxx-compile-and-run
#include <omp.h>
#include <cassert>
diff --git a/openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp b/openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
index 03252843689c4..1864d5d89cc70 100644
--- a/openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
+++ b/openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: ompx_taskgraph
+// REQUIRES: omp_taskgraph_experimental
// RUN: %libomp-cxx-compile-and-run
#include <iostream>
#include <cassert>
diff --git a/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp b/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
index 2fe55f0815429..7f1f5ccd77d37 100644
--- a/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
+++ b/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: ompx_taskgraph
+// REQUIRES: omp_taskgraph_experimental
// RUN: %libomp-cxx-compile-and-run
#include <iostream>
#include <fstream>
diff --git a/openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp b/openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
index 3d88faeeb28ee..163a1b4192d85 100644
--- a/openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
+++ b/openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
@@ -1,4 +1,4 @@
-// REQUIRES: ompx_taskgraph
+// REQUIRES: omp_taskgraph_experimental
// RUN: %libomp-cxx-compile-and-run
#include <iostream>
#include <cassert>
From 840011563efc23600423825ef2c89c8067016357 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Mon, 15 Sep 2025 05:25:55 -0500
Subject: [PATCH 04/28] [OpenMP] Taskgraph frontend support
This is a version of the 'ompx taskgraph' support posted in PR66919,
adapted to the official OpenMP 6.0 spelling of 'omp taskgraph', and with
the 'ompx' extension parts removed.
Co-authored-by: Adrian Munera <adrian.munera at bsc.es>
Co-authored-by: Jose M Monsalve Diaz <JoseM.MonsalveDiaz at amd.com>
---
clang/bindings/python/clang/cindex.py | 3 +
clang/include/clang-c/Index.h | 4 +
clang/include/clang/AST/RecursiveASTVisitor.h | 3 +
clang/include/clang/AST/StmtOpenMP.h | 49 ++++++++++++
clang/include/clang/Basic/StmtNodes.td | 1 +
clang/include/clang/Sema/SemaOpenMP.h | 4 +
.../include/clang/Serialization/ASTBitCodes.h | 1 +
clang/lib/AST/StmtOpenMP.cpp | 15 ++++
clang/lib/AST/StmtPrinter.cpp | 5 ++
clang/lib/AST/StmtProfile.cpp | 5 ++
clang/lib/Basic/OpenMPKinds.cpp | 3 +
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 74 +++++++++++++++++++
clang/lib/CodeGen/CGOpenMPRuntime.h | 8 ++
clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp | 2 +
clang/lib/CodeGen/CGStmt.cpp | 3 +
clang/lib/CodeGen/CGStmtOpenMP.cpp | 6 ++
clang/lib/CodeGen/CodeGenFunction.h | 1 +
clang/lib/Sema/SemaExceptionSpec.cpp | 1 +
clang/lib/Sema/SemaOpenMP.cpp | 31 ++++++++
clang/lib/Sema/TreeTransform.h | 11 +++
clang/lib/Serialization/ASTReaderStmt.cpp | 10 +++
clang/lib/Serialization/ASTWriterStmt.cpp | 6 ++
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp | 1 +
clang/tools/libclang/CIndex.cpp | 2 +
clang/tools/libclang/CXCursor.cpp | 3 +
.../include/llvm/Frontend/OpenMP/OMPKinds.def | 1 +
26 files changed, 253 insertions(+)
diff --git a/clang/bindings/python/clang/cindex.py b/clang/bindings/python/clang/cindex.py
index 1896a0a9c1c34..093bfc669b82f 100644
--- a/clang/bindings/python/clang/cindex.py
+++ b/clang/bindings/python/clang/cindex.py
@@ -1448,6 +1448,9 @@ def is_unexposed(self):
# OpenMP fuse directive.
OMP_FUSE_DIRECTIVE = 311
+ # OpenMP taskgraph directive.
+ OMP_TASKGRAPH_DIRECTIVE = 312
+
# OpenACC Compute Construct.
OPEN_ACC_COMPUTE_DIRECTIVE = 320
diff --git a/clang/include/clang-c/Index.h b/clang/include/clang-c/Index.h
index 203634c80d82a..31a43260edcec 100644
--- a/clang/include/clang-c/Index.h
+++ b/clang/include/clang-c/Index.h
@@ -2166,6 +2166,10 @@ enum CXCursorKind {
*/
CXCursor_OMPFuseDirective = 311,
+ /** OpenMP taskgraph directive.
+ */
+ CXCursor_OMPTaskgraphDirective = 312,
+
/** OpenACC Compute Construct.
*/
CXCursor_OpenACCComputeConstruct = 320,
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index ce6ad723191e0..612b35d615fc0 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -3255,6 +3255,9 @@ DEF_TRAVERSE_STMT(OMPBarrierDirective,
DEF_TRAVERSE_STMT(OMPTaskwaitDirective,
{ TRY_TO(TraverseOMPExecutableDirective(S)); })
+DEF_TRAVERSE_STMT(OMPTaskgraphDirective,
+ { TRY_TO(TraverseOMPExecutableDirective(S)); })
+
DEF_TRAVERSE_STMT(OMPTaskgroupDirective,
{ TRY_TO(TraverseOMPExecutableDirective(S)); })
diff --git a/clang/include/clang/AST/StmtOpenMP.h b/clang/include/clang/AST/StmtOpenMP.h
index bc6aeaa8d143c..be4d33c783800 100644
--- a/clang/include/clang/AST/StmtOpenMP.h
+++ b/clang/include/clang/AST/StmtOpenMP.h
@@ -2760,6 +2760,55 @@ class OMPTaskwaitDirective : public OMPExecutableDirective {
}
};
+/// This represents '#pragma omp taskgraph' directive.
+/// Available with OpenMP 6.0.
+///
+/// \code
+/// #pragma omp taskgraph
+/// \endcode
+///
+class OMPTaskgraphDirective final : public OMPExecutableDirective {
+ friend class ASTStmtReader;
+ friend class OMPExecutableDirective;
+ /// Build directive with the given start and end location.
+ ///
+ /// \param StartLoc Starting location of the directive kind.
+ /// \param EndLoc Ending location of the directive.
+ ///
+ OMPTaskgraphDirective(SourceLocation StartLoc, SourceLocation EndLoc)
+ : OMPExecutableDirective(OMPTaskgraphDirectiveClass,
+ llvm::omp::OMPD_taskgraph, StartLoc, EndLoc) {}
+
+ /// Build an empty directive.
+ ///
+ explicit OMPTaskgraphDirective()
+ : OMPExecutableDirective(OMPTaskgraphDirectiveClass,
+ llvm::omp::OMPD_taskgraph, SourceLocation(),
+ SourceLocation()) {}
+
+public:
+ /// Creates directive.
+ ///
+ /// \param C AST context.
+ /// \param StartLoc Starting location of the directive kind.
+ /// \param EndLoc Ending Location of the directive.
+ ///
+ static OMPTaskgraphDirective *
+ Create(const ASTContext &C, SourceLocation StartLoc, SourceLocation EndLoc,
+ ArrayRef<OMPClause *> Clauses, Stmt *AssociatedStmt);
+
+ /// Creates an empty directive.
+ ///
+ /// \param C AST context.
+ ///
+ static OMPTaskgraphDirective *CreateEmpty(const ASTContext &C,
+ unsigned NumClauses, EmptyShell);
+
+ static bool classof(const Stmt *T) {
+ return T->getStmtClass() == OMPTaskgraphDirectiveClass;
+ }
+};
+
/// This represents '#pragma omp taskgroup' directive.
///
/// \code
diff --git a/clang/include/clang/Basic/StmtNodes.td b/clang/include/clang/Basic/StmtNodes.td
index b196382025c95..19cb832782195 100644
--- a/clang/include/clang/Basic/StmtNodes.td
+++ b/clang/include/clang/Basic/StmtNodes.td
@@ -264,6 +264,7 @@ def OMPTaskDirective : StmtNode<OMPExecutableDirective>;
def OMPTaskyieldDirective : StmtNode<OMPExecutableDirective>;
def OMPBarrierDirective : StmtNode<OMPExecutableDirective>;
def OMPTaskwaitDirective : StmtNode<OMPExecutableDirective>;
+def OMPTaskgraphDirective : StmtNode<OMPExecutableDirective>;
def OMPTaskgroupDirective : StmtNode<OMPExecutableDirective>;
def OMPFlushDirective : StmtNode<OMPExecutableDirective>;
def OMPDepobjDirective : StmtNode<OMPExecutableDirective>;
diff --git a/clang/include/clang/Sema/SemaOpenMP.h b/clang/include/clang/Sema/SemaOpenMP.h
index 7853f29f98c25..cf31acb67863f 100644
--- a/clang/include/clang/Sema/SemaOpenMP.h
+++ b/clang/include/clang/Sema/SemaOpenMP.h
@@ -557,6 +557,10 @@ class SemaOpenMP : public SemaBase {
/// Called on well-formed '\#pragma omp barrier'.
StmtResult ActOnOpenMPBarrierDirective(SourceLocation StartLoc,
SourceLocation EndLoc);
+ /// Called on well-formed '\#pragma omp taskgraph'.
+ StmtResult ActOnOpenMPTaskgraphDirective(ArrayRef<OMPClause *> Clauses,
+ Stmt *AStmt, SourceLocation StartLoc,
+ SourceLocation EndLoc);
/// Called on well-formed '\#pragma omp taskwait'.
StmtResult ActOnOpenMPTaskwaitDirective(ArrayRef<OMPClause *> Clauses,
SourceLocation StartLoc,
diff --git a/clang/include/clang/Serialization/ASTBitCodes.h b/clang/include/clang/Serialization/ASTBitCodes.h
index 5db0b08f877ce..a40f9a6eba4fa 100644
--- a/clang/include/clang/Serialization/ASTBitCodes.h
+++ b/clang/include/clang/Serialization/ASTBitCodes.h
@@ -1981,6 +1981,7 @@ enum StmtCode {
STMT_OMP_ERROR_DIRECTIVE,
STMT_OMP_BARRIER_DIRECTIVE,
STMT_OMP_TASKWAIT_DIRECTIVE,
+ STMT_OMP_TASKGRAPH_DIRECTIVE,
STMT_OMP_FLUSH_DIRECTIVE,
STMT_OMP_DEPOBJ_DIRECTIVE,
STMT_OMP_SCAN_DIRECTIVE,
diff --git a/clang/lib/AST/StmtOpenMP.cpp b/clang/lib/AST/StmtOpenMP.cpp
index a5b0cd3786a28..41effd494524c 100644
--- a/clang/lib/AST/StmtOpenMP.cpp
+++ b/clang/lib/AST/StmtOpenMP.cpp
@@ -945,6 +945,21 @@ OMPTaskwaitDirective *OMPTaskwaitDirective::CreateEmpty(const ASTContext &C,
return createEmptyDirective<OMPTaskwaitDirective>(C, NumClauses);
}
+OMPTaskgraphDirective *OMPTaskgraphDirective::Create(
+ const ASTContext &C, SourceLocation StartLoc, SourceLocation EndLoc,
+ ArrayRef<OMPClause *> Clauses, Stmt *AssociatedStmt) {
+ auto *Dir = createDirective<OMPTaskgraphDirective>(
+ C, Clauses, AssociatedStmt, /*NumChildren=*/1, StartLoc, EndLoc);
+ return Dir;
+}
+
+OMPTaskgraphDirective *OMPTaskgraphDirective::CreateEmpty(const ASTContext &C,
+ unsigned NumClauses,
+ EmptyShell) {
+ return createEmptyDirective<OMPTaskgraphDirective>(
+ C, NumClauses, /*HasAssociatedStmt=*/true, /*NumChildren=*/1);
+}
+
OMPTaskgroupDirective *OMPTaskgroupDirective::Create(
const ASTContext &C, SourceLocation StartLoc, SourceLocation EndLoc,
ArrayRef<OMPClause *> Clauses, Stmt *AssociatedStmt, Expr *ReductionRef) {
diff --git a/clang/lib/AST/StmtPrinter.cpp b/clang/lib/AST/StmtPrinter.cpp
index 4d364fdcd5502..f82f83613dc4d 100644
--- a/clang/lib/AST/StmtPrinter.cpp
+++ b/clang/lib/AST/StmtPrinter.cpp
@@ -904,6 +904,11 @@ void StmtPrinter::VisitOMPAssumeDirective(OMPAssumeDirective *Node) {
PrintOMPExecutableDirective(Node);
}
+void StmtPrinter::VisitOMPTaskgraphDirective(OMPTaskgraphDirective *Node) {
+ Indent() << "#pragma omp taskgraph";
+ PrintOMPExecutableDirective(Node);
+}
+
void StmtPrinter::VisitOMPErrorDirective(OMPErrorDirective *Node) {
Indent() << "#pragma omp error";
PrintOMPExecutableDirective(Node);
diff --git a/clang/lib/AST/StmtProfile.cpp b/clang/lib/AST/StmtProfile.cpp
index dc7fd352a67b2..dff89c9085e6a 100644
--- a/clang/lib/AST/StmtProfile.cpp
+++ b/clang/lib/AST/StmtProfile.cpp
@@ -1138,9 +1138,14 @@ void StmtProfiler::VisitOMPAssumeDirective(const OMPAssumeDirective *S) {
VisitOMPExecutableDirective(S);
}
+void StmtProfiler::VisitOMPTaskgraphDirective(const OMPTaskgraphDirective *S) {
+ VisitOMPExecutableDirective(S);
+}
+
void StmtProfiler::VisitOMPErrorDirective(const OMPErrorDirective *S) {
VisitOMPExecutableDirective(S);
}
+
void StmtProfiler::VisitOMPTaskgroupDirective(const OMPTaskgroupDirective *S) {
VisitOMPExecutableDirective(S);
if (const Expr *E = S->getReductionRef())
diff --git a/clang/lib/Basic/OpenMPKinds.cpp b/clang/lib/Basic/OpenMPKinds.cpp
index 2c693b1958ee7..fd1ffa59abaac 100644
--- a/clang/lib/Basic/OpenMPKinds.cpp
+++ b/clang/lib/Basic/OpenMPKinds.cpp
@@ -942,6 +942,9 @@ void clang::getOpenMPCaptureRegions(
case OMPD_taskloop:
CaptureRegions.push_back(OMPD_taskloop);
break;
+ case OMPD_taskgraph:
+ CaptureRegions.push_back(OMPD_taskgraph);
+ break;
case OMPD_loop:
// TODO: 'loop' may require different capture regions depending on the
// bind clause or the parent directive when there is no bind clause.
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index ca1a1c0321fce..2316d80e511bc 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -60,6 +60,8 @@ class CGOpenMPRegionInfo : public CodeGenFunction::CGCapturedStmtInfo {
ParallelOutlinedRegion,
/// Region with outlined function for standalone 'task' directive.
TaskOutlinedRegion,
+ /// Region with outlined function for standalone 'taskgraph' directive.
+ TaskgraphOutlinedRegion,
/// Region for constructs that do not require function outlining,
/// like 'for', 'sections', 'atomic' etc. directives.
InlinedRegion,
@@ -234,6 +236,26 @@ class CGOpenMPTaskOutlinedRegionInfo final : public CGOpenMPRegionInfo {
const UntiedTaskActionTy &Action;
};
+/// API for captured statement code generation in OpenMP taskgraphs.
+class CGOpenMPTaskgraphRegionInfo final : public CGOpenMPRegionInfo {
+public:
+ CGOpenMPTaskgraphRegionInfo(const CapturedStmt &CS,
+ const RegionCodeGenTy &CodeGen)
+ : CGOpenMPRegionInfo(CS, TaskgraphOutlinedRegion, CodeGen,
+ llvm::omp::OMPD_taskgraph, false) {}
+
+ const VarDecl *getThreadIDVariable() const override { return 0; }
+
+ /// Get the name of the capture helper.
+ StringRef getHelperName() const override { return "taskgraph.omp_outlined."; }
+
+ static bool classof(const CGCapturedStmtInfo *Info) {
+ return CGOpenMPRegionInfo::classof(Info) &&
+ cast<CGOpenMPRegionInfo>(Info)->getRegionKind() ==
+ TaskgraphOutlinedRegion;
+ }
+};
+
/// API for inlined captured statement code generation in OpenMP
/// constructs.
class CGOpenMPInlinedRegionInfo : public CGOpenMPRegionInfo {
@@ -6100,6 +6122,48 @@ void CGOpenMPRuntime::emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
Region->emitUntiedSwitch(CGF);
}
+void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
+ SourceLocation Loc,
+ const OMPExecutableDirective &D) {
+ if (!CGF.HaveInsertPoint())
+ return;
+
+ // Building kmp_taskgraph_flags_t flags for kmpc_taskgraph. C.f., kmp.h
+ enum {
+ NowaitFlag = 0x1, // Not used yet.
+ ReRecordFlag = 0x2,
+ };
+
+ unsigned Flags = 0;
+
+ CodeGenFunction OutlinedCGF(CGM, /*suppressNewContext=*/true);
+
+ const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
+
+ auto BodyGen = [CS](CodeGenFunction &CGF, PrePostActionTy &) {
+ CGF.EmitStmt(CS->getCapturedStmt());
+ };
+
+ LValue CapStruct = CGF.InitCapturedStruct(*CS);
+ CGOpenMPTaskgraphRegionInfo TaskgraphRegion(*CS, BodyGen);
+ CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(OutlinedCGF,
+ &TaskgraphRegion);
+ llvm::Function *FnT = OutlinedCGF.GenerateCapturedStmtFunction(*CS);
+
+ std::array<llvm::Value *, 6> Args{
+ emitUpdateLocation(CGF, Loc),
+ getThreadID(CGF, Loc),
+ CGF.Builder.getInt32(Flags),
+ CGF.Builder.getInt32(D.getBeginLoc().getHashValue()),
+ CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(FnT, CGM.VoidPtrTy),
+ CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+ CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy)};
+
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph),
+ Args);
+}
+
void CGOpenMPRuntime::emitInlinedDirective(CodeGenFunction &CGF,
OpenMPDirectiveKind InnerKind,
const RegionCodeGenTy &CodeGen,
@@ -6536,6 +6600,7 @@ const Expr *CGOpenMPRuntime::getNumTeamsExprForTargetDirective(
case OMPD_taskyield:
case OMPD_barrier:
case OMPD_taskwait:
+ case OMPD_taskgraph:
case OMPD_taskgroup:
case OMPD_atomic:
case OMPD_flush:
@@ -10456,6 +10521,7 @@ getNestedDistributeDirective(ASTContext &Ctx, const OMPExecutableDirective &D) {
case OMPD_taskyield:
case OMPD_barrier:
case OMPD_taskwait:
+ case OMPD_taskgraph:
case OMPD_taskgroup:
case OMPD_atomic:
case OMPD_flush:
@@ -11191,6 +11257,7 @@ void CGOpenMPRuntime::scanForTargetRegionsFunctions(const Stmt *S,
case OMPD_taskyield:
case OMPD_barrier:
case OMPD_taskwait:
+ case OMPD_taskgraph:
case OMPD_taskgroup:
case OMPD_atomic:
case OMPD_flush:
@@ -11761,6 +11828,7 @@ void CGOpenMPRuntime::emitTargetDataStandAloneCall(
case OMPD_taskyield:
case OMPD_barrier:
case OMPD_taskwait:
+ case OMPD_taskgraph:
case OMPD_taskgroup:
case OMPD_atomic:
case OMPD_flush:
@@ -13285,6 +13353,12 @@ void CGOpenMPSIMDRuntime::emitTaskwaitCall(CodeGenFunction &CGF,
llvm_unreachable("Not supported in SIMD-only mode");
}
+void CGOpenMPSIMDRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
+ SourceLocation Loc,
+ const OMPExecutableDirective &D) {
+ llvm_unreachable("Not supported in SIMD-only mode");
+}
+
void CGOpenMPSIMDRuntime::emitCancellationPointCall(
CodeGenFunction &CGF, SourceLocation Loc,
OpenMPDirectiveKind CancelRegion) {
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index a81d3830a8035..2753f0e7f2dfc 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -1380,6 +1380,10 @@ class CGOpenMPRuntime {
virtual void emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
const OMPTaskDataTy &Data);
+ /// Emit code for 'taskgraph' directive.
+ virtual void emitTaskgraphCall(CodeGenFunction &CGF, SourceLocation Loc,
+ const OMPExecutableDirective &D);
+
/// Emit code for 'cancellation point' construct.
/// \param CancelRegion Region kind for which the cancellation point must be
/// emitted.
@@ -2208,6 +2212,10 @@ class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {
void emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
const OMPTaskDataTy &Data) override;
+ /// Emit code for 'taskgraph' directive.
+ void emitTaskgraphCall(CodeGenFunction &CGF, SourceLocation Loc,
+ const OMPExecutableDirective &D) override;
+
/// Emit code for 'cancellation point' construct.
/// \param CancelRegion Region kind for which the cancellation point must be
/// emitted.
diff --git a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
index b964ef38ddb69..ac9fdd2f7e051 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntimeGPU.cpp
@@ -572,6 +572,7 @@ static bool hasNestedSPMDDirective(ASTContext &Ctx,
case OMPD_taskyield:
case OMPD_barrier:
case OMPD_taskwait:
+ case OMPD_taskgraph:
case OMPD_taskgroup:
case OMPD_atomic:
case OMPD_flush:
@@ -660,6 +661,7 @@ static bool supportsSPMDExecutionMode(ASTContext &Ctx,
case OMPD_taskyield:
case OMPD_barrier:
case OMPD_taskwait:
+ case OMPD_taskgraph:
case OMPD_taskgroup:
case OMPD_atomic:
case OMPD_flush:
diff --git a/clang/lib/CodeGen/CGStmt.cpp b/clang/lib/CodeGen/CGStmt.cpp
index ad31ecc75b01e..6de9385f6625a 100644
--- a/clang/lib/CodeGen/CGStmt.cpp
+++ b/clang/lib/CodeGen/CGStmt.cpp
@@ -284,6 +284,9 @@ void CodeGenFunction::EmitStmt(const Stmt *S, ArrayRef<const Attr *> Attrs) {
case Stmt::OMPTaskwaitDirectiveClass:
EmitOMPTaskwaitDirective(cast<OMPTaskwaitDirective>(*S));
break;
+ case Stmt::OMPTaskgraphDirectiveClass:
+ EmitOMPTaskgraphDirective(cast<OMPTaskgraphDirective>(*S));
+ break;
case Stmt::OMPTaskgroupDirectiveClass:
EmitOMPTaskgroupDirective(cast<OMPTaskgroupDirective>(*S));
break;
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index cc85de9221eef..59e27c8a14e55 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -1444,6 +1444,7 @@ void CodeGenFunction::EmitOMPReductionClauseInit(
case OMPD_error:
case OMPD_barrier:
case OMPD_taskwait:
+ case OMPD_taskgraph:
case OMPD_taskgroup:
case OMPD_flush:
case OMPD_depobj:
@@ -5634,6 +5635,11 @@ void CodeGenFunction::EmitOMPTaskwaitDirective(const OMPTaskwaitDirective &S) {
CGM.getOpenMPRuntime().emitTaskwaitCall(*this, S.getBeginLoc(), Data);
}
+void CodeGenFunction::EmitOMPTaskgraphDirective(
+ const OMPTaskgraphDirective &S) {
+ CGM.getOpenMPRuntime().emitTaskgraphCall(*this, S.getBeginLoc(), S);
+}
+
static bool isSupportedByOpenMPIRBuilder(const OMPTaskgroupDirective &T) {
return T.clauses().empty();
}
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 9771b89b55aae..2b2d08570ee38 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -3939,6 +3939,7 @@ class CodeGenFunction : public CodeGenTypeCache {
void EmitOMPErrorDirective(const OMPErrorDirective &S);
void EmitOMPBarrierDirective(const OMPBarrierDirective &S);
void EmitOMPTaskwaitDirective(const OMPTaskwaitDirective &S);
+ void EmitOMPTaskgraphDirective(const OMPTaskgraphDirective &S);
void EmitOMPTaskgroupDirective(const OMPTaskgroupDirective &S);
void EmitOMPFlushDirective(const OMPFlushDirective &S);
void EmitOMPDepobjDirective(const OMPDepobjDirective &S);
diff --git a/clang/lib/Sema/SemaExceptionSpec.cpp b/clang/lib/Sema/SemaExceptionSpec.cpp
index 56079ea8e1bf8..aa3ac0ffc9948 100644
--- a/clang/lib/Sema/SemaExceptionSpec.cpp
+++ b/clang/lib/Sema/SemaExceptionSpec.cpp
@@ -1527,6 +1527,7 @@ CanThrowResult Sema::canThrow(const Stmt *S) {
case Stmt::OMPScopeDirectiveClass:
case Stmt::OMPTaskDirectiveClass:
case Stmt::OMPTaskgroupDirectiveClass:
+ case Stmt::OMPTaskgraphDirectiveClass:
case Stmt::OMPTaskLoopDirectiveClass:
case Stmt::OMPTaskLoopSimdDirectiveClass:
case Stmt::OMPTaskwaitDirectiveClass:
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 34869e50b74ac..3f54fea2bf78b 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -4490,6 +4490,14 @@ getUnknownRegionParams(Sema &SemaRef) {
return Params;
}
+static SmallVector<SemaOpenMP::CapturedParamNameType>
+getTaskgraphRegionParams(Sema &SemaRef) {
+ SmallVector<SemaOpenMP::CapturedParamNameType> Params{
+ std::make_pair(StringRef(), QualType()) // __context with shared vars
+ };
+ return Params;
+}
+
static SmallVector<SemaOpenMP::CapturedParamNameType>
getTaskloopRegionParams(Sema &SemaRef) {
ASTContext &Context = SemaRef.getASTContext();
@@ -4563,6 +4571,10 @@ static void processCapturedRegions(Sema &SemaRef, OpenMPDirectiveKind DKind,
// function directly.
MarkAsInlined(SemaRef.getCurCapturedRegion());
break;
+ case OMPD_taskgraph:
+ SemaRef.ActOnCapturedRegionStart(
+ Loc, CurScope, CR_OpenMP, getTaskgraphRegionParams(SemaRef), Level);
+ break;
case OMPD_target:
SemaRef.ActOnCapturedRegionStart(Loc, CurScope, CR_OpenMP,
getTargetRegionParams(SemaRef), Level);
@@ -6524,6 +6536,12 @@ StmtResult SemaOpenMP::ActOnOpenMPExecutableDirective(
"No associated statement allowed for 'omp taskwait' directive");
Res = ActOnOpenMPTaskwaitDirective(ClausesWithImplicit, StartLoc, EndLoc);
break;
+ case OMPD_taskgraph:
+ assert(AStmt &&
+ "Associated statement required for 'omp taskgraph' directive");
+ Res = ActOnOpenMPTaskgraphDirective(ClausesWithImplicit, AStmt, StartLoc,
+ EndLoc);
+ break;
case OMPD_taskgroup:
Res = ActOnOpenMPTaskgroupDirective(ClausesWithImplicit, AStmt, StartLoc,
EndLoc);
@@ -11408,6 +11426,19 @@ SemaOpenMP::ActOnOpenMPTaskwaitDirective(ArrayRef<OMPClause *> Clauses,
Clauses);
}
+StmtResult
+SemaOpenMP::ActOnOpenMPTaskgraphDirective(ArrayRef<OMPClause *> Clauses,
+ Stmt *AStmt, SourceLocation StartLoc,
+ SourceLocation EndLoc) {
+ if (!AStmt)
+ return StmtError();
+
+ assert(isa<CapturedStmt>(AStmt) && "Captured statement expected");
+
+ return OMPTaskgraphDirective::Create(getASTContext(), StartLoc, EndLoc,
+ Clauses, AStmt);
+}
+
StmtResult
SemaOpenMP::ActOnOpenMPTaskgroupDirective(ArrayRef<OMPClause *> Clauses,
Stmt *AStmt, SourceLocation StartLoc,
diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index 8ae5df367e0dd..fe6b9a4755e04 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -9967,6 +9967,17 @@ TreeTransform<Derived>::TransformOMPAssumeDirective(OMPAssumeDirective *D) {
return Res;
}
+template <typename Derived>
+StmtResult TreeTransform<Derived>::TransformOMPTaskgraphDirective(
+ OMPTaskgraphDirective *D) {
+ DeclarationNameInfo DirName;
+ getDerived().getSema().OpenMP().StartOpenMPDSABlock(
+ OMPD_taskgraph, DirName, nullptr, D->getBeginLoc());
+ StmtResult Res = getDerived().TransformOMPExecutableDirective(D);
+ getDerived().getSema().OpenMP().EndOpenMPDSABlock(Res.get());
+ return Res;
+}
+
template <typename Derived>
StmtResult
TreeTransform<Derived>::TransformOMPErrorDirective(OMPErrorDirective *D) {
diff --git a/clang/lib/Serialization/ASTReaderStmt.cpp b/clang/lib/Serialization/ASTReaderStmt.cpp
index f351e185e5b58..41b3124d2020a 100644
--- a/clang/lib/Serialization/ASTReaderStmt.cpp
+++ b/clang/lib/Serialization/ASTReaderStmt.cpp
@@ -2622,6 +2622,11 @@ void ASTStmtReader::VisitOMPAssumeDirective(OMPAssumeDirective *D) {
VisitOMPExecutableDirective(D);
}
+void ASTStmtReader::VisitOMPTaskgraphDirective(OMPTaskgraphDirective *D) {
+ VisitStmt(D);
+ VisitOMPExecutableDirective(D);
+}
+
void ASTStmtReader::VisitOMPErrorDirective(OMPErrorDirective *D) {
VisitStmt(D);
// The NumClauses field was read in ReadStmtFromStream.
@@ -3783,6 +3788,11 @@ Stmt *ASTReader::ReadStmtFromStream(ModuleFile &F) {
Context, Record[ASTStmtReader::NumStmtFields], Empty);
break;
+ case STMT_OMP_TASKGRAPH_DIRECTIVE:
+ S = OMPTaskgraphDirective::CreateEmpty(
+ Context, Record[ASTStmtReader::NumStmtFields], Empty);
+ break;
+
case STMT_OMP_ERROR_DIRECTIVE:
S = OMPErrorDirective::CreateEmpty(
Context, Record[ASTStmtReader::NumStmtFields], Empty);
diff --git a/clang/lib/Serialization/ASTWriterStmt.cpp b/clang/lib/Serialization/ASTWriterStmt.cpp
index d9b95e53f2da0..ac2449aaf4072 100644
--- a/clang/lib/Serialization/ASTWriterStmt.cpp
+++ b/clang/lib/Serialization/ASTWriterStmt.cpp
@@ -2711,6 +2711,12 @@ void ASTStmtWriter::VisitOMPAssumeDirective(OMPAssumeDirective *D) {
Code = serialization::STMT_OMP_ASSUME_DIRECTIVE;
}
+void ASTStmtWriter::VisitOMPTaskgraphDirective(OMPTaskgraphDirective *D) {
+ VisitStmt(D);
+ VisitOMPExecutableDirective(D);
+ Code = serialization::STMT_OMP_TASKGRAPH_DIRECTIVE;
+}
+
void ASTStmtWriter::VisitOMPErrorDirective(OMPErrorDirective *D) {
VisitStmt(D);
Record.push_back(D->getNumClauses());
diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 30aee25d35dea..d1f2420a9219a 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -1765,6 +1765,7 @@ void ExprEngine::Visit(const Stmt *S, ExplodedNode *Pred,
case Stmt::OMPTaskyieldDirectiveClass:
case Stmt::OMPBarrierDirectiveClass:
case Stmt::OMPTaskwaitDirectiveClass:
+ case Stmt::OMPTaskgraphDirectiveClass:
case Stmt::OMPErrorDirectiveClass:
case Stmt::OMPTaskgroupDirectiveClass:
case Stmt::OMPFlushDirectiveClass:
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index 31b6a3222d916..a2f166d8ff78d 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -6362,6 +6362,8 @@ CXString clang_getCursorKindSpelling(enum CXCursorKind Kind) {
return cxstring::createRef("OMPTaskwaitDirective");
case CXCursor_OMPAssumeDirective:
return cxstring::createRef("OMPAssumeDirective");
+ case CXCursor_OMPTaskgraphDirective:
+ return cxstring::createRef("OMPTaskgraphDirective");
case CXCursor_OMPErrorDirective:
return cxstring::createRef("OMPErrorDirective");
case CXCursor_OMPTaskgroupDirective:
diff --git a/clang/tools/libclang/CXCursor.cpp b/clang/tools/libclang/CXCursor.cpp
index d31d2c0c9bb67..97f57a8796f8a 100644
--- a/clang/tools/libclang/CXCursor.cpp
+++ b/clang/tools/libclang/CXCursor.cpp
@@ -754,6 +754,9 @@ CXCursor cxcursor::MakeCXCursor(const Stmt *S, const Decl *Parent,
case Stmt::OMPTaskwaitDirectiveClass:
K = CXCursor_OMPTaskwaitDirective;
break;
+ case Stmt::OMPTaskgraphDirectiveClass:
+ K = CXCursor_OMPTaskgraphDirective;
+ break;
case Stmt::OMPErrorDirectiveClass:
K = CXCursor_OMPErrorDirective;
break;
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 5fe7ee8997243..430a2b147e2e5 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -355,6 +355,7 @@ __OMP_RTL(__kmpc_omp_task_alloc, false, /* kmp_task_t */ VoidPtr, IdentPtr,
Int32, Int32, SizeTy, SizeTy, TaskRoutineEntryPtr)
__OMP_RTL(__kmpc_omp_task, false, Int32, IdentPtr, Int32,
/* kmp_task_t */ VoidPtr)
+__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, Int32, Int32, VoidPtr, VoidPtr)
__OMP_RTL(__kmpc_end_taskgroup, false, Void, IdentPtr, Int32)
__OMP_RTL(__kmpc_taskgroup, false, Void, IdentPtr, Int32)
__OMP_RTL(__kmpc_omp_task_begin_if0, false, Void, IdentPtr, Int32,
>From 96aa765216144677cf5ad569076228493433809f Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Fri, 26 Sep 2025 07:09:42 -0500
Subject: [PATCH 05/28] [OpenMP] New Clang tests for 'taskgraph' directive
This patch adds two simple tests for parsing/serialization/deserialization
and codegen for the OpenMP 'taskgraph' directive (for the current
record-and-replay-based implementation).
---
clang/test/OpenMP/taskgraph_ast_print.cpp | 31 ++++++++++++++
clang/test/OpenMP/taskgraph_codegen.cpp | 52 +++++++++++++++++++++++
2 files changed, 83 insertions(+)
create mode 100644 clang/test/OpenMP/taskgraph_ast_print.cpp
create mode 100644 clang/test/OpenMP/taskgraph_codegen.cpp
diff --git a/clang/test/OpenMP/taskgraph_ast_print.cpp b/clang/test/OpenMP/taskgraph_ast_print.cpp
new file mode 100644
index 0000000000000..063f734558345
--- /dev/null
+++ b/clang/test/OpenMP/taskgraph_ast_print.cpp
@@ -0,0 +1,31 @@
+// RUN: %clang_cc1 -verify -fopenmp -ast-print %s | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -std=c++11 -include-pch %t -verify %s -ast-print | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+int main() {
+ int x = 0, y = 0;
+
+#pragma omp taskgraph
+// CHECK: #pragma omp taskgraph
+ {
+#pragma omp task depend(in: x) depend(out: y)
+// CHECK: #pragma omp task depend(in : x) depend(out : y)
+ {
+ y = x;
+ }
+#pragma omp task depend(inout: x, y)
+// CHECK: #pragma omp task depend(inout : x,y)
+ {
+ x++;
+ y++;
+ }
+ }
+
+ return 0;
+}
+
+#endif
diff --git a/clang/test/OpenMP/taskgraph_codegen.cpp b/clang/test/OpenMP/taskgraph_codegen.cpp
new file mode 100644
index 0000000000000..1c5d6c73d8890
--- /dev/null
+++ b/clang/test/OpenMP/taskgraph_codegen.cpp
@@ -0,0 +1,52 @@
+// NOTE: Assertions have been autogenerated by utils/update_cc_test_checks.py UTC_ARGS: --functions "main" --replace-value-regex "[0-9][0-9][0-9]+" --prefix-filecheck-ir-name _
+
+// FIXME: The hash used to identify taskgraph regions (the fourth argument of
+// __kmpc_taskgraph) is unstable between the two compiler invocations below,
+// and furthermore is a little hard to identify with update_cc_test_checks.py.
+// The above works for now, but it's not ideal.
+
+// RUN: %clang_cc1 -verify -fopenmp -x c++ -triple x86_64-unknown-unknown -emit-llvm %s -fexceptions -fcxx-exceptions -o - | FileCheck %s
+// RUN: %clang_cc1 -fopenmp -x c++ -std=c++11 -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -emit-pch -o %t %s
+// RUN: %clang_cc1 -fopenmp -x c++ -triple x86_64-unknown-unknown -fexceptions -fcxx-exceptions -std=c++11 -include-pch %t -verify %s -emit-llvm -o - | FileCheck %s
+// expected-no-diagnostics
+
+#ifndef HEADER
+#define HEADER
+
+// CHECK-LABEL: @main(
+// CHECK-NEXT: entry:
+// CHECK-NEXT: [[RETVAL:%.*]] = alloca i32, align 4
+// CHECK-NEXT: [[X:%.*]] = alloca i32, align 4
+// CHECK-NEXT: [[Y:%.*]] = alloca i32, align 4
+// CHECK-NEXT: [[AGG_CAPTURED:%.*]] = alloca [[STRUCT_ANON:%.*]], align 8
+// CHECK-NEXT: [[TMP0:%.*]] = call i32 @__kmpc_global_thread_num(ptr @[[GLOB1:[0-9]+]])
+// CHECK-NEXT: store i32 0, ptr [[RETVAL]], align 4
+// CHECK-NEXT: store i32 0, ptr [[X]], align 4
+// CHECK-NEXT: store i32 0, ptr [[Y]], align 4
+// CHECK-NEXT: [[TMP1:%.*]] = getelementptr inbounds nuw [[STRUCT_ANON]], ptr [[AGG_CAPTURED]], i32 0, i32 0
+// CHECK-NEXT: store ptr [[X]], ptr [[TMP1]], align 8
+// CHECK-NEXT: [[TMP2:%.*]] = getelementptr inbounds nuw [[STRUCT_ANON]], ptr [[AGG_CAPTURED]], i32 0, i32 1
+// CHECK-NEXT: store ptr [[Y]], ptr [[TMP2]], align 8
+// CHECK-NEXT: call void @__kmpc_taskgraph(ptr @[[GLOB1]], i32 [[TMP0]], i32 0, i32 {{[0-9][0-9][0-9]+}}, ptr @taskgraph.omp_outlined., ptr [[AGG_CAPTURED]])
+// CHECK-NEXT: ret i32 0
+//
+int main() {
+ int x = 0, y = 0;
+
+#pragma omp taskgraph
+ {
+#pragma omp task depend(in: x) depend(out: y)
+ {
+ y = x;
+ }
+#pragma omp task depend(inout: x, y)
+ {
+ x++;
+ y++;
+ }
+ }
+
+ return 0;
+}
+
+#endif
>From e6349f3fa624c32e4cd53b13e59936577cd01124 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Mon, 15 Sep 2025 05:54:11 -0500
Subject: [PATCH 06/28] [OpenMP] New/derived taskgraph tests
This patch adds new tests for 'omp taskgraph' functionality, but (unlike
the patches posted in PR66919) leaves the existing tests that use the
internal runtime API for record and replay as-is.
I have changed the 'print_dot' tests to use FileCheck instead of their
own internal checking, though.
Co-authored-by: Adrian Munera <adrian.munera at bsc.es>
---
.../tasking/omp_record_replay_print_dot.cpp | 53 ++++++---------
openmp/runtime/test/tasking/omp_taskgraph.cpp | 35 ++++++++++
.../test/tasking/omp_taskgraph_deps.cpp | 52 +++++++++++++++
.../test/tasking/omp_taskgraph_multiTDGs.cpp | 66 +++++++++++++++++++
.../test/tasking/omp_taskgraph_print_dot.cpp | 58 ++++++++++++++++
.../test/tasking/omp_taskgraph_taskloop.cpp | 39 +++++++++++
6 files changed, 271 insertions(+), 32 deletions(-)
create mode 100644 openmp/runtime/test/tasking/omp_taskgraph.cpp
create mode 100644 openmp/runtime/test/tasking/omp_taskgraph_deps.cpp
create mode 100644 openmp/runtime/test/tasking/omp_taskgraph_multiTDGs.cpp
create mode 100644 openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
create mode 100644 openmp/runtime/test/tasking/omp_taskgraph_taskloop.cpp
diff --git a/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp b/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
index 7f1f5ccd77d37..e3d2c017c21c7 100644
--- a/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
+++ b/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
@@ -1,8 +1,9 @@
// REQUIRES: omp_taskgraph_experimental
// RUN: %libomp-cxx-compile-and-run
-#include <iostream>
-#include <fstream>
-#include <sstream>
+// RUN: cat tdg_0.dot | FileCheck %s
+// RUN: rm -f tdg_0.dot
+
+#include <cstdlib>
#include <cassert>
// Compiler-generated code (emulation)
@@ -23,29 +24,14 @@ void func(int *num_exec) {
(*num_exec)++;
}
-std::string tdg_string= "digraph TDG {\n"
-" compound=true\n"
-" subgraph cluster {\n"
-" label=TDG_0\n"
-" 0[style=bold]\n"
-" 1[style=bold]\n"
-" 2[style=bold]\n"
-" 3[style=bold]\n"
-" }\n"
-" 0 -> 1 \n"
-" 1 -> 2 \n"
-" 1 -> 3 \n"
-"}";
-
int main() {
int num_exec = 0;
int x, y;
- setenv("KMP_TDG_DOT","TRUE",1);
- remove("tdg_0.dot");
+ setenv("KMP_TDG_DOT", "TRUE", 1);
- #pragma omp parallel
- #pragma omp single
+#pragma omp parallel
+#pragma omp single
{
int gtid = __kmpc_global_thread_num(nullptr);
int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
@@ -65,16 +51,19 @@ int main() {
assert(num_exec == 4);
- std::ifstream tdg_file("tdg_0.dot");
- assert(tdg_file.is_open());
-
- std::stringstream tdg_file_stream;
- tdg_file_stream << tdg_file.rdbuf();
- int equal = tdg_string.compare(tdg_file_stream.str());
-
- assert(equal == 0);
-
- std::cout << "Passed" << std::endl;
return 0;
}
-// CHECK: Passed
+
+// CHECK: digraph TDG {
+// CHECK-NEXT: compound=true
+// CHECK-NEXT: subgraph cluster {
+// CHECK-NEXT: label=TDG_0
+// CHECK-NEXT: 0[style=bold]
+// CHECK-NEXT: 1[style=bold]
+// CHECK-NEXT: 2[style=bold]
+// CHECK-NEXT: 3[style=bold]
+// CHECK-NEXT: }
+// CHECK-NEXT: 0 -> 1
+// CHECK-NEXT: 1 -> 2
+// CHECK-NEXT: 1 -> 3
+// CHECK-NEXT: }
diff --git a/openmp/runtime/test/tasking/omp_taskgraph.cpp b/openmp/runtime/test/tasking/omp_taskgraph.cpp
new file mode 100644
index 0000000000000..363a7da8c145a
--- /dev/null
+++ b/openmp/runtime/test/tasking/omp_taskgraph.cpp
@@ -0,0 +1,35 @@
+// REQUIRES: omp_taskgraph_experimental
+// RUN: %libomp-cxx-compile-and-run
+#include <iostream>
+#include <cassert>
+#define NT 100
+
+// Compiler-generated code (emulation)
+typedef struct ident {
+ void *dummy;
+} ident_t;
+
+void func(int *num_exec) { (*num_exec)++; }
+
+int main() {
+ int num_exec = 0;
+ int num_tasks = 0;
+ int x = 0;
+#pragma omp parallel
+#pragma omp single
+ for (int iter = 0; iter < NT; ++iter) {
+#pragma omp taskgraph
+ {
+ num_tasks++;
+#pragma omp task
+ func(&num_exec);
+ }
+ }
+
+ assert(num_tasks == 1);
+ assert(num_exec == NT);
+
+ std::cout << "Passed" << std::endl;
+ return 0;
+}
+// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_taskgraph_deps.cpp b/openmp/runtime/test/tasking/omp_taskgraph_deps.cpp
new file mode 100644
index 0000000000000..3341b019a5095
--- /dev/null
+++ b/openmp/runtime/test/tasking/omp_taskgraph_deps.cpp
@@ -0,0 +1,52 @@
+// REQUIRES: omp_taskgraph_experimental
+// RUN: %libomp-cxx-compile-and-run
+#include <iostream>
+#include <cassert>
+#define NT 100
+#define MULTIPLIER 100
+#define DECREMENT 5
+
+int val;
+// Compiler-generated code (emulation)
+typedef struct ident {
+ void *dummy;
+} ident_t;
+
+void sub() {
+#pragma omp atomic
+ val -= DECREMENT;
+}
+
+void add() {
+#pragma omp atomic
+ val += DECREMENT;
+}
+
+void mult() {
+ // no atomicity needed, can only be executed by 1 thread
+ // and no concurrency with other tasks possible
+ val *= MULTIPLIER;
+}
+
+int main() {
+ val = 0;
+ int *x, *y;
+#pragma omp parallel
+#pragma omp single
+ for (int iter = 0; iter < NT; ++iter) {
+#pragma omp taskgraph
+ {
+#pragma omp task depend(out : y)
+ add();
+#pragma omp task depend(out : x)
+ sub();
+#pragma omp task depend(in : x, y)
+ mult();
+ }
+ }
+ assert(val == 0);
+
+ std::cout << "Passed" << std::endl;
+ return 0;
+}
+// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_taskgraph_multiTDGs.cpp b/openmp/runtime/test/tasking/omp_taskgraph_multiTDGs.cpp
new file mode 100644
index 0000000000000..98a4ee27d0d5b
--- /dev/null
+++ b/openmp/runtime/test/tasking/omp_taskgraph_multiTDGs.cpp
@@ -0,0 +1,66 @@
+// REQUIRES: omp_taskgraph_experimental
+// RUN: %libomp-cxx-compile-and-run
+#include <iostream>
+#include <cassert>
+#define NT 20
+#define MULTIPLIER 100
+#define DECREMENT 5
+
+// Compiler-generated code (emulation)
+typedef struct ident {
+ void *dummy;
+} ident_t;
+
+int val;
+
+void sub() {
+#pragma omp atomic
+ val -= DECREMENT;
+}
+
+void add() {
+#pragma omp atomic
+ val += DECREMENT;
+}
+
+void mult() {
+ // no atomicity needed, can only be executed by 1 thread
+ // and no concurrency with other tasks possible
+ val *= MULTIPLIER;
+}
+
+int main() {
+ int num_tasks = 0;
+ int *x, *y;
+#pragma omp parallel
+#pragma omp single
+ for (int iter = 0; iter < NT; ++iter) {
+#pragma omp taskgraph
+ {
+ num_tasks++;
+#pragma omp task depend(out : y)
+ add();
+#pragma omp task depend(out : x)
+ sub();
+#pragma omp task depend(in : x, y)
+ mult();
+ }
+#pragma omp taskgraph
+ {
+ num_tasks++;
+#pragma omp task depend(out : y)
+ add();
+#pragma omp task depend(out : x)
+ sub();
+#pragma omp task depend(in : x, y)
+ mult();
+ }
+ }
+
+ assert(num_tasks == 2);
+ assert(val == 0);
+
+ std::cout << "Passed" << std::endl;
+ return 0;
+}
+// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp b/openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
new file mode 100644
index 0000000000000..0dc81df32d93a
--- /dev/null
+++ b/openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
@@ -0,0 +1,58 @@
+// REQUIRES: omp_taskgraph_experimental
+// RUN: %libomp-cxx-compile-and-run
+// RUN: cat tdg_17353.dot | FileCheck %s
+// RUN: rm -f tdg_17353.dot
+
+#include <cstdlib>
+#include <cassert>
+
+// Compiler-generated code (emulation)
+typedef struct ident {
+ void *dummy;
+} ident_t;
+
+void func(int *num_exec) {
+#pragma omp atomic
+ (*num_exec)++;
+}
+
+int main() {
+ int num_exec = 0;
+ int x, y;
+
+ setenv("KMP_TDG_DOT", "TRUE", 1);
+
+#pragma omp parallel
+#pragma omp single
+ {
+#pragma omp taskgraph
+ {
+#pragma omp task depend(out : x)
+ func(&num_exec);
+#pragma omp task depend(in : x) depend(out : y)
+ func(&num_exec);
+#pragma omp task depend(in : y)
+ func(&num_exec);
+#pragma omp task depend(in : y)
+ func(&num_exec);
+ }
+ }
+
+ assert(num_exec == 4);
+
+ return 0;
+}
+
+// CHECK: digraph TDG {
+// CHECK-NEXT: compound=true
+// CHECK-NEXT: subgraph cluster {
+// CHECK-NEXT: label=TDG_17353
+// CHECK-NEXT: 0[style=bold]
+// CHECK-NEXT: 1[style=bold]
+// CHECK-NEXT: 2[style=bold]
+// CHECK-NEXT: 3[style=bold]
+// CHECK-NEXT: }
+// CHECK-NEXT: 0 -> 1
+// CHECK-NEXT: 1 -> 2
+// CHECK-NEXT: 1 -> 3
+// CHECK-NEXT: }
diff --git a/openmp/runtime/test/tasking/omp_taskgraph_taskloop.cpp b/openmp/runtime/test/tasking/omp_taskgraph_taskloop.cpp
new file mode 100644
index 0000000000000..bbea64a2e92af
--- /dev/null
+++ b/openmp/runtime/test/tasking/omp_taskgraph_taskloop.cpp
@@ -0,0 +1,39 @@
+// REQUIRES: omp_taskgraph_experimental
+// RUN: %libomp-cxx-compile-and-run
+#include <iostream>
+#include <cassert>
+
+#define NT 20
+#define N 128 * 128
+
+typedef struct ident {
+ void *dummy;
+} ident_t;
+
+int main() {
+ int num_tasks = 0;
+
+ int array[N];
+ for (int i = 0; i < N; ++i)
+ array[i] = 1;
+
+ long sum = 0;
+#pragma omp parallel
+#pragma omp single
+ for (int iter = 0; iter < NT; ++iter) {
+#pragma omp taskgraph
+ {
+ num_tasks++;
+#pragma omp taskloop reduction(+ : sum) num_tasks(4096)
+ for (int i = 0; i < N; ++i) {
+ sum += array[i];
+ }
+ }
+ }
+ assert(sum == N * NT);
+ assert(num_tasks == 1);
+
+ std::cout << "Passed" << std::endl;
+ return 0;
+}
+// CHECK: Passed
>From e3286dc048719a3e0986024d435b67650165a6d1 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Thu, 17 Jul 2025 12:22:01 +0200
Subject: [PATCH 07/28] [OpenMP] Fix td_tdg_task_id underflow with taskloop and
taskgraph
This patch addresses an issue where td_tdg_task_id could underflow,
producing a negative task ID, when a taskloop region was encountered
before a taskgraph construct.
---
openmp/runtime/src/kmp_tasking.cpp | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 660b5f9e3373f..5c85d80ce559b 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -4991,7 +4991,8 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
}
#if OMP_TASKGRAPH_EXPERIMENTAL
- KMP_ATOMIC_DEC(&__kmp_tdg_task_id);
+ if (taskdata->is_taskgraph)
+ KMP_ATOMIC_DEC(&__kmp_tdg_task_id);
#endif
// =========================================================================
// calculate loop parameters
>From 34776bf21a27265f6876b4b8859dc3c14aef03ce Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Wed, 30 Jul 2025 09:57:56 +0200
Subject: [PATCH 08/28] [wip][openmp] Move _kmp_tdg_task_id inside kmp_tdg_info
---
openmp/runtime/src/kmp.h | 1 +
openmp/runtime/src/kmp_tasking.cpp | 11 ++++++-----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index b17dbeadc7bdd..3a2ab8c94d476 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2671,6 +2671,7 @@ typedef struct kmp_tdg_info {
kmp_tdg_status_t tdg_status =
KMP_TDG_NONE; // Status of the TDG (recording, ready...)
std::atomic<kmp_int32> num_tasks; // Number of TDG nodes
+ std::atomic<kmp_int32> tdg_task_id_next; // Task id of next node
kmp_bootstrap_lock_t
graph_lock; // Protect graph attributes when updated via taskloop_recur
// Taskloop reduction related
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 5c85d80ce559b..6009de1a1bb8c 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1437,7 +1437,7 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
taskdata->is_taskgraph = 1;
taskdata->tdg = tdg;
taskdata->td_task_id = KMP_GEN_TASK_ID();
- taskdata->td_tdg_task_id = KMP_ATOMIC_INC(&__kmp_tdg_task_id);
+ taskdata->td_tdg_task_id = KMP_ATOMIC_INC(&tdg->tdg_task_id_next);
}
#endif
KA_TRACE(20, ("__kmp_task_alloc(exit): T#%d created task %p parent=%p\n",
@@ -4475,7 +4475,8 @@ kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src
#if OMP_TASKGRAPH_EXPERIMENTAL
if (taskdata->is_taskgraph && !taskloop_recur &&
__kmp_tdg_is_recording(taskdata_src->tdg->tdg_status))
- taskdata->td_tdg_task_id = KMP_ATOMIC_INC(&__kmp_tdg_task_id);
+ taskdata->td_tdg_task_id =
+ KMP_ATOMIC_INC(&taskdata_src->tdg->tdg_task_id_next);
#endif
taskdata->td_task_id = KMP_GEN_TASK_ID();
if (task->shareds != NULL) { // need setup shareds pointer
@@ -4991,8 +4992,9 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
}
#if OMP_TASKGRAPH_EXPERIMENTAL
- if (taskdata->is_taskgraph)
- KMP_ATOMIC_DEC(&__kmp_tdg_task_id);
+ if (taskdata->is_taskgraph && taskdata->tdg)
+ KMP_ATOMIC_DEC(&taskdata->tdg->tdg_task_id_next);
+ /* KMP_ATOMIC_DEC(&__kmp_tdg_task_id); */
#endif
// =========================================================================
// calculate loop parameters
@@ -5493,7 +5495,6 @@ void __kmp_end_record(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
KMP_ATOMIC_ST_RLX(&this_record_map[i].npredecessors_counter,
this_record_map[i].npredecessors);
}
- KMP_ATOMIC_ST_RLX(&__kmp_tdg_task_id, 0);
if (__kmp_tdg_dot)
__kmp_print_tdg_dot(tdg, gtid);
>From d4fca551d78136392403ed90de1efba45c397cc3 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Wed, 3 Sep 2025 09:44:59 +0200
Subject: [PATCH 09/28] [openmp] Update tdg record_map to allow holes
---
openmp/runtime/src/kmp_tasking.cpp | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 6009de1a1bb8c..7f138b267a4b2 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -4990,12 +4990,6 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
#endif
__kmpc_taskgroup(loc, gtid);
}
-
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (taskdata->is_taskgraph && taskdata->tdg)
- KMP_ATOMIC_DEC(&taskdata->tdg->tdg_task_id_next);
- /* KMP_ATOMIC_DEC(&__kmp_tdg_task_id); */
-#endif
// =========================================================================
// calculate loop parameters
kmp_taskloop_bounds_t task_bounds(task, lb, ub);
@@ -5300,6 +5294,7 @@ void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg, kmp_int32 gtid) {
kmp_safe_raii_file_t tdg_file(file_name, "w");
kmp_int32 num_tasks = KMP_ATOMIC_LD_RLX(&tdg->num_tasks);
+ kmp_int32 map_size = tdg->map_size;
fprintf(tdg_file,
"digraph TDG {\n"
" compound=true\n"
@@ -5310,7 +5305,11 @@ void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg, kmp_int32 gtid) {
fprintf(tdg_file, " %d[style=bold]\n", i);
}
fprintf(tdg_file, " }\n");
- for (kmp_int32 i = 0; i < num_tasks; i++) {
+ kmp_int32 tasks = 0;
+ for (kmp_int32 i = 0; tasks < num_tasks && i < map_size; i++) {
+ if (tdg->record_map[i].task == nullptr)
+ continue;
+ tasks++;
kmp_int32 nsuccessors = tdg->record_map[i].nsuccessors;
kmp_int32 *successors = tdg->record_map[i].successors;
if (nsuccessors > 0) {
@@ -5335,6 +5334,7 @@ void __kmp_exec_tdg(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
kmp_int32 *this_root_tasks = tdg->root_tasks;
kmp_int32 this_num_roots = tdg->num_roots;
kmp_int32 this_num_tasks = KMP_ATOMIC_LD_RLX(&tdg->num_tasks);
+ kmp_int32 tasks = 0;
kmp_info_t *thread = __kmp_threads[gtid];
kmp_taskdata_t *parent_task = thread->th.th_current_task;
@@ -5343,7 +5343,10 @@ void __kmp_exec_tdg(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
__kmpc_taskred_init(gtid, tdg->rec_num_taskred, tdg->rec_taskred_data);
}
- for (kmp_int32 j = 0; j < this_num_tasks; j++) {
+ for (kmp_int32 j = 0; j < tdg->map_size && tasks < this_num_tasks; j++) {
+ if (this_record_map[j].task == nullptr)
+ continue;
+ tasks++;
kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(this_record_map[j].task);
td->td_parent = parent_task;
@@ -5471,8 +5474,13 @@ void __kmp_end_record(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
kmp_int32 this_map_size = tdg->map_size;
kmp_int32 this_num_roots = 0;
kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_int32 tasks = 0;
- for (kmp_int32 i = 0; i < this_num_tasks; i++) {
+ for (kmp_int32 i = 0; tasks < this_num_tasks && i < this_map_size; i++) {
+ if (this_record_map[i].task == nullptr) {
+ continue;
+ }
+ tasks++;
if (this_record_map[i].npredecessors == 0) {
this_root_tasks[this_num_roots++] = i;
}
>From dee4d76911ce46e50cc09ad0dae40ce91f69632e Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Wed, 3 Sep 2025 10:10:41 +0200
Subject: [PATCH 10/28] [openmp] Remove taskgraph successors alloc in node
allocation
---
openmp/runtime/src/kmp_taskdeps.cpp | 18 ++++++++++--------
openmp/runtime/src/kmp_tasking.cpp | 15 ++++++---------
2 files changed, 16 insertions(+), 17 deletions(-)
diff --git a/openmp/runtime/src/kmp_taskdeps.cpp b/openmp/runtime/src/kmp_taskdeps.cpp
index b1a0848fc722f..850d1e32b106f 100644
--- a/openmp/runtime/src/kmp_taskdeps.cpp
+++ b/openmp/runtime/src/kmp_taskdeps.cpp
@@ -244,13 +244,17 @@ static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,
if (!exists) {
if (source_info->nsuccessors >= source_info->successors_size) {
kmp_uint old_size = source_info->successors_size;
- source_info->successors_size = 2 * source_info->successors_size;
+ source_info->successors_size = old_size == 0
+ ? __kmp_successors_size
+ : 2 * source_info->successors_size;
kmp_int32 *old_succ_ids = source_info->successors;
kmp_int32 *new_succ_ids = (kmp_int32 *)__kmp_allocate(
source_info->successors_size * sizeof(kmp_int32));
- KMP_MEMCPY(new_succ_ids, old_succ_ids, old_size * sizeof(kmp_int32));
+ if (old_succ_ids) {
+ KMP_MEMCPY(new_succ_ids, old_succ_ids, old_size * sizeof(kmp_int32));
+ __kmp_free(old_succ_ids);
+ }
source_info->successors = new_succ_ids;
- __kmp_free(old_succ_ids);
}
source_info->successors[source_info->nsuccessors] =
@@ -714,14 +718,12 @@ kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,
__kmp_free(old_record);
- for (kmp_uint i = old_size; i < new_size; i++) {
- kmp_int32 *successorsList = (kmp_int32 *)__kmp_allocate(
- __kmp_successors_size * sizeof(kmp_int32));
+ for (kmp_int i = old_size; i < new_size; i++) {
new_record[i].task = nullptr;
- new_record[i].successors = successorsList;
+ new_record[i].successors = nullptr;
new_record[i].nsuccessors = 0;
new_record[i].npredecessors = 0;
- new_record[i].successors_size = __kmp_successors_size;
+ new_record[i].successors_size = 0;
KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
}
// update the size at the end, so that we avoid other
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 7f138b267a4b2..ea7460c6ba4cb 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1828,14 +1828,12 @@ kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
__kmp_free(old_record);
- for (kmp_uint i = old_size; i < new_size; i++) {
- kmp_int32 *successorsList = (kmp_int32 *)__kmp_allocate(
- __kmp_successors_size * sizeof(kmp_int32));
+ for (kmp_int i = old_size; i < new_size; i++) {
new_record[i].task = nullptr;
- new_record[i].successors = successorsList;
+ new_record[i].successors = nullptr;
new_record[i].nsuccessors = 0;
new_record[i].npredecessors = 0;
- new_record[i].successors_size = __kmp_successors_size;
+ new_record[i].successors_size = 0;
KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
}
// update the size at the end, so that we avoid other
@@ -5403,13 +5401,12 @@ static inline void __kmp_start_record(kmp_int32 gtid,
kmp_node_info_t *this_record_map =
(kmp_node_info_t *)__kmp_allocate(INIT_MAPSIZE * sizeof(kmp_node_info_t));
for (kmp_int32 i = 0; i < INIT_MAPSIZE; i++) {
- kmp_int32 *successorsList =
- (kmp_int32 *)__kmp_allocate(__kmp_successors_size * sizeof(kmp_int32));
this_record_map[i].task = nullptr;
- this_record_map[i].successors = successorsList;
+ this_record_map[i].parent_task = nullptr;
+ this_record_map[i].successors = nullptr;
this_record_map[i].nsuccessors = 0;
this_record_map[i].npredecessors = 0;
- this_record_map[i].successors_size = __kmp_successors_size;
+ this_record_map[i].successors_size = 0;
KMP_ATOMIC_ST_RLX(&this_record_map[i].npredecessors_counter, 0);
}
>From 12e13443b8579b528cc9808de5630dd744ebeeef Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Wed, 3 Sep 2025 10:12:19 +0200
Subject: [PATCH 11/28] [openmp] Add tdg node parent_task init to nullptr
---
openmp/runtime/src/kmp_taskdeps.cpp | 1 +
openmp/runtime/src/kmp_tasking.cpp | 1 +
2 files changed, 2 insertions(+)
diff --git a/openmp/runtime/src/kmp_taskdeps.cpp b/openmp/runtime/src/kmp_taskdeps.cpp
index 850d1e32b106f..c6853b3dc599f 100644
--- a/openmp/runtime/src/kmp_taskdeps.cpp
+++ b/openmp/runtime/src/kmp_taskdeps.cpp
@@ -720,6 +720,7 @@ kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,
for (kmp_int i = old_size; i < new_size; i++) {
new_record[i].task = nullptr;
+ new_record[i].parent_task = nullptr;
new_record[i].successors = nullptr;
new_record[i].nsuccessors = 0;
new_record[i].npredecessors = 0;
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index ea7460c6ba4cb..e9a933d83bcf7 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1830,6 +1830,7 @@ kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
for (kmp_int i = old_size; i < new_size; i++) {
new_record[i].task = nullptr;
+ new_record[i].parent_task = nullptr;
new_record[i].successors = nullptr;
new_record[i].nsuccessors = 0;
new_record[i].npredecessors = 0;
>From b41b45bfb18cd287f7e0a5978598ed2b95486227 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Tue, 9 Sep 2025 10:09:47 +0200
Subject: [PATCH 12/28] [openmp] Fix locking when expanding recorded tdg
---
openmp/runtime/src/kmp_taskdeps.cpp | 50 ++++++++++++++---------------
openmp/runtime/src/kmp_tasking.cpp | 8 ++---
2 files changed, 27 insertions(+), 31 deletions(-)
diff --git a/openmp/runtime/src/kmp_taskdeps.cpp b/openmp/runtime/src/kmp_taskdeps.cpp
index c6853b3dc599f..0b737edf2b1c0 100644
--- a/openmp/runtime/src/kmp_taskdeps.cpp
+++ b/openmp/runtime/src/kmp_taskdeps.cpp
@@ -704,39 +704,37 @@ kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,
__kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
kmp_tdg_info_t *tdg = new_taskdata->tdg;
// extend record_map if needed
+ __kmp_acquire_bootstrap_lock(&tdg->graph_lock);
if (new_taskdata->td_tdg_task_id >= tdg->map_size) {
- __kmp_acquire_bootstrap_lock(&tdg->graph_lock);
- if (new_taskdata->td_tdg_task_id >= tdg->map_size) {
- kmp_uint old_size = tdg->map_size;
- kmp_uint new_size = old_size * 2;
- kmp_node_info_t *old_record = tdg->record_map;
- kmp_node_info_t *new_record = (kmp_node_info_t *)__kmp_allocate(
- new_size * sizeof(kmp_node_info_t));
- KMP_MEMCPY(new_record, tdg->record_map,
- old_size * sizeof(kmp_node_info_t));
- tdg->record_map = new_record;
-
- __kmp_free(old_record);
-
- for (kmp_int i = old_size; i < new_size; i++) {
- new_record[i].task = nullptr;
- new_record[i].parent_task = nullptr;
- new_record[i].successors = nullptr;
- new_record[i].nsuccessors = 0;
- new_record[i].npredecessors = 0;
- new_record[i].successors_size = 0;
- KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
- }
- // update the size at the end, so that we avoid other
- // threads use old_record while map_size is already updated
- tdg->map_size = new_size;
+ kmp_uint old_size = tdg->map_size;
+ kmp_uint new_size = old_size * 2;
+ kmp_node_info_t *old_record = tdg->record_map;
+ kmp_node_info_t *new_record =
+ (kmp_node_info_t *)__kmp_allocate(new_size * sizeof(kmp_node_info_t));
+ KMP_MEMCPY(new_record, tdg->record_map,
+ old_size * sizeof(kmp_node_info_t));
+ tdg->record_map = new_record;
+
+ __kmp_free(old_record);
+
+ for (kmp_int i = old_size; i < new_size; i++) {
+ new_record[i].task = nullptr;
+ new_record[i].parent_task = nullptr;
+ new_record[i].successors = nullptr;
+ new_record[i].nsuccessors = 0;
+ new_record[i].npredecessors = 0;
+ new_record[i].successors_size = 0;
+ KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
}
- __kmp_release_bootstrap_lock(&tdg->graph_lock);
+ // update the size at the end, so that we avoid other
+ // threads use old_record while map_size is already updated
+ tdg->map_size = new_size;
}
tdg->record_map[new_taskdata->td_tdg_task_id].task = new_task;
tdg->record_map[new_taskdata->td_tdg_task_id].parent_task =
new_taskdata->td_parent;
KMP_ATOMIC_INC(&tdg->num_tasks);
+ __kmp_release_bootstrap_lock(&tdg->graph_lock);
}
#endif
#if OMPT_SUPPORT
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index e9a933d83bcf7..088d6f1a019ee 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1812,7 +1812,8 @@ kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
__kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
kmp_tdg_info_t *tdg = new_taskdata->tdg;
// extend the record_map if needed
- if (new_taskdata->td_tdg_task_id >= new_taskdata->tdg->map_size) {
+ if (new_taskdata->td_tdg_task_id >= tdg->map_size ||
+ tdg->record_map[new_taskdata->td_tdg_task_id].task == nullptr) {
__kmp_acquire_bootstrap_lock(&tdg->graph_lock);
// map_size could have been updated by another thread if recursive
// taskloop
@@ -1841,14 +1842,11 @@ kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
// threads use old_record while map_size is already updated
tdg->map_size = new_size;
}
- __kmp_release_bootstrap_lock(&tdg->graph_lock);
- }
- // record a task
- if (tdg->record_map[new_taskdata->td_tdg_task_id].task == nullptr) {
tdg->record_map[new_taskdata->td_tdg_task_id].task = new_task;
tdg->record_map[new_taskdata->td_tdg_task_id].parent_task =
new_taskdata->td_parent;
KMP_ATOMIC_INC(&tdg->num_tasks);
+ __kmp_release_bootstrap_lock(&tdg->graph_lock);
}
}
#endif
>From f2182305bee8041c7697fb4162b6f949cf9d8b6d Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Tue, 9 Sep 2025 10:29:50 +0200
Subject: [PATCH 13/28] [openmp] Add locking when extending tdg task successors
---
openmp/runtime/src/kmp_taskdeps.cpp | 3 +++
1 file changed, 3 insertions(+)
diff --git a/openmp/runtime/src/kmp_taskdeps.cpp b/openmp/runtime/src/kmp_taskdeps.cpp
index 0b737edf2b1c0..b34191036c528 100644
--- a/openmp/runtime/src/kmp_taskdeps.cpp
+++ b/openmp/runtime/src/kmp_taskdeps.cpp
@@ -225,6 +225,7 @@ static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,
#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_taskdata_t *task_source = KMP_TASK_TO_TASKDATA(source->dn.task);
kmp_taskdata_t *task_sink = KMP_TASK_TO_TASKDATA(sink_task);
+ kmp_tdg_info_t *tdg = task_source->tdg;
if (source->dn.task && sink_task) {
// Not supporting dependency between two tasks that one is within the TDG
// and the other is not
@@ -242,6 +243,7 @@ static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,
}
}
if (!exists) {
+ __kmp_acquire_bootstrap_lock(&tdg->graph_lock);
if (source_info->nsuccessors >= source_info->successors_size) {
kmp_uint old_size = source_info->successors_size;
source_info->successors_size = old_size == 0
@@ -264,6 +266,7 @@ static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,
kmp_node_info_t *sink_info =
&(task_sink->tdg->record_map[task_sink->td_tdg_task_id]);
sink_info->npredecessors++;
+ __kmp_release_bootstrap_lock(&tdg->graph_lock);
}
}
#endif
>From cab0719b09864c4f8b8cc7f1326bc4112061e441 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 17 Mar 2025 11:33:24 +0100
Subject: [PATCH 14/28] [wip][llvm][clang] Add frontend support for OpenMP
graph_id clause
---
clang/include/clang/AST/OpenMPClause.h | 44 +++++++++++++++++++
clang/include/clang/AST/RecursiveASTVisitor.h | 7 +++
clang/include/clang/Sema/SemaOpenMP.h | 4 ++
clang/lib/AST/OpenMPClause.cpp | 18 ++++++++
clang/lib/AST/StmtProfile.cpp | 5 +++
clang/lib/Basic/OpenMPKinds.cpp | 2 +
clang/lib/Parse/ParseOpenMP.cpp | 1 +
clang/lib/Sema/SemaOpenMP.cpp | 40 +++++++++++++++++
clang/lib/Sema/TreeTransform.h | 21 +++++++++
clang/lib/Serialization/ASTReader.cpp | 9 ++++
clang/lib/Serialization/ASTWriter.cpp | 6 +++
clang/tools/libclang/CIndex.cpp | 3 ++
llvm/include/llvm/Frontend/OpenMP/OMP.td | 1 +
13 files changed, 161 insertions(+)
diff --git a/clang/include/clang/AST/OpenMPClause.h b/clang/include/clang/AST/OpenMPClause.h
index af5d3f4698eda..334632cdfdf6f 100644
--- a/clang/include/clang/AST/OpenMPClause.h
+++ b/clang/include/clang/AST/OpenMPClause.h
@@ -8440,6 +8440,50 @@ class OMPIsDevicePtrClause final
}
};
+/// This represents clause 'graph_id' in the '#pragma omp taskgraph'
+/// directive.
+///
+/// \code
+/// #pragma omp taskgraph graph_id(a)
+class OMPGraphIdClause final
+ : public OMPOneStmtClause<llvm::omp::OMPC_graph_id, OMPClause>,
+ public OMPClauseWithPreInit {
+ friend class OMPClauseReader;
+
+ /// Set condition.
+ void setCondition(Expr *Cond) { setStmt(Cond); }
+
+public:
+ /// Build 'graph_id' clause with condition \a Cond.
+ ///
+ /// \param Cond Condition of the clause.
+ /// \param HelperCond Helper condition for the construct.
+ /// \param CaptureRegion Innermost OpenMP region where expressions in this
+ /// clause must be captured.
+ /// \param StartLoc Starting location of the clause.
+ /// \param LParenLoc Location of '('.
+ /// \param EndLoc Ending location of the clause.
+ OMPGraphIdClause(Expr *Cond, Stmt *HelperCond,
+ OpenMPDirectiveKind CaptureRegion, SourceLocation StartLoc,
+ SourceLocation LParenLoc, SourceLocation EndLoc)
+ : OMPOneStmtClause(Cond, StartLoc, LParenLoc, EndLoc),
+ OMPClauseWithPreInit(this) {
+ setPreInitStmt(HelperCond, CaptureRegion);
+ }
+
+ /// Build an empty clause.
+ OMPGraphIdClause() : OMPOneStmtClause(), OMPClauseWithPreInit(this) {}
+
+ /// Returns condition.
+ Expr *getCondition() const { return getStmtAs<Expr>(); }
+
+ child_range used_children();
+ const_child_range used_children() const {
+ auto Children = const_cast<OMPGraphIdClause *>(this)->used_children();
+ return const_child_range(Children.begin(), Children.end());
+ }
+};
+
/// This represents clause 'has_device_ptr' in the '#pragma omp ...'
/// directives.
///
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index 612b35d615fc0..7f76313b4a653 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -4126,6 +4126,13 @@ bool RecursiveASTVisitor<Derived>::VisitOMPIsDevicePtrClause(
return true;
}
+template <typename Derived>
+bool RecursiveASTVisitor<Derived>::VisitOMPGraphIdClause(OMPGraphIdClause *C) {
+ TRY_TO(VisitOMPClauseWithPreInit(C));
+ TRY_TO(TraverseStmt(C->getCondition()));
+ return true;
+}
+
template <typename Derived>
bool RecursiveASTVisitor<Derived>::VisitOMPHasDeviceAddrClause(
OMPHasDeviceAddrClause *C) {
diff --git a/clang/include/clang/Sema/SemaOpenMP.h b/clang/include/clang/Sema/SemaOpenMP.h
index cf31acb67863f..8f948c63cd47d 100644
--- a/clang/include/clang/Sema/SemaOpenMP.h
+++ b/clang/include/clang/Sema/SemaOpenMP.h
@@ -943,6 +943,10 @@ class SemaOpenMP : public SemaBase {
ActOnOpenMPOrderedClause(SourceLocation StartLoc, SourceLocation EndLoc,
SourceLocation LParenLoc = SourceLocation(),
Expr *NumForLoops = nullptr);
+ /// Called on well-formed 'graph_id' clause.
+ OMPClause *ActOnOpenMPGraphIdClause(Expr *Condition, SourceLocation StartLoc,
+ SourceLocation LParenLoc,
+ SourceLocation EndLoc);
/// Called on well-formed 'grainsize' clause.
OMPClause *ActOnOpenMPGrainsizeClause(OpenMPGrainsizeClauseModifier Modifier,
Expr *Size, SourceLocation StartLoc,
diff --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp
index d4826c3c6edca..eadff79f0d646 100644
--- a/clang/lib/AST/OpenMPClause.cpp
+++ b/clang/lib/AST/OpenMPClause.cpp
@@ -91,6 +91,8 @@ const OMPClauseWithPreInit *OMPClauseWithPreInit::get(const OMPClause *C) {
return static_cast<const OMPDeviceClause *>(C);
case OMPC_grainsize:
return static_cast<const OMPGrainsizeClause *>(C);
+ case OMPC_graph_id:
+ return static_cast<const OMPGraphIdClause *>(C);
case OMPC_num_tasks:
return static_cast<const OMPNumTasksClause *>(C);
case OMPC_final:
@@ -252,6 +254,7 @@ const OMPClauseWithPostUpdate *OMPClauseWithPostUpdate::get(const OMPClause *C)
case OMPC_thread_limit:
case OMPC_priority:
case OMPC_grainsize:
+ case OMPC_graph_id:
case OMPC_nogroup:
case OMPC_num_tasks:
case OMPC_hint:
@@ -326,6 +329,12 @@ OMPClause::child_range OMPGrainsizeClause::used_children() {
return child_range(&Grainsize, &Grainsize + 1);
}
+OMPClause::child_range OMPGraphIdClause::used_children() {
+ if (Stmt **C = getAddrOfExprAsWritten(getPreInitStmt()))
+ return child_range(C, C + 1);
+ return children();
+}
+
OMPClause::child_range OMPNumTasksClause::used_children() {
if (Stmt **C = getAddrOfExprAsWritten(getPreInitStmt()))
return child_range(C, C + 1);
@@ -2334,6 +2343,15 @@ void OMPClausePrinter::VisitOMPGrainsizeClause(OMPGrainsizeClause *Node) {
OS << ")";
}
+void OMPClausePrinter::VisitOMPGraphIdClause(OMPGraphIdClause *Node) {
+ OS << "graph_id";
+ if (Expr *E = Node->getCondition()) {
+ OS << "(";
+ E->printPretty(OS, nullptr, Policy, 0);
+ OS << ")";
+ }
+}
+
void OMPClausePrinter::VisitOMPNumTasksClause(OMPNumTasksClause *Node) {
OS << "num_tasks(";
OpenMPNumTasksClauseModifier Modifier = Node->getModifier();
diff --git a/clang/lib/AST/StmtProfile.cpp b/clang/lib/AST/StmtProfile.cpp
index dff89c9085e6a..7b3a02be0fd5f 100644
--- a/clang/lib/AST/StmtProfile.cpp
+++ b/clang/lib/AST/StmtProfile.cpp
@@ -910,6 +910,11 @@ void OMPClauseProfiler::VisitOMPGrainsizeClause(const OMPGrainsizeClause *C) {
if (C->getGrainsize())
Profiler->VisitStmt(C->getGrainsize());
}
+void OMPClauseProfiler::VisitOMPGraphIdClause(const OMPGraphIdClause *C) {
+ VisitOMPClauseWithPreInit(C);
+ if (C->getCondition())
+ Profiler->VisitStmt(C->getCondition());
+}
void OMPClauseProfiler::VisitOMPNumTasksClause(const OMPNumTasksClause *C) {
VisitOMPClauseWithPreInit(C);
if (C->getNumTasks())
diff --git a/clang/lib/Basic/OpenMPKinds.cpp b/clang/lib/Basic/OpenMPKinds.cpp
index fd1ffa59abaac..e9a6b90046ed9 100644
--- a/clang/lib/Basic/OpenMPKinds.cpp
+++ b/clang/lib/Basic/OpenMPKinds.cpp
@@ -311,6 +311,7 @@ unsigned clang::getOpenMPSimpleClauseType(OpenMPClauseKind Kind, StringRef Str,
case OMPC_when:
case OMPC_append_args:
case OMPC_looprange:
+ case OMPC_graph_id:
break;
default:
break;
@@ -690,6 +691,7 @@ const char *clang::getOpenMPSimpleClauseTypeName(OpenMPClauseKind Kind,
case OMPC_when:
case OMPC_append_args:
case OMPC_looprange:
+ case OMPC_graph_id:
break;
default:
break;
diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 29397d67b5bcc..813476624318c 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -3195,6 +3195,7 @@ OMPClause *Parser::ParseOpenMPClause(OpenMPDirectiveKind DKind,
case OMPC_partial:
case OMPC_align:
case OMPC_message:
+ case OMPC_graph_id:
case OMPC_ompx_dyn_cgroup_mem:
case OMPC_dyn_groupprivate:
case OMPC_transparent:
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 3f54fea2bf78b..3ef2751654d04 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -6793,6 +6793,7 @@ StmtResult SemaOpenMP::ActOnOpenMPExecutableDirective(
case OMPC_final:
case OMPC_priority:
case OMPC_novariants:
+ case OMPC_graph_id:
case OMPC_nocontext:
// Do not analyze if no parent parallel directive.
if (isOpenMPParallelDirective(Kind))
@@ -16615,6 +16616,9 @@ OMPClause *SemaOpenMP::ActOnOpenMPSingleExprClause(OpenMPClauseKind Kind,
case OMPC_detach:
Res = ActOnOpenMPDetachClause(Expr, StartLoc, LParenLoc, EndLoc);
break;
+ case OMPC_graph_id:
+ Res = ActOnOpenMPGraphIdClause(Expr, StartLoc, LParenLoc, EndLoc);
+ break;
case OMPC_novariants:
Res = ActOnOpenMPNovariantsClause(Expr, StartLoc, LParenLoc, EndLoc);
break;
@@ -17403,6 +17407,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPSimpleClause(
case OMPC_match:
case OMPC_nontemporal:
case OMPC_destroy:
+ case OMPC_graph_id:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
@@ -18109,6 +18114,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPSingleExprWithArgClause(
case OMPC_severity:
case OMPC_message:
case OMPC_destroy:
+ case OMPC_graph_id:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
@@ -18382,6 +18388,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPClause(OpenMPClauseKind Kind,
case OMPC_at:
case OMPC_severity:
case OMPC_message:
+ case OMPC_graph_id:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
@@ -18739,6 +18746,38 @@ OMPClause *SemaOpenMP::ActOnOpenMPDestroyClause(Expr *InteropVar,
OMPDestroyClause(InteropVar, StartLoc, LParenLoc, VarLoc, EndLoc);
}
+OMPClause *SemaOpenMP::ActOnOpenMPGraphIdClause(Expr *Condition,
+ SourceLocation StartLoc,
+ SourceLocation LParenLoc,
+ SourceLocation EndLoc) {
+ Expr *ValExpr = Condition;
+ Stmt *HelperValStmt = nullptr;
+ OpenMPDirectiveKind CaptureRegion = OMPD_unknown;
+ if (!Condition->isValueDependent() && !Condition->isTypeDependent() &&
+ !Condition->isInstantiationDependent() &&
+ !Condition->containsUnexpandedParameterPack()) {
+ ExprResult Val = SemaRef.CheckBooleanCondition(StartLoc, Condition);
+ if (Val.isInvalid())
+ return nullptr;
+
+ ValExpr = SemaRef.MakeFullExpr(Val.get()).get();
+
+ OpenMPDirectiveKind DKind = DSAStack->getCurrentDirective();
+ CaptureRegion = getOpenMPCaptureRegionForClause(DKind, OMPC_graph_id,
+ getLangOpts().OpenMP);
+ if (CaptureRegion != OMPD_unknown &&
+ !SemaRef.CurContext->isDependentContext()) {
+ ValExpr = SemaRef.MakeFullExpr(ValExpr).get();
+ llvm::MapVector<const Expr *, DeclRefExpr *> Captures;
+ ValExpr = tryBuildCapture(SemaRef, ValExpr, Captures).get();
+ HelperValStmt = buildPreInits(getASTContext(), Captures);
+ }
+ }
+
+ return new (getASTContext()) OMPGraphIdClause(
+ ValExpr, HelperValStmt, CaptureRegion, StartLoc, LParenLoc, EndLoc);
+}
+
OMPClause *SemaOpenMP::ActOnOpenMPNovariantsClause(Expr *Condition,
SourceLocation StartLoc,
SourceLocation LParenLoc,
@@ -19039,6 +19078,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPVarListClause(OpenMPClauseKind Kind,
case OMPC_severity:
case OMPC_message:
case OMPC_destroy:
+ case OMPC_graph_id:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index fe6b9a4755e04..2f9f39d7f6066 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -2163,6 +2163,17 @@ class TreeTransform {
LParenLoc, EndLoc);
}
+ /// Build a new OpenMP 'graph_id' clause.
+ ///
+ /// By default, performs semantic analysis to build the new OpenMP clause.
+ /// Subclasses may override this routine to provide different behavior.
+ OMPClause *RebuildOMPGraphIdClause(Expr *Condition, SourceLocation StartLoc,
+ SourceLocation LParenLoc,
+ SourceLocation EndLoc) {
+ return getSema().OpenMP().ActOnOpenMPGraphIdClause(Condition, StartLoc,
+ LParenLoc, EndLoc);
+ }
+
/// Build a new OpenMP 'grainsize' clause.
///
/// By default, performs semantic analysis to build the new statement.
@@ -11577,6 +11588,16 @@ TreeTransform<Derived>::TransformOMPPriorityClause(OMPPriorityClause *C) {
E.get(), C->getBeginLoc(), C->getLParenLoc(), C->getEndLoc());
}
+template <typename Derived>
+OMPClause *
+TreeTransform<Derived>::TransformOMPGraphIdClause(OMPGraphIdClause *C) {
+ ExprResult Cond = getDerived().TransformExpr(C->getCondition());
+ if (Cond.isInvalid())
+ return nullptr;
+ return getDerived().RebuildOMPGraphIdClause(
+ Cond.get(), C->getBeginLoc(), C->getLParenLoc(), C->getEndLoc());
+}
+
template <typename Derived>
OMPClause *
TreeTransform<Derived>::TransformOMPGrainsizeClause(OMPGrainsizeClause *C) {
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 515eaf8d1caed..98359169f7ff2 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -11606,6 +11606,9 @@ OMPClause *OMPClauseReader::readClause() {
case llvm::omp::OMPC_grainsize:
C = new (Context) OMPGrainsizeClause();
break;
+ case llvm::omp::OMPC_graph_id:
+ C = new (Context) OMPGraphIdClause();
+ break;
case llvm::omp::OMPC_num_tasks:
C = new (Context) OMPNumTasksClause();
break;
@@ -12499,6 +12502,12 @@ void OMPClauseReader::VisitOMPPriorityClause(OMPPriorityClause *C) {
C->setLParenLoc(Record.readSourceLocation());
}
+void OMPClauseReader::VisitOMPGraphIdClause(OMPGraphIdClause *C) {
+ VisitOMPClauseWithPreInit(C);
+ C->setCondition(Record.readSubExpr());
+ C->setLParenLoc(Record.readSourceLocation());
+}
+
void OMPClauseReader::VisitOMPGrainsizeClause(OMPGrainsizeClause *C) {
VisitOMPClauseWithPreInit(C);
C->setModifier(Record.readEnum<OpenMPGrainsizeClauseModifier>());
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index e21a86b688dbf..09867f6f034c8 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -8508,6 +8508,12 @@ void OMPClauseWriter::VisitOMPPriorityClause(OMPPriorityClause *C) {
Record.AddSourceLocation(C->getLParenLoc());
}
+void OMPClauseWriter::VisitOMPGraphIdClause(OMPGraphIdClause *C) {
+ VisitOMPClauseWithPreInit(C);
+ Record.AddStmt(C->getCondition());
+ Record.AddSourceLocation(C->getLParenLoc());
+}
+
void OMPClauseWriter::VisitOMPGrainsizeClause(OMPGrainsizeClause *C) {
VisitOMPClauseWithPreInit(C);
Record.writeEnum(C->getModifier());
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index a2f166d8ff78d..97bf69bfd5616 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -2750,6 +2750,9 @@ void OMPClauseEnqueue::VisitOMPIsDevicePtrClause(
const OMPIsDevicePtrClause *C) {
VisitOMPClauseList(C);
}
+void OMPClauseEnqueue::VisitOMPGraphIdClause(const OMPGraphIdClause *C) {
+ Visitor->AddStmt(C->getCondition());
+}
void OMPClauseEnqueue::VisitOMPHasDeviceAddrClause(
const OMPHasDeviceAddrClause *C) {
VisitOMPClauseList(C);
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td
index d1dddf76152ec..0a9bd009fcb4a 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMP.td
+++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td
@@ -246,6 +246,7 @@ def OMPC_GrainSize : Clause<[Spelling<"grainsize">]> {
];
}
def OMPC_GraphId : Clause<[Spelling<"graph_id">]> {
+ let clangClass = "OMPGraphIdClause";
let flangClass = "OmpGraphIdClause";
}
def OMPC_GraphReset : Clause<[Spelling<"graph_reset">]> {
>From c1331658fd55012fdadfc37f4f849cd303b243f7 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 17 Mar 2025 13:10:22 +0100
Subject: [PATCH 15/28] [wip][llvm][clang] Add frontend support for OpenMP
graph_reset clause
---
clang/include/clang/AST/OpenMPClause.h | 45 +++++++++++++++++++
clang/include/clang/AST/RecursiveASTVisitor.h | 8 ++++
clang/include/clang/Sema/SemaOpenMP.h | 5 +++
clang/lib/AST/OpenMPClause.cpp | 18 ++++++++
clang/lib/AST/StmtProfile.cpp | 5 +++
clang/lib/Basic/OpenMPKinds.cpp | 2 +
clang/lib/Parse/ParseOpenMP.cpp | 1 +
clang/lib/Sema/SemaOpenMP.cpp | 40 +++++++++++++++++
clang/lib/Sema/TreeTransform.h | 22 +++++++++
clang/lib/Serialization/ASTReader.cpp | 6 +++
clang/lib/Serialization/ASTWriter.cpp | 6 +++
clang/tools/libclang/CIndex.cpp | 3 ++
llvm/include/llvm/Frontend/OpenMP/OMP.td | 1 +
13 files changed, 162 insertions(+)
diff --git a/clang/include/clang/AST/OpenMPClause.h b/clang/include/clang/AST/OpenMPClause.h
index 334632cdfdf6f..27a737bd43633 100644
--- a/clang/include/clang/AST/OpenMPClause.h
+++ b/clang/include/clang/AST/OpenMPClause.h
@@ -8484,6 +8484,51 @@ class OMPGraphIdClause final
}
};
+/// This represents clause 'graph_reset' in the '#pragma omp taskgraph'
+/// directive.
+///
+/// \code
+/// #pragma omp taskgraph graph_reset(true)
+class OMPGraphResetClause final
+ : public OMPOneStmtClause<llvm::omp::OMPC_graph_reset, OMPClause>,
+ public OMPClauseWithPreInit {
+ friend class OMPClauseReader;
+
+ /// Set condition.
+ void setCondition(Expr *Cond) { setStmt(Cond); }
+
+public:
+ /// Build 'graph_reset' clause with condition \a Cond.
+ ///
+ /// \param Cond Condition of the clause.
+ /// \param HelperCond Helper condition for the construct.
+ /// \param CaptureRegion Innermost OpenMP region where expressions in this
+ /// clause must be captured.
+ /// \param StartLoc Starting location of the clause.
+ /// \param LParenLoc Location of '('.
+ /// \param EndLoc Ending location of the clause.
+ OMPGraphResetClause(Expr *Cond, Stmt *HelperCond,
+ OpenMPDirectiveKind CaptureRegion,
+ SourceLocation StartLoc, SourceLocation LParenLoc,
+ SourceLocation EndLoc)
+ : OMPOneStmtClause(Cond, StartLoc, LParenLoc, EndLoc),
+ OMPClauseWithPreInit(this) {
+ setPreInitStmt(HelperCond, CaptureRegion);
+ }
+
+ /// Build an empty clause.
+ OMPGraphResetClause() : OMPOneStmtClause(), OMPClauseWithPreInit(this) {}
+
+ /// Returns condition.
+ Expr *getCondition() const { return getStmtAs<Expr>(); }
+
+ child_range used_children();
+ const_child_range used_children() const {
+ auto Children = const_cast<OMPGraphResetClause *>(this)->used_children();
+ return const_child_range(Children.begin(), Children.end());
+ }
+};
+
/// This represents clause 'has_device_ptr' in the '#pragma omp ...'
/// directives.
///
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index 7f76313b4a653..32b928ca62fd5 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -4133,6 +4133,14 @@ bool RecursiveASTVisitor<Derived>::VisitOMPGraphIdClause(OMPGraphIdClause *C) {
return true;
}
+template <typename Derived>
+bool RecursiveASTVisitor<Derived>::VisitOMPGraphResetClause(
+ OMPGraphResetClause *C) {
+ TRY_TO(VisitOMPClauseWithPreInit(C));
+ TRY_TO(TraverseStmt(C->getCondition()));
+ return true;
+}
+
template <typename Derived>
bool RecursiveASTVisitor<Derived>::VisitOMPHasDeviceAddrClause(
OMPHasDeviceAddrClause *C) {
diff --git a/clang/include/clang/Sema/SemaOpenMP.h b/clang/include/clang/Sema/SemaOpenMP.h
index 8f948c63cd47d..d88a85cc1b9f5 100644
--- a/clang/include/clang/Sema/SemaOpenMP.h
+++ b/clang/include/clang/Sema/SemaOpenMP.h
@@ -947,6 +947,11 @@ class SemaOpenMP : public SemaBase {
OMPClause *ActOnOpenMPGraphIdClause(Expr *Condition, SourceLocation StartLoc,
SourceLocation LParenLoc,
SourceLocation EndLoc);
+ /// Called on well-formed 'graph_reset' clause.
+ OMPClause *ActOnOpenMPGraphResetClause(Expr *Condition,
+ SourceLocation StartLoc,
+ SourceLocation LParenLoc,
+ SourceLocation EndLoc);
/// Called on well-formed 'grainsize' clause.
OMPClause *ActOnOpenMPGrainsizeClause(OpenMPGrainsizeClauseModifier Modifier,
Expr *Size, SourceLocation StartLoc,
diff --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp
index eadff79f0d646..3765b97447e61 100644
--- a/clang/lib/AST/OpenMPClause.cpp
+++ b/clang/lib/AST/OpenMPClause.cpp
@@ -93,6 +93,8 @@ const OMPClauseWithPreInit *OMPClauseWithPreInit::get(const OMPClause *C) {
return static_cast<const OMPGrainsizeClause *>(C);
case OMPC_graph_id:
return static_cast<const OMPGraphIdClause *>(C);
+ case OMPC_graph_reset:
+ return static_cast<const OMPGraphResetClause *>(C);
case OMPC_num_tasks:
return static_cast<const OMPNumTasksClause *>(C);
case OMPC_final:
@@ -255,6 +257,7 @@ const OMPClauseWithPostUpdate *OMPClauseWithPostUpdate::get(const OMPClause *C)
case OMPC_priority:
case OMPC_grainsize:
case OMPC_graph_id:
+ case OMPC_graph_reset:
case OMPC_nogroup:
case OMPC_num_tasks:
case OMPC_hint:
@@ -335,6 +338,12 @@ OMPClause::child_range OMPGraphIdClause::used_children() {
return children();
}
+OMPClause::child_range OMPGraphResetClause::used_children() {
+ if (Stmt **C = getAddrOfExprAsWritten(getPreInitStmt()))
+ return child_range(C, C + 1);
+ return children();
+}
+
OMPClause::child_range OMPNumTasksClause::used_children() {
if (Stmt **C = getAddrOfExprAsWritten(getPreInitStmt()))
return child_range(C, C + 1);
@@ -2352,6 +2361,15 @@ void OMPClausePrinter::VisitOMPGraphIdClause(OMPGraphIdClause *Node) {
}
}
+void OMPClausePrinter::VisitOMPGraphResetClause(OMPGraphResetClause *Node) {
+ OS << "graph_reset";
+ if (Expr *E = Node->getCondition()) {
+ OS << "(";
+ E->printPretty(OS, nullptr, Policy, 0);
+ OS << ")";
+ }
+}
+
void OMPClausePrinter::VisitOMPNumTasksClause(OMPNumTasksClause *Node) {
OS << "num_tasks(";
OpenMPNumTasksClauseModifier Modifier = Node->getModifier();
diff --git a/clang/lib/AST/StmtProfile.cpp b/clang/lib/AST/StmtProfile.cpp
index 7b3a02be0fd5f..09a21bdf1777f 100644
--- a/clang/lib/AST/StmtProfile.cpp
+++ b/clang/lib/AST/StmtProfile.cpp
@@ -915,6 +915,11 @@ void OMPClauseProfiler::VisitOMPGraphIdClause(const OMPGraphIdClause *C) {
if (C->getCondition())
Profiler->VisitStmt(C->getCondition());
}
+void OMPClauseProfiler::VisitOMPGraphResetClause(const OMPGraphResetClause *C) {
+ VistOMPClauseWithPreInit(C);
+ if (C->getCondition())
+ Profiler->VisitStmt(C->getCondition());
+}
void OMPClauseProfiler::VisitOMPNumTasksClause(const OMPNumTasksClause *C) {
VisitOMPClauseWithPreInit(C);
if (C->getNumTasks())
diff --git a/clang/lib/Basic/OpenMPKinds.cpp b/clang/lib/Basic/OpenMPKinds.cpp
index e9a6b90046ed9..81aea29119a6c 100644
--- a/clang/lib/Basic/OpenMPKinds.cpp
+++ b/clang/lib/Basic/OpenMPKinds.cpp
@@ -312,6 +312,7 @@ unsigned clang::getOpenMPSimpleClauseType(OpenMPClauseKind Kind, StringRef Str,
case OMPC_append_args:
case OMPC_looprange:
case OMPC_graph_id:
+ case OMPC_graph_reset:
break;
default:
break;
@@ -692,6 +693,7 @@ const char *clang::getOpenMPSimpleClauseTypeName(OpenMPClauseKind Kind,
case OMPC_append_args:
case OMPC_looprange:
case OMPC_graph_id:
+ case OMPC_graph_reset:
break;
default:
break;
diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 813476624318c..979d376d438fc 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -3196,6 +3196,7 @@ OMPClause *Parser::ParseOpenMPClause(OpenMPDirectiveKind DKind,
case OMPC_align:
case OMPC_message:
case OMPC_graph_id:
+ case OMPC_graph_reset:
case OMPC_ompx_dyn_cgroup_mem:
case OMPC_dyn_groupprivate:
case OMPC_transparent:
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 3ef2751654d04..899fc19fbd4bb 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -6794,6 +6794,7 @@ StmtResult SemaOpenMP::ActOnOpenMPExecutableDirective(
case OMPC_priority:
case OMPC_novariants:
case OMPC_graph_id:
+ case OMPC_graph_reset:
case OMPC_nocontext:
// Do not analyze if no parent parallel directive.
if (isOpenMPParallelDirective(Kind))
@@ -16619,6 +16620,9 @@ OMPClause *SemaOpenMP::ActOnOpenMPSingleExprClause(OpenMPClauseKind Kind,
case OMPC_graph_id:
Res = ActOnOpenMPGraphIdClause(Expr, StartLoc, LParenLoc, EndLoc);
break;
+ case OMPC_graph_reset:
+ Res = ActOnOpenMPGraphResetClause(Expr, StartLoc, LParenLoc, EndLoc);
+ break;
case OMPC_novariants:
Res = ActOnOpenMPNovariantsClause(Expr, StartLoc, LParenLoc, EndLoc);
break;
@@ -17408,6 +17412,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPSimpleClause(
case OMPC_nontemporal:
case OMPC_destroy:
case OMPC_graph_id:
+ case OMPC_graph_reset:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
@@ -18115,6 +18120,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPSingleExprWithArgClause(
case OMPC_message:
case OMPC_destroy:
case OMPC_graph_id:
+ case OMPC_graph_reset:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
@@ -18389,6 +18395,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPClause(OpenMPClauseKind Kind,
case OMPC_severity:
case OMPC_message:
case OMPC_graph_id:
+ case OMPC_graph_reset:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
@@ -18778,6 +18785,38 @@ OMPClause *SemaOpenMP::ActOnOpenMPGraphIdClause(Expr *Condition,
ValExpr, HelperValStmt, CaptureRegion, StartLoc, LParenLoc, EndLoc);
}
+OMPClause *SemaOpenMP::ActOnOpenMPGraphResetClause(Expr *Condition,
+ SourceLocation StartLoc,
+ SourceLocation LParenLoc,
+ SourceLocation EndLoc) {
+ Expr *ValExpr = Condition;
+ Stmt *HelperValStmt = nullptr;
+ OpenMPDirectiveKind CaptureRegion = OMPD_unknown;
+ if (!Condition->isValueDependent() && !Condition->isTypeDependent() &&
+ !Condition->isInstantiationDependent() &&
+ !Condition->containsUnexpandedParameterPack()) {
+ ExprResult Val = SemaRef.CheckBooleanCondition(StartLoc, Condition);
+ if (Val.isInvalid())
+ return nullptr;
+
+ ValExpr = SemaRef.MakeFullExpr(Val.get()).get();
+
+ OpenMPDirectiveKind DKind = DSAStack->getCurrentDirective();
+ CaptureRegion = getOpenMPCaptureRegionForClause(DKind, OMPC_graph_reset,
+ getLangOpts().OpenMP);
+ if (CaptureRegion != OMPD_unknown &&
+ !SemaRef.CurContext->isDependentContext()) {
+ ValExpr = SemaRef.MakeFullExpr(ValExpr).get();
+ llvm::MapVector<const Expr *, DeclRefExpr *> Captures;
+ ValExpr = tryBuildCapture(SemaRef, ValExpr, Captures).get();
+ HelperValStmt = buildPreInits(getASTContext(), Captures);
+ }
+ }
+
+ return new (getASTContext()) OMPGraphResetClause(
+ ValExpr, HelperValStmt, CaptureRegion, StartLoc, LParenLoc, EndLoc);
+}
+
OMPClause *SemaOpenMP::ActOnOpenMPNovariantsClause(Expr *Condition,
SourceLocation StartLoc,
SourceLocation LParenLoc,
@@ -19079,6 +19118,7 @@ OMPClause *SemaOpenMP::ActOnOpenMPVarListClause(OpenMPClauseKind Kind,
case OMPC_message:
case OMPC_destroy:
case OMPC_graph_id:
+ case OMPC_graph_reset:
case OMPC_novariants:
case OMPC_nocontext:
case OMPC_detach:
diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index 2f9f39d7f6066..766b08929e7fa 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -2174,6 +2174,18 @@ class TreeTransform {
LParenLoc, EndLoc);
}
+ /// Build a new OpenMP 'graph_reset' clause.
+ ///
+ /// By default, performs semantic analysis to build the new OpenMP clause.
+ /// Subclasses may override this routine to provide different behavior.
+ OMPClause *RebuildOMPGraphResetClause(Expr *Condition,
+ SourceLocation StartLoc,
+ SourceLocation LParenLoc,
+ SourceLocation EndLoc) {
+ return getSema().OpenMP().ActOnOpenMPGraphResetClause(Condition, StartLoc,
+ LParenLoc, EndLoc);
+ }
+
/// Build a new OpenMP 'grainsize' clause.
///
/// By default, performs semantic analysis to build the new statement.
@@ -11598,6 +11610,16 @@ TreeTransform<Derived>::TransformOMPGraphIdClause(OMPGraphIdClause *C) {
Cond.get(), C->getBeginLoc(), C->getLParenLoc(), C->getEndLoc());
}
+template <typename Derived>
+OMPClause *
+TreeTransform<Derived>::TransformOMPGraphResetClause(OMPGraphResetClause *C) {
+ ExprResult Cond = getDerived().TransformExpr(C->getCondition());
+ if (Cond.isInvalid())
+ return nullptr;
+ return getDerived().RebuildOMPGraphResetClause(
+ Cond.get(), C->getBeginLoc(), C->getLParenLoc(), C->getEndLoc());
+}
+
template <typename Derived>
OMPClause *
TreeTransform<Derived>::TransformOMPGrainsizeClause(OMPGrainsizeClause *C) {
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 98359169f7ff2..79ab15a09cde7 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -12508,6 +12508,12 @@ void OMPClauseReader::VisitOMPGraphIdClause(OMPGraphIdClause *C) {
C->setLParenLoc(Record.readSourceLocation());
}
+void OMPClauseReader::VisitOMPGraphResetClause(OMPGraphResetClause *C) {
+ VisitOMPClauseWithPreInit(C);
+ C->setCondition(Record.readSubExpr());
+ C->setLParenLoc(Record.readSourceLocation());
+}
+
void OMPClauseReader::VisitOMPGrainsizeClause(OMPGrainsizeClause *C) {
VisitOMPClauseWithPreInit(C);
C->setModifier(Record.readEnum<OpenMPGrainsizeClauseModifier>());
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 09867f6f034c8..9c2aa6632c123 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -8514,6 +8514,12 @@ void OMPClauseWriter::VisitOMPGraphIdClause(OMPGraphIdClause *C) {
Record.AddSourceLocation(C->getLParenLoc());
}
+void OMPClauseWriter::VisitOMPGraphResetClause(OMPGraphResetClause *C) {
+ VisitOMPClauseWithPreInit(C);
+ Record.AddStmt(C->getCondition());
+ Record.AddSourceLocation(C->getLParenLoc());
+}
+
void OMPClauseWriter::VisitOMPGrainsizeClause(OMPGrainsizeClause *C) {
VisitOMPClauseWithPreInit(C);
Record.writeEnum(C->getModifier());
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index 97bf69bfd5616..3af9d481f4b91 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -2753,6 +2753,9 @@ void OMPClauseEnqueue::VisitOMPIsDevicePtrClause(
void OMPClauseEnqueue::VisitOMPGraphIdClause(const OMPGraphIdClause *C) {
Visitor->AddStmt(C->getCondition());
}
+void OMPClauseEnqueue::VisitOMPGraphResetClause(const OMPGraphResetClause *C) {
+ Visitor->AddStmt(C->getCondition());
+}
void OMPClauseEnqueue::VisitOMPHasDeviceAddrClause(
const OMPHasDeviceAddrClause *C) {
VisitOMPClauseList(C);
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td
index 0a9bd009fcb4a..09a899cbf2562 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMP.td
+++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td
@@ -250,6 +250,7 @@ def OMPC_GraphId : Clause<[Spelling<"graph_id">]> {
let flangClass = "OmpGraphIdClause";
}
def OMPC_GraphReset : Clause<[Spelling<"graph_reset">]> {
+ let clangClass = "OMPGraphResetClause";
let flangClass = "OmpGraphResetClause";
let isValueOptional = true;
}
>From b8f32eab58408c3eb8eac344dc03d13a0bd860a4 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 15 Sep 2025 08:24:50 +0200
Subject: [PATCH 16/28] [xxx][llvm][clang] Add GraphReset in Taskgraph CodeGen
---
clang/lib/AST/StmtProfile.cpp | 2 +-
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/clang/lib/AST/StmtProfile.cpp b/clang/lib/AST/StmtProfile.cpp
index 09a21bdf1777f..eaf956225a9d1 100644
--- a/clang/lib/AST/StmtProfile.cpp
+++ b/clang/lib/AST/StmtProfile.cpp
@@ -916,7 +916,7 @@ void OMPClauseProfiler::VisitOMPGraphIdClause(const OMPGraphIdClause *C) {
Profiler->VisitStmt(C->getCondition());
}
void OMPClauseProfiler::VisitOMPGraphResetClause(const OMPGraphResetClause *C) {
- VistOMPClauseWithPreInit(C);
+ VisitOMPClauseWithPreInit(C);
if (C->getCondition())
Profiler->VisitStmt(C->getCondition());
}
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 2316d80e511bc..05c40d53160b7 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6136,6 +6136,22 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
unsigned Flags = 0;
+ const OMPGraphResetClause *GraphResetClause =
+ D.getSingleClause<OMPGraphResetClause>();
+ if (GraphResetClause) {
+ const Expr *Cond = GraphResetClause->getCondition();
+ llvm::Value *CondVal = CGF.EvaluateExprAsBool(Cond);
+ if (CondVal) {
+ llvm::Value *CondBool = CGF.Builder.CreateICmpNE(
+ CondVal, llvm::ConstantInt::get(CondVal->getType(), 0));
+ if (llvm::ConstantInt *CI = llvm::dyn_cast<llvm::ConstantInt>(CondBool)) {
+ if (CI->isOne()) {
+ Flags |= ReRecordFlag;
+ }
+ }
+ }
+ }
+
CodeGenFunction OutlinedCGF(CGM, /*suppressNewContext=*/true);
const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
From b671c20a2d5cfac1d0ce0dee0f3f23fc9474bf8f Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 15 Sep 2025 09:07:18 +0200
Subject: [PATCH 17/28] [xxx][openmp] Add graph_id input to kmpc_taskgraph
---
openmp/runtime/src/kmp.h | 3 ++-
openmp/runtime/src/kmp_tasking.cpp | 3 ++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 3a2ab8c94d476..b2b0e39a39c73 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -4401,7 +4401,8 @@ KMP_EXPORT void __kmpc_end_record_task(ident_t *loc, kmp_int32 gtid,
kmp_int32 input_flags, kmp_int32 tdg_id);
KMP_EXPORT void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
kmp_int32 input_flags, kmp_uint32 tdg_id,
- void (*entry)(void *), void *args);
+ kmp_uint32 graph_id, void (*entry)(void *),
+ void *args);
#endif
/* Interface to fast scalable reduce methods routines */
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 088d6f1a019ee..d2566786ee900 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -5244,7 +5244,8 @@ bool __kmpc_omp_has_task_team(kmp_int32 gtid) {
// entry: Pointer to the entry function
// args: Pointer to the function arguments
void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid, kmp_int32 input_flags,
- kmp_uint32 tdg_id, void (*entry)(void *), void *args) {
+ kmp_uint32 tdg_id, kmp_uint32 graph_id,
+ void (*entry)(void *), void *args) {
kmp_int32 res = __kmpc_start_record_task(loc_ref, gtid, input_flags, tdg_id);
// When res = 1, we either start recording or only execute tasks
// without recording. Need to execute entry function in both cases.
From a25ba3579b62e73688a18896269df30e75705b93 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 15 Sep 2025 09:13:16 +0200
Subject: [PATCH 18/28] [xxx][llvm][clang] Add GraphId in Taskgraph CodeGen
---
clang/lib/AST/StmtProfile.cpp | 2 +-
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 11 ++++++++++-
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def | 1 +
3 files changed, 12 insertions(+), 2 deletions(-)
diff --git a/clang/lib/AST/StmtProfile.cpp b/clang/lib/AST/StmtProfile.cpp
index eaf956225a9d1..0e43a48e40a9b 100644
--- a/clang/lib/AST/StmtProfile.cpp
+++ b/clang/lib/AST/StmtProfile.cpp
@@ -911,7 +911,7 @@ void OMPClauseProfiler::VisitOMPGrainsizeClause(const OMPGrainsizeClause *C) {
Profiler->VisitStmt(C->getGrainsize());
}
void OMPClauseProfiler::VisitOMPGraphIdClause(const OMPGraphIdClause *C) {
- VistOMPClauseWithPreInit(C);
+ VisitOMPClauseWithPreInit(C);
if (C->getCondition())
Profiler->VisitStmt(C->getCondition());
}
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 05c40d53160b7..56c612ae52745 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6152,6 +6152,14 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
}
}
+ llvm::Value *GraphId = CGF.Builder.getInt32(0);
+ const OMPGraphIdClause *GraphIdClause = D.getSingleClause<OMPGraphIdClause>();
+ if (GraphIdClause) {
+ const auto *E = GraphIdClause->getCondition();
+ auto *GraphIdVal = CGF.EmitScalarExpr(E);
+ GraphId = CGF.Builder.CreateIntCast(GraphIdVal, CGM.Int32Ty, true);
+ }
+
CodeGenFunction OutlinedCGF(CGM, /*suppressNewContext=*/true);
const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
@@ -6166,11 +6174,12 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
&TaskgraphRegion);
llvm::Function *FnT = OutlinedCGF.GenerateCapturedStmtFunction(*CS);
- std::array<llvm::Value *, 6> Args{
+ std::array<llvm::Value *, 7> Args{
emitUpdateLocation(CGF, Loc),
getThreadID(CGF, Loc),
CGF.Builder.getInt32(Flags),
CGF.Builder.getInt32(D.getBeginLoc().getHashValue()),
+ GraphId,
CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(FnT, CGM.VoidPtrTy),
CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy)};
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 430a2b147e2e5..0883d06f1da77 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -358,6 +358,7 @@ __OMP_RTL(__kmpc_omp_task, false, Int32, IdentPtr, Int32,
__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, Int32, Int32, VoidPtr, VoidPtr)
__OMP_RTL(__kmpc_end_taskgroup, false, Void, IdentPtr, Int32)
__OMP_RTL(__kmpc_taskgroup, false, Void, IdentPtr, Int32)
+__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, Int32, Int32, Int32, VoidPtr, VoidPtr)
__OMP_RTL(__kmpc_omp_task_begin_if0, false, Void, IdentPtr, Int32,
/* kmp_task_t */ VoidPtr)
__OMP_RTL(__kmpc_omp_task_complete_if0, false, Void, IdentPtr, Int32,
From 45e9e040a02bde8d3e14d85863ba1e18a0adef3f Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 15 Sep 2025 17:42:58 +0200
Subject: [PATCH 19/28] [xxx][wip][clang] Add If clause in taskgraph
---
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 111 +++++++++++++++++++++++++-
clang/lib/CodeGen/CGOpenMPRuntime.h | 8 +-
clang/lib/CodeGen/CGStmtOpenMP.cpp | 11 ++-
3 files changed, 123 insertions(+), 7 deletions(-)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 56c612ae52745..79bbca43b72ee 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6124,7 +6124,8 @@ void CGOpenMPRuntime::emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
SourceLocation Loc,
- const OMPExecutableDirective &D) {
+ const OMPExecutableDirective &D,
+ const Expr *IfCond) {
if (!CGF.HaveInsertPoint())
return;
@@ -6136,6 +6137,10 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
unsigned Flags = 0;
+ if (D.getSingleClause<OMPNowaitClause>()) {
+ Flags |= NowaitFlag;
+ }
+
const OMPGraphResetClause *GraphResetClause =
D.getSingleClause<OMPGraphResetClause>();
if (GraphResetClause) {
@@ -6184,9 +6189,97 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy)};
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_taskgraph),
- Args);
+ auto &&ThenGen = [&CGF, this, &Args](CodeGenFunction &, PrePostActionTy &) {
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph),
+ Args);
+ };
+ llvm::Module &M = CGM.getModule();
+ llvm::Value *ThreadID = getThreadID(CGF, Loc);
+ auto &&ElseGen = [&M, ThreadID, &CGF, this, &FnT, &CapStruct, &Loc, &OutlinedCGF]
+ (CodeGenFunction &, PrePostActionTy &) {
+ // This logic is adapted from the if(false) path of a regular task.
+ // It ensures the taskgraph body is executed with minimal runtime overhead.
+
+ // Arguments for the begin/complete calls: ident_t*, gtid, kmp_task_t*
+ // We can pass the captured struct pointer as the "task" handle for the
+ // lightweight if0 calls.
+ /* llvm::Value *CapturedArgsPtr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( */
+ /* CapStruct.getPointer(CGF), CGM.KmpTaskTDefaultTy->getPointer()); */
+ llvm::Value *CapturedArgsPtr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+ CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy);
+
+ std::vector<llvm::Value *> TaskArgs{
+ emitUpdateLocation(CGF, Loc),
+ ThreadID,
+ CapturedArgsPtr
+ };
+
+ // This is the core logic that will execute the outlined function.
+ auto &&CodeGen = [&](CodeGenFunction &CGF, PrePostActionTy &Action) {
+ Action.Enter(CGF);
+ /* llvm::Value *OutlinedFnArgs[] = {ThreadID, CapturedArgsPtr }; */
+ CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc,
+ FnT, // Our outlined taskgraph body
+ CapturedArgsPtr);
+ };
+
+ // We use a RegionCodeGenTy to wrap the core logic with "begin" and "complete"
+ // runtime calls. This is a standard pattern in Clang's OpenMP implementation.
+ RegionCodeGenTy RCG(CodeGen);
+
+ // Set up the pre/post actions using the special 'if0' runtime functions.
+ // These are optimized for the if(false) case.
+ /* CommonActionTy Action( */
+ /* OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_omp_task_begin_if0), */
+ /* TaskArgs, */
+ /* OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_omp_task_complete_if0), */
+ /* TaskArgs); */
+ /* RCG.setAction(Action); */
+ RCG(CGF);
+ };
+
+ /* auto &&ElseCodeGen = [this, &M, &TaskArgs, ThreadID, */
+ /* TaskEntry, &Data, &DepWaitTaskArgs, */
+ /* Loc](CodeGenFunction &CGF, PrePostActionTy &) { */
+ /* CodeGenFunction::RunCleanupsScope LocalScope(CGF); */
+ /* // Build void __kmpc_omp_wait_deps(ident_t *, kmp_int32 gtid, */
+ /* // kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32 */
+ /* // ndeps_noalias, kmp_depend_info_t *noalias_dep_list); if dependence info */
+ /* // is specified. */
+ /* if (!Data.Dependences.empty()) */
+ /* CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( */
+ /* M, OMPRTL___kmpc_omp_taskwait_deps_51), */
+ /* DepWaitTaskArgs); */
+ /* // Call proxy_task_entry(gtid, new_task); */
+ /* auto &&CodeGen = [TaskEntry, ThreadID, NewTaskNewTaskTTy, */
+ /* Loc](CodeGenFunction &CGF, PrePostActionTy &Action) { */
+ /* Action.Enter(CGF); */
+ /* llvm::Value *OutlinedFnArgs[] = {ThreadID, NewTaskNewTaskTTy}; */
+ /* CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc, TaskEntry, */
+ /* OutlinedFnArgs); */
+ /* }; */
+
+ /* // Build void __kmpc_omp_task_begin_if0(ident_t *, kmp_int32 gtid, */
+ /* // kmp_task_t *new_task); */
+ /* // Build void __kmpc_omp_task_complete_if0(ident_t *, kmp_int32 gtid, */
+ /* // kmp_task_t *new_task); */
+ /* RegionCodeGenTy RCG(CodeGen); */
+ /* CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction( */
+ /* M, OMPRTL___kmpc_omp_task_begin_if0), */
+ /* TaskArgs, */
+ /* OMPBuilder.getOrCreateRuntimeFunction( */
+ /* M, OMPRTL___kmpc_omp_task_complete_if0), */
+ /* TaskArgs); */
+ /* RCG.setAction(Action); */
+ /* RCG(CGF); */
+ /* }; */
+
+if (IfCond) {
+ emitIfClause(CGF, IfCond, ThenGen, ElseGen);
+ } else {
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph),
+ Args);
+ }
}
void CGOpenMPRuntime::emitInlinedDirective(CodeGenFunction &CGF,
@@ -13208,6 +13301,16 @@ void CGOpenMPSIMDRuntime::emitTaskyieldCall(CodeGenFunction &CGF,
llvm_unreachable("Not supported in SIMD-only mode");
}
+<<<<<<< HEAD
+=======
+void CGOpenMPSIMDRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
+ SourceLocation Loc,
+ const OMPExecutableDirective &D,
+ const Expr *IfCond) {
+ llvm_unreachable("Not supported in SIMD-only mode");
+}
+
+>>>>>>> ae29236fb180 ([xxx][wip][clang] Add If clause in taskgraph)
void CGOpenMPSIMDRuntime::emitTaskgroupRegion(
CodeGenFunction &CGF, const RegionCodeGenTy &TaskgroupOpGen,
SourceLocation Loc) {
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index 2753f0e7f2dfc..b74823dd6b7c1 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -1382,7 +1382,7 @@ class CGOpenMPRuntime {
/// Emit code for 'taskgraph' directive.
virtual void emitTaskgraphCall(CodeGenFunction &CGF, SourceLocation Loc,
- const OMPExecutableDirective &D);
+ const OMPExecutableDirective &D, const Expr *IfCond);
/// Emit code for 'cancellation point' construct.
/// \param CancelRegion Region kind for which the cancellation point must be
@@ -2213,8 +2213,12 @@ class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {
const OMPTaskDataTy &Data) override;
/// Emit code for 'taskgraph' directive.
+ /// \param IfCond Expression evaluated in if clause associated with the target
+ /// \param D Directive to emit.
void emitTaskgraphCall(CodeGenFunction &CGF, SourceLocation Loc,
- const OMPExecutableDirective &D) override;
+ const OMPExecutableDirective &D,
+ const Expr *IfCond
+ ) override;
/// Emit code for 'cancellation point' construct.
/// \param CancelRegion Region kind for which the cancellation point must be
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 59e27c8a14e55..724f093279ac6 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -5637,7 +5637,16 @@ void CodeGenFunction::EmitOMPTaskwaitDirective(const OMPTaskwaitDirective &S) {
void CodeGenFunction::EmitOMPTaskgraphDirective(
const OMPTaskgraphDirective &S) {
- CGM.getOpenMPRuntime().emitTaskgraphCall(*this, S.getBeginLoc(), S);
+ const Expr *IfCond = nullptr;
+ for (const auto *C : S.getClausesOfKind<OMPIfClause>()) {
+ if (C->getNameModifier() == OMPD_unknown ||
+ C->getNameModifier() == OMPD_cancel) {
+ IfCond = C->getCondition();
+ break;
+ }
+ }
+
+ CGM.getOpenMPRuntime().emitTaskgraphCall(*this, S.getBeginLoc(), S, IfCond);
}
static bool isSupportedByOpenMPIRBuilder(const OMPTaskgroupDirective &T) {
From 0429e9d00fdcb7a45493afd95e92acf9e732b298 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 15 Sep 2025 17:53:15 +0200
Subject: [PATCH 20/28] [xxx][clang] Clean emitTaskgraphCall when If clause
---
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 83 ++-------------------------
1 file changed, 5 insertions(+), 78 deletions(-)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 79bbca43b72ee..fbb5a6473bcd9 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6194,86 +6194,22 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
CGM.getModule(), OMPRTL___kmpc_taskgraph),
Args);
};
- llvm::Module &M = CGM.getModule();
- llvm::Value *ThreadID = getThreadID(CGF, Loc);
- auto &&ElseGen = [&M, ThreadID, &CGF, this, &FnT, &CapStruct, &Loc, &OutlinedCGF]
+ auto &&ElseGen = [&CGF, this, &FnT, &CapStruct, &Loc, &OutlinedCGF]
(CodeGenFunction &, PrePostActionTy &) {
- // This logic is adapted from the if(false) path of a regular task.
- // It ensures the taskgraph body is executed with minimal runtime overhead.
-
- // Arguments for the begin/complete calls: ident_t*, gtid, kmp_task_t*
- // We can pass the captured struct pointer as the "task" handle for the
- // lightweight if0 calls.
- /* llvm::Value *CapturedArgsPtr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast( */
- /* CapStruct.getPointer(CGF), CGM.KmpTaskTDefaultTy->getPointer()); */
llvm::Value *CapturedArgsPtr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy);
- std::vector<llvm::Value *> TaskArgs{
- emitUpdateLocation(CGF, Loc),
- ThreadID,
- CapturedArgsPtr
- };
-
- // This is the core logic that will execute the outlined function.
auto &&CodeGen = [&](CodeGenFunction &CGF, PrePostActionTy &Action) {
Action.Enter(CGF);
- /* llvm::Value *OutlinedFnArgs[] = {ThreadID, CapturedArgsPtr }; */
CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc,
- FnT, // Our outlined taskgraph body
+ FnT,
CapturedArgsPtr);
};
-
- // We use a RegionCodeGenTy to wrap the core logic with "begin" and "complete"
- // runtime calls. This is a standard pattern in Clang's OpenMP implementation.
RegionCodeGenTy RCG(CodeGen);
-
- // Set up the pre/post actions using the special 'if0' runtime functions.
- // These are optimized for the if(false) case.
- /* CommonActionTy Action( */
- /* OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_omp_task_begin_if0), */
- /* TaskArgs, */
- /* OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_omp_task_complete_if0), */
- /* TaskArgs); */
- /* RCG.setAction(Action); */
RCG(CGF);
};
- /* auto &&ElseCodeGen = [this, &M, &TaskArgs, ThreadID, */
- /* TaskEntry, &Data, &DepWaitTaskArgs, */
- /* Loc](CodeGenFunction &CGF, PrePostActionTy &) { */
- /* CodeGenFunction::RunCleanupsScope LocalScope(CGF); */
- /* // Build void __kmpc_omp_wait_deps(ident_t *, kmp_int32 gtid, */
- /* // kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32 */
- /* // ndeps_noalias, kmp_depend_info_t *noalias_dep_list); if dependence info */
- /* // is specified. */
- /* if (!Data.Dependences.empty()) */
- /* CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction( */
- /* M, OMPRTL___kmpc_omp_taskwait_deps_51), */
- /* DepWaitTaskArgs); */
- /* // Call proxy_task_entry(gtid, new_task); */
- /* auto &&CodeGen = [TaskEntry, ThreadID, NewTaskNewTaskTTy, */
- /* Loc](CodeGenFunction &CGF, PrePostActionTy &Action) { */
- /* Action.Enter(CGF); */
- /* llvm::Value *OutlinedFnArgs[] = {ThreadID, NewTaskNewTaskTTy}; */
- /* CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc, TaskEntry, */
- /* OutlinedFnArgs); */
- /* }; */
-
- /* // Build void __kmpc_omp_task_begin_if0(ident_t *, kmp_int32 gtid, */
- /* // kmp_task_t *new_task); */
- /* // Build void __kmpc_omp_task_complete_if0(ident_t *, kmp_int32 gtid, */
- /* // kmp_task_t *new_task); */
- /* RegionCodeGenTy RCG(CodeGen); */
- /* CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction( */
- /* M, OMPRTL___kmpc_omp_task_begin_if0), */
- /* TaskArgs, */
- /* OMPBuilder.getOrCreateRuntimeFunction( */
- /* M, OMPRTL___kmpc_omp_task_complete_if0), */
- /* TaskArgs); */
- /* RCG.setAction(Action); */
- /* RCG(CGF); */
- /* }; */
-
-if (IfCond) {
+ if (IfCond) {
emitIfClause(CGF, IfCond, ThenGen, ElseGen);
} else {
CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
@@ -13301,16 +13237,6 @@ void CGOpenMPSIMDRuntime::emitTaskyieldCall(CodeGenFunction &CGF,
llvm_unreachable("Not supported in SIMD-only mode");
}
-<<<<<<< HEAD
-=======
-void CGOpenMPSIMDRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
- SourceLocation Loc,
- const OMPExecutableDirective &D,
- const Expr *IfCond) {
- llvm_unreachable("Not supported in SIMD-only mode");
-}
-
->>>>>>> ae29236fb180 ([xxx][wip][clang] Add If clause in taskgraph)
void CGOpenMPSIMDRuntime::emitTaskgroupRegion(
CodeGenFunction &CGF, const RegionCodeGenTy &TaskgroupOpGen,
SourceLocation Loc) {
@@ -13483,7 +13409,8 @@ void CGOpenMPSIMDRuntime::emitTaskwaitCall(CodeGenFunction &CGF,
void CGOpenMPSIMDRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
SourceLocation Loc,
- const OMPExecutableDirective &D) {
+ const OMPExecutableDirective &D,
+ const Expr *IfCond) {
llvm_unreachable("Not supported in SIMD-only mode");
}
From 69a6e437b5fdfd80108e9bc614d0be2ae93005d3 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 15 Sep 2025 19:02:28 +0200
Subject: [PATCH 21/28] [xxx][openmp] Add use of re_record flag in taskgraph
---
openmp/runtime/src/kmp_tasking.cpp | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index d2566786ee900..a1a01db3da1e9 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -5439,8 +5439,9 @@ kmp_int32 __kmpc_start_record_task(ident_t *loc_ref, kmp_int32 gtid,
}
__kmpc_taskgroup(loc_ref, gtid);
- if (kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id)) {
- // TODO: use re_record flag
+ kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id);
+ if (!flags->re_record && tdg) {
+ // TODO: remove old if re_record
__kmp_exec_tdg(gtid, tdg);
res = 0;
} else {
>From 9f238c654f782f19857911b95b6dd01883a8ec89 Mon Sep 17 00:00:00 2001
From: jpinot <josep.pinot at bsc.es>
Date: Mon, 15 Sep 2025 19:54:38 +0200
Subject: [PATCH 22/28] [xxx][clang][openmp] Add support for nowait/nogroup in
taskgraph
---
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 2 +-
openmp/runtime/src/kmp_tasking.cpp | 10 ++++++----
2 files changed, 7 insertions(+), 5 deletions(-)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index fbb5a6473bcd9..9fc35f4c2fa05 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -6137,7 +6137,7 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
unsigned Flags = 0;
- if (D.getSingleClause<OMPNowaitClause>()) {
+ if (D.getSingleClause<OMPNogroupClause>()) {
Flags |= NowaitFlag;
}
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index a1a01db3da1e9..fd69cda800d12 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -5438,7 +5438,8 @@ kmp_int32 __kmpc_start_record_task(ident_t *loc_ref, kmp_int32 gtid,
return 1;
}
- __kmpc_taskgroup(loc_ref, gtid);
+ if (!flags->nowait)
+ __kmpc_taskgroup(loc_ref, gtid);
kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id);
if (!flags->re_record && tdg) {
// TODO: remove old if re_record
@@ -5516,13 +5517,14 @@ void __kmp_end_record(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
void __kmpc_end_record_task(ident_t *loc_ref, kmp_int32 gtid,
kmp_int32 input_flags, kmp_int32 tdg_id) {
kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id);
+ kmp_taskgraph_flags_t *flags = (kmp_taskgraph_flags_t *)&input_flags;
KA_TRACE(10, ("__kmpc_end_record_task(enter): T#%d loc=%p finishes recording"
" tdg=%d with flags=%d\n",
gtid, loc_ref, tdg_id, input_flags));
- if (__kmp_max_tdgs) {
- // TODO: use input_flags->nowait
- __kmpc_end_taskgroup(loc_ref, gtid);
+ if (__kmp_max_tdgs && tdg) {
+ if (!flags->nowait)
+ __kmpc_end_taskgroup(loc_ref, gtid);
if (__kmp_tdg_is_recording(tdg->tdg_status))
__kmp_end_record(gtid, tdg);
}
From 81acabbe70ba037eb5c8a75d0743d2c1ae2a5a7c Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Tue, 30 Sep 2025 10:13:36 -0500
Subject: [PATCH 23/28] move/fixup some stuff
---
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 246 +++++++++---------
.../include/llvm/Frontend/OpenMP/OMPKinds.def | 1 -
2 files changed, 123 insertions(+), 124 deletions(-)
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index 9fc35f4c2fa05..e8a79df3dd5f4 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -236,26 +236,6 @@ class CGOpenMPTaskOutlinedRegionInfo final : public CGOpenMPRegionInfo {
const UntiedTaskActionTy &Action;
};
-/// API for captured statement code generation in OpenMP taskgraphs.
-class CGOpenMPTaskgraphRegionInfo final : public CGOpenMPRegionInfo {
-public:
- CGOpenMPTaskgraphRegionInfo(const CapturedStmt &CS,
- const RegionCodeGenTy &CodeGen)
- : CGOpenMPRegionInfo(CS, TaskgraphOutlinedRegion, CodeGen,
- llvm::omp::OMPD_taskgraph, false) {}
-
- const VarDecl *getThreadIDVariable() const override { return 0; }
-
- /// Get the name of the capture helper.
- StringRef getHelperName() const override { return "taskgraph.omp_outlined."; }
-
- static bool classof(const CGCapturedStmtInfo *Info) {
- return CGOpenMPRegionInfo::classof(Info) &&
- cast<CGOpenMPRegionInfo>(Info)->getRegionKind() ==
- TaskgraphOutlinedRegion;
- }
-};
-
/// API for inlined captured statement code generation in OpenMP
/// constructs.
class CGOpenMPInlinedRegionInfo : public CGOpenMPRegionInfo {
@@ -368,6 +348,26 @@ class CGOpenMPTargetRegionInfo final : public CGOpenMPRegionInfo {
StringRef HelperName;
};
+/// API for captured statement code generation in OpenMP taskgraphs.
+class CGOpenMPTaskgraphRegionInfo final : public CGOpenMPRegionInfo {
+public:
+ CGOpenMPTaskgraphRegionInfo(const CapturedStmt &CS,
+ const RegionCodeGenTy &CodeGen)
+ : CGOpenMPRegionInfo(CS, TaskgraphOutlinedRegion, CodeGen,
+ llvm::omp::OMPD_taskgraph, false) {}
+
+ const VarDecl *getThreadIDVariable() const override { return 0; }
+
+ /// Get the name of the capture helper.
+ StringRef getHelperName() const override { return "taskgraph.omp_outlined."; }
+
+ static bool classof(const CGCapturedStmtInfo *Info) {
+ return CGOpenMPRegionInfo::classof(Info) &&
+ cast<CGOpenMPRegionInfo>(Info)->getRegionKind() ==
+ TaskgraphOutlinedRegion;
+ }
+};
+
static void EmptyCodeGen(CodeGenFunction &, PrePostActionTy &) {
llvm_unreachable("No codegen for expressions");
}
@@ -2242,6 +2242,102 @@ void CGOpenMPRuntime::emitTaskyieldCall(CodeGenFunction &CGF,
Region->emitUntiedSwitch(CGF);
}
+void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
+ SourceLocation Loc,
+ const OMPExecutableDirective &D,
+ const Expr *IfCond) {
+ if (!CGF.HaveInsertPoint())
+ return;
+
+ // Building kmp_taskgraph_flags_t flags for kmpc_taskgraph. C.f., kmp.h
+ enum {
+ NowaitFlag = 0x1, // Not used yet.
+ ReRecordFlag = 0x2,
+ };
+
+ unsigned Flags = 0;
+
+ if (D.getSingleClause<OMPNogroupClause>()) {
+ Flags |= NowaitFlag;
+ }
+
+ const OMPGraphResetClause *GraphResetClause =
+ D.getSingleClause<OMPGraphResetClause>();
+ if (GraphResetClause) {
+ const Expr *Cond = GraphResetClause->getCondition();
+ llvm::Value *CondVal = CGF.EvaluateExprAsBool(Cond);
+ if (CondVal) {
+ llvm::Value *CondBool = CGF.Builder.CreateICmpNE(
+ CondVal, llvm::ConstantInt::get(CondVal->getType(), 0));
+ if (llvm::ConstantInt *CI = llvm::dyn_cast<llvm::ConstantInt>(CondBool)) {
+ if (CI->isOne()) {
+ Flags |= ReRecordFlag;
+ }
+ }
+ }
+ }
+
+ llvm::Value *GraphId = CGF.Builder.getInt32(0);
+ const OMPGraphIdClause *GraphIdClause = D.getSingleClause<OMPGraphIdClause>();
+ if (GraphIdClause) {
+ const auto *E = GraphIdClause->getCondition();
+ auto *GraphIdVal = CGF.EmitScalarExpr(E);
+ GraphId = CGF.Builder.CreateIntCast(GraphIdVal, CGM.Int32Ty, true);
+ }
+
+ CodeGenFunction OutlinedCGF(CGM, /*suppressNewContext=*/true);
+
+ const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
+
+ auto BodyGen = [CS](CodeGenFunction &CGF, PrePostActionTy &) {
+ CGF.EmitStmt(CS->getCapturedStmt());
+ };
+
+ LValue CapStruct = CGF.InitCapturedStruct(*CS);
+ CGOpenMPTaskgraphRegionInfo TaskgraphRegion(*CS, BodyGen);
+ CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(OutlinedCGF,
+ &TaskgraphRegion);
+ llvm::Function *FnT = OutlinedCGF.GenerateCapturedStmtFunction(*CS);
+
+ std::array<llvm::Value *, 7> Args{
+ emitUpdateLocation(CGF, Loc),
+ getThreadID(CGF, Loc),
+ CGF.Builder.getInt32(Flags),
+ CGF.Builder.getInt32(D.getBeginLoc().getHashValue()),
+ GraphId,
+ CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(FnT, CGM.VoidPtrTy),
+ CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+ CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy)};
+
+ auto &&ThenGen = [&CGF, this, &Args](CodeGenFunction &, PrePostActionTy &) {
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph),
+ Args);
+ };
+ auto &&ElseGen = [&CGF, this, &FnT, &CapStruct, &Loc, &OutlinedCGF]
+ (CodeGenFunction &, PrePostActionTy &) {
+ llvm::Value *CapturedArgsPtr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+ CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy);
+
+ auto &&CodeGen = [&](CodeGenFunction &CGF, PrePostActionTy &Action) {
+ Action.Enter(CGF);
+ CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc,
+ FnT,
+ CapturedArgsPtr);
+ };
+ RegionCodeGenTy RCG(CodeGen);
+ RCG(CGF);
+ };
+
+ if (IfCond) {
+ emitIfClause(CGF, IfCond, ThenGen, ElseGen);
+ } else {
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph),
+ Args);
+ }
+}
+
void CGOpenMPRuntime::emitTaskgroupRegion(CodeGenFunction &CGF,
const RegionCodeGenTy &TaskgroupOpGen,
SourceLocation Loc) {
@@ -6122,102 +6218,6 @@ void CGOpenMPRuntime::emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
Region->emitUntiedSwitch(CGF);
}
-void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
- SourceLocation Loc,
- const OMPExecutableDirective &D,
- const Expr *IfCond) {
- if (!CGF.HaveInsertPoint())
- return;
-
- // Building kmp_taskgraph_flags_t flags for kmpc_taskgraph. C.f., kmp.h
- enum {
- NowaitFlag = 0x1, // Not used yet.
- ReRecordFlag = 0x2,
- };
-
- unsigned Flags = 0;
-
- if (D.getSingleClause<OMPNogroupClause>()) {
- Flags |= NowaitFlag;
- }
-
- const OMPGraphResetClause *GraphResetClause =
- D.getSingleClause<OMPGraphResetClause>();
- if (GraphResetClause) {
- const Expr *Cond = GraphResetClause->getCondition();
- llvm::Value *CondVal = CGF.EvaluateExprAsBool(Cond);
- if (CondVal) {
- llvm::Value *CondBool = CGF.Builder.CreateICmpNE(
- CondVal, llvm::ConstantInt::get(CondVal->getType(), 0));
- if (llvm::ConstantInt *CI = llvm::dyn_cast<llvm::ConstantInt>(CondBool)) {
- if (CI->isOne()) {
- Flags |= ReRecordFlag;
- }
- }
- }
- }
-
- llvm::Value *GraphId = CGF.Builder.getInt32(0);
- const OMPGraphIdClause *GraphIdClause = D.getSingleClause<OMPGraphIdClause>();
- if (GraphIdClause) {
- const auto *E = GraphIdClause->getCondition();
- auto *GraphIdVal = CGF.EmitScalarExpr(E);
- GraphId = CGF.Builder.CreateIntCast(GraphIdVal, CGM.Int32Ty, true);
- }
-
- CodeGenFunction OutlinedCGF(CGM, /*suppressNewContext=*/true);
-
- const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
-
- auto BodyGen = [CS](CodeGenFunction &CGF, PrePostActionTy &) {
- CGF.EmitStmt(CS->getCapturedStmt());
- };
-
- LValue CapStruct = CGF.InitCapturedStruct(*CS);
- CGOpenMPTaskgraphRegionInfo TaskgraphRegion(*CS, BodyGen);
- CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(OutlinedCGF,
- &TaskgraphRegion);
- llvm::Function *FnT = OutlinedCGF.GenerateCapturedStmtFunction(*CS);
-
- std::array<llvm::Value *, 7> Args{
- emitUpdateLocation(CGF, Loc),
- getThreadID(CGF, Loc),
- CGF.Builder.getInt32(Flags),
- CGF.Builder.getInt32(D.getBeginLoc().getHashValue()),
- GraphId,
- CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(FnT, CGM.VoidPtrTy),
- CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
- CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy)};
-
- auto &&ThenGen = [&CGF, this, &Args](CodeGenFunction &, PrePostActionTy &) {
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_taskgraph),
- Args);
- };
- auto &&ElseGen = [&CGF, this, &FnT, &CapStruct, &Loc, &OutlinedCGF]
- (CodeGenFunction &, PrePostActionTy &) {
- llvm::Value *CapturedArgsPtr = CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
- CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy);
-
- auto &&CodeGen = [&](CodeGenFunction &CGF, PrePostActionTy &Action) {
- Action.Enter(CGF);
- CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc,
- FnT,
- CapturedArgsPtr);
- };
- RegionCodeGenTy RCG(CodeGen);
- RCG(CGF);
- };
-
- if (IfCond) {
- emitIfClause(CGF, IfCond, ThenGen, ElseGen);
- } else {
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_taskgraph),
- Args);
- }
-}
-
void CGOpenMPRuntime::emitInlinedDirective(CodeGenFunction &CGF,
OpenMPDirectiveKind InnerKind,
const RegionCodeGenTy &CodeGen,
@@ -13237,6 +13237,13 @@ void CGOpenMPSIMDRuntime::emitTaskyieldCall(CodeGenFunction &CGF,
llvm_unreachable("Not supported in SIMD-only mode");
}
+void CGOpenMPSIMDRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
+ SourceLocation Loc,
+ const OMPExecutableDirective &D,
+ const Expr *IfCond) {
+ llvm_unreachable("Not supported in SIMD-only mode");
+}
+
void CGOpenMPSIMDRuntime::emitTaskgroupRegion(
CodeGenFunction &CGF, const RegionCodeGenTy &TaskgroupOpGen,
SourceLocation Loc) {
@@ -13407,13 +13414,6 @@ void CGOpenMPSIMDRuntime::emitTaskwaitCall(CodeGenFunction &CGF,
llvm_unreachable("Not supported in SIMD-only mode");
}
-void CGOpenMPSIMDRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
- SourceLocation Loc,
- const OMPExecutableDirective &D,
- const Expr *IfCond) {
- llvm_unreachable("Not supported in SIMD-only mode");
-}
-
void CGOpenMPSIMDRuntime::emitCancellationPointCall(
CodeGenFunction &CGF, SourceLocation Loc,
OpenMPDirectiveKind CancelRegion) {
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 0883d06f1da77..288585c8b42a6 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -355,7 +355,6 @@ __OMP_RTL(__kmpc_omp_task_alloc, false, /* kmp_task_t */ VoidPtr, IdentPtr,
Int32, Int32, SizeTy, SizeTy, TaskRoutineEntryPtr)
__OMP_RTL(__kmpc_omp_task, false, Int32, IdentPtr, Int32,
/* kmp_task_t */ VoidPtr)
-__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, Int32, Int32, VoidPtr, VoidPtr)
__OMP_RTL(__kmpc_end_taskgroup, false, Void, IdentPtr, Int32)
__OMP_RTL(__kmpc_taskgroup, false, Void, IdentPtr, Int32)
__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, Int32, Int32, Int32, VoidPtr, VoidPtr)
>From 84d96bb69c371512a012ca25f72da919a4706837 Mon Sep 17 00:00:00 2001
From: Josep Pinot <josep.pinot at bsc.es>
Date: Fri, 14 Mar 2025 08:02:23 +0100
Subject: [PATCH 24/28] [OpenMP] Update OpenMP runtime to adopt taskgraph
clause from 6.0 Specs (#130751)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Updating OpenMP runtime taskgraph support (record/replay mechanism):
- Adds a `graph_reset` bit in `kmp_taskgraph_flags_t` to discard
existing TDG records.
- Switches from a strict index-based TDG ID to a more flexible
integer ID, which can be any integer (e.g. a hash).
- Adds helper functions `__kmp_find_tdg`, `__kmp_alloc_tdg`, and
`__kmp_free_tdg` to manage TDGs by their IDs.
These changes pave the way for the integration of OpenMP 6.0
taskgraphs. Taskgraphs are still recorded in an array, so lookup
efficiency is O(n), where n ≤ `__kmp_max_tdgs`. This could be optimized
by moving the TDGs to a hashtable, making lookups more efficient; the
helper routines above make such future optimizations easier.
---
openmp/runtime/src/kmp.h | 6 +-
openmp/runtime/src/kmp_global.cpp | 3 +-
openmp/runtime/src/kmp_tasking.cpp | 127 ++++++++++++------
.../tasking/omp_record_replay_random_id.cpp | 47 +++++++
.../test/tasking/omp_record_replay_reset.cpp | 47 +++++++
5 files changed, 184 insertions(+), 46 deletions(-)
create mode 100644 openmp/runtime/test/tasking/omp_record_replay_random_id.cpp
create mode 100644 openmp/runtime/test/tasking/omp_record_replay_reset.cpp
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index b2b0e39a39c73..73ad70444ec22 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2638,7 +2638,9 @@ typedef struct {
typedef struct kmp_taskgraph_flags { /*This needs to be exactly 32 bits */
unsigned nowait : 1;
unsigned re_record : 1;
- unsigned reserved : 30;
+ unsigned graph_reset : 1; /* 1==discard taskgraph record, 0==use taskgraph
+ record */
+ unsigned reserved : 29;
} kmp_taskgraph_flags_t;
/// Represents a TDG node
@@ -2683,7 +2685,7 @@ typedef struct kmp_tdg_info {
extern int __kmp_tdg_dot;
extern kmp_int32 __kmp_max_tdgs;
extern kmp_tdg_info_t **__kmp_global_tdgs;
-extern kmp_int32 __kmp_curr_tdg_id;
+extern kmp_tdg_info_t *__kmp_curr_tdg;
extern kmp_int32 __kmp_successors_size;
extern std::atomic<kmp_int32> __kmp_tdg_task_id;
extern kmp_int32 __kmp_num_tdg;
diff --git a/openmp/runtime/src/kmp_global.cpp b/openmp/runtime/src/kmp_global.cpp
index c5c9a32fd0812..bd089d6f0bc3f 100644
--- a/openmp/runtime/src/kmp_global.cpp
+++ b/openmp/runtime/src/kmp_global.cpp
@@ -558,8 +558,7 @@ int *__kmp_nesting_nth_level;
int __kmp_tdg_dot = 0;
kmp_int32 __kmp_max_tdgs = 100;
kmp_tdg_info_t **__kmp_global_tdgs = NULL;
-kmp_int32 __kmp_curr_tdg_id =
- 0; // Id of the current TDG being recorded or executed
+kmp_tdg_info_t *__kmp_curr_tdg = NULL; // Current TDG being recorded or executed
kmp_int32 __kmp_num_tdg = 0;
kmp_int32 __kmp_successors_size = 10; // Initial succesor size list for
// recording
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index fd69cda800d12..71d78413f356a 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1431,7 +1431,7 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
}
#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
+ kmp_tdg_info_t *tdg = __kmp_curr_tdg;
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status) &&
(task_entry != (kmp_routine_entry_t)__kmp_taskloop_task)) {
taskdata->is_taskgraph = 1;
@@ -2374,14 +2374,11 @@ without help of the runtime library.
*/
void *__kmpc_task_reduction_init(int gtid, int num, void *data) {
#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
+ kmp_tdg_info_t *tdg = __kmp_curr_tdg;
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
- kmp_tdg_info_t *this_tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
- this_tdg->rec_taskred_data =
- __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
- this_tdg->rec_num_taskred = num;
- KMP_MEMCPY(this_tdg->rec_taskred_data, data,
- sizeof(kmp_task_red_input_t) * num);
+ tdg->rec_taskred_data = __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
+ tdg->rec_num_taskred = num;
+ KMP_MEMCPY(tdg->rec_taskred_data, data, sizeof(kmp_task_red_input_t) * num);
}
#endif
return __kmp_task_reduction_init(gtid, num, (kmp_task_red_input_t *)data);
@@ -2401,7 +2398,7 @@ has two parameters, pointer to object to be initialized and pointer to omp_orig
*/
void *__kmpc_taskred_init(int gtid, int num, void *data) {
#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_info_t *tdg = __kmp_find_tdg(__kmp_curr_tdg_id);
+ kmp_tdg_info_t *tdg = __kmp_curr_tdg;
if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
tdg->rec_taskred_data = __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
tdg->rec_num_taskred = num;
@@ -2456,8 +2453,7 @@ void *__kmpc_task_reduction_get_th_data(int gtid, void *tskgrp, void *data) {
#if OMP_TASKGRAPH_EXPERIMENTAL
if ((thread->th.th_current_task->is_taskgraph) &&
- (!__kmp_tdg_is_recording(
- __kmp_find_tdg(__kmp_curr_tdg_id)->tdg_status))) {
+ (!__kmp_tdg_is_recording(__kmp_curr_tdg->tdg_status))) {
tg = thread->th.th_current_task->td_taskgroup;
KMP_ASSERT(tg != NULL);
KMP_ASSERT(tg->reduce_data != NULL);
@@ -5268,17 +5264,71 @@ static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id) {
__kmp_global_tdgs = (kmp_tdg_info_t **)__kmp_allocate(
sizeof(kmp_tdg_info_t *) * __kmp_max_tdgs);
- for (kmp_int32 i = 0; i < __kmp_num_tdg; ++i) {
- if ((__kmp_global_tdgs[i]) && (__kmp_global_tdgs[i]->tdg_id == tdg_id) &&
- (__kmp_global_tdgs[i]->tdg_status != KMP_TDG_NONE)) {
- res = __kmp_global_tdgs[i];
- __kmp_curr_tdg_id = tdg_id;
+ for (kmp_int32 tdg_idx = 0; tdg_idx < __kmp_max_tdgs; tdg_idx++) {
+ if (__kmp_global_tdgs[tdg_idx] &&
+ __kmp_global_tdgs[tdg_idx]->tdg_id == tdg_id) {
+ if (__kmp_global_tdgs[tdg_idx]->tdg_status != KMP_TDG_NONE)
+ res = __kmp_global_tdgs[tdg_idx];
break;
}
}
return res;
}
+// __kmp_alloc_tdg: Allocates a TDG if it doesn't already exist.
+// tdg_id: ID of the TDG.
+// returns: A pointer to the TDG if it already exists. Otherwise,
+// allocates a new TDG if the maximum limit has not been reached.
+// Returns nullptr if no TDG can be allocated.
+static kmp_tdg_info_t *__kmp_alloc_tdg(kmp_int32 tdg_id) {
+ kmp_tdg_info_t *res = nullptr;
+ if ((res = __kmp_find_tdg(tdg_id)))
+ return res;
+
+ if (__kmp_num_tdg > __kmp_max_tdgs)
+ return res;
+
+ for (kmp_int32 tdg_idx = 0; tdg_idx < __kmp_max_tdgs; tdg_idx++) {
+ if (!__kmp_global_tdgs[tdg_idx]) {
+ kmp_tdg_info_t *tdg =
+ (kmp_tdg_info_t *)__kmp_allocate(sizeof(kmp_tdg_info_t));
+ __kmp_global_tdgs[tdg_idx] = tdg;
+ __kmp_curr_tdg = tdg;
+ res = __kmp_global_tdgs[tdg_idx];
+ break;
+ }
+ }
+ return res;
+}
+
+// __kmp_free_tdg: Frees a TDG if it exists.
+// tdg_id: ID of the TDG to be freed.
+// returns: true if a TDG with the given ID was found and successfully freed,
+// false if no such TDG exists.
+static bool __kmp_free_tdg(kmp_int32 tdg_id) {
+ kmp_tdg_info_t *tdg = nullptr;
+ if (__kmp_global_tdgs == NULL)
+ return false;
+
+ for (kmp_int32 tdg_idx = 0; tdg_idx < __kmp_max_tdgs; tdg_idx++) {
+ if (__kmp_global_tdgs[tdg_idx] &&
+ __kmp_global_tdgs[tdg_idx]->tdg_id == tdg_id) {
+ tdg = __kmp_global_tdgs[tdg_idx];
+ for (kmp_int map_idx = 0; map_idx < tdg->map_size; map_idx++) {
+ __kmp_free(tdg->record_map[map_idx].successors);
+ }
+ __kmp_free(tdg->record_map);
+ if (tdg->root_tasks)
+ __kmp_free(tdg->root_tasks);
+
+ __kmp_free(tdg);
+ __kmp_global_tdgs[tdg_idx] = NULL;
+ return true;
+ }
+ }
+ return false;
+}
+
// __kmp_print_tdg_dot: prints the TDG to a dot file
// tdg: ID of the TDG
// gtid: Global Thread ID
@@ -5384,9 +5434,7 @@ void __kmp_exec_tdg(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
static inline void __kmp_start_record(kmp_int32 gtid,
kmp_taskgraph_flags_t *flags,
kmp_int32 tdg_id) {
- kmp_tdg_info_t *tdg =
- (kmp_tdg_info_t *)__kmp_allocate(sizeof(kmp_tdg_info_t));
- __kmp_global_tdgs[__kmp_num_tdg - 1] = tdg;
+ kmp_tdg_info_t *tdg = __kmp_alloc_tdg(tdg_id);
// Initializing the TDG structure
tdg->tdg_id = tdg_id;
tdg->map_size = INIT_MAPSIZE;
@@ -5418,42 +5466,36 @@ static inline void __kmp_start_record(kmp_int32 gtid,
// loc_ref: Location of TDG, not used yet
// gtid: Global Thread ID of the encountering thread
// input_flags: Flags associated with the TDG
-// tdg_id: ID of the TDG to record, for now, incremental integer
+// tdg_id: ID of the TDG to record
// returns: 1 if we record, otherwise, 0
kmp_int32 __kmpc_start_record_task(ident_t *loc_ref, kmp_int32 gtid,
kmp_int32 input_flags, kmp_int32 tdg_id) {
-
kmp_int32 res;
kmp_taskgraph_flags_t *flags = (kmp_taskgraph_flags_t *)&input_flags;
- KA_TRACE(10,
- ("__kmpc_start_record_task(enter): T#%d loc=%p flags=%d tdg_id=%d\n",
- gtid, loc_ref, input_flags, tdg_id));
+ KA_TRACE(10, ("__kmpc_start_record_task(enter): T#%d loc=%p flags=%d "
+ "tdg_id=%d\n",
+ gtid, loc_ref, input_flags, tdg_id));
if (__kmp_max_tdgs == 0) {
- KA_TRACE(
- 10,
- ("__kmpc_start_record_task(abandon): T#%d loc=%p flags=%d tdg_id = %d, "
- "__kmp_max_tdgs = 0\n",
- gtid, loc_ref, input_flags, tdg_id));
+ KA_TRACE(10, ("__kmpc_start_record_task(abandon): T#%d loc=%p flags=%d "
+ "tdg_id = %d, __kmp_max_tdgs = 0\n",
+ gtid, loc_ref, input_flags, tdg_id));
return 1;
}
- if (!flags->nowait)
- __kmpc_taskgroup(loc_ref, gtid);
- kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id);
- if (!flags->re_record && tdg) {
- // TODO: remove old if re_record
+ __kmpc_taskgroup(loc_ref, gtid);
+ if (flags->graph_reset) {
+ __kmp_free_tdg(tdg_id);
+ __kmp_num_tdg--;
+ }
+ if (kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id)) {
+ // TODO: use re_record flag
__kmp_exec_tdg(gtid, tdg);
res = 0;
} else {
- if (__kmp_num_tdg < __kmp_max_tdgs) {
- __kmp_curr_tdg_id = tdg_id;
- __kmp_num_tdg++;
- KMP_DEBUG_ASSERT(__kmp_num_tdg <= __kmp_max_tdgs);
- __kmp_start_record(gtid, flags, tdg_id);
- }
- // if no TDG found, need to execute the task
- // even not recording
+ KMP_DEBUG_ASSERT(__kmp_num_tdg < __kmp_max_tdgs);
+ __kmp_start_record(gtid, flags, tdg_id);
+ __kmp_num_tdg++;
res = 1;
}
KA_TRACE(10, ("__kmpc_start_record_task(exit): T#%d TDG %d starts to %s\n",
@@ -5519,6 +5561,7 @@ void __kmpc_end_record_task(ident_t *loc_ref, kmp_int32 gtid,
kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id);
kmp_taskgraph_flags_t *flags = (kmp_taskgraph_flags_t *)&input_flags;
+ KMP_DEBUG_ASSERT(tdg != NULL);
KA_TRACE(10, ("__kmpc_end_record_task(enter): T#%d loc=%p finishes recording"
" tdg=%d with flags=%d\n",
gtid, loc_ref, tdg_id, input_flags));
diff --git a/openmp/runtime/test/tasking/omp_record_replay_random_id.cpp b/openmp/runtime/test/tasking/omp_record_replay_random_id.cpp
new file mode 100644
index 0000000000000..58e90da4d782a
--- /dev/null
+++ b/openmp/runtime/test/tasking/omp_record_replay_random_id.cpp
@@ -0,0 +1,47 @@
+// REQUIRES: ompx_taskgraph
+// RUN: %libomp-cxx-compile-and-run
+#include <iostream>
+#include <cassert>
+#define NT 10
+
+// Compiler-generated code (emulation)
+typedef struct ident {
+ void *dummy;
+} ident_t;
+
+#ifdef __cplusplus
+extern "C" {
+int __kmpc_global_thread_num(ident_t *);
+int __kmpc_start_record_task(ident_t *, int, int, int);
+void __kmpc_end_record_task(ident_t *, int, int, int);
+}
+#endif
+
+static void func(int *num_exec) { (*num_exec)++; }
+
+int main() {
+ int num_exec = 0;
+ int num_tasks = 0;
+ int hash_id = 135343854;
+#pragma omp parallel
+#pragma omp single
+ for (int iter = 0; iter < NT; ++iter) {
+ int gtid = __kmpc_global_thread_num(nullptr);
+ int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0,
+ /* tdg_id */ hash_id);
+ if (res) {
+ num_tasks++;
+#pragma omp task
+ func(&num_exec);
+ }
+ __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0,
+ /* tdg_id */ hash_id);
+ }
+
+ assert(num_tasks == 1);
+ assert(num_exec == NT);
+
+ std::cout << "Passed" << std::endl;
+ return 0;
+}
+// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_record_replay_reset.cpp b/openmp/runtime/test/tasking/omp_record_replay_reset.cpp
new file mode 100644
index 0000000000000..123a9fa5a72f0
--- /dev/null
+++ b/openmp/runtime/test/tasking/omp_record_replay_reset.cpp
@@ -0,0 +1,47 @@
+// REQUIRES: ompx_taskgraph
+// RUN: %libomp-cxx-compile-and-run
+#include <iostream>
+#include <cassert>
+#define NT 10
+
+// Compiler-generated code (emulation)
+typedef struct ident {
+ void *dummy;
+} ident_t;
+
+#ifdef __cplusplus
+extern "C" {
+int __kmpc_global_thread_num(ident_t *);
+int __kmpc_start_record_task(ident_t *, int, int, int);
+void __kmpc_end_record_task(ident_t *, int, int, int);
+}
+#endif
+
+static void func(int *num_exec) { (*num_exec)++; }
+
+int main() {
+ int num_exec = 0;
+ int num_tasks = 0;
+ int flags = 1 << 2;
+#pragma omp parallel
+#pragma omp single
+ for (int iter = 0; iter < NT; ++iter) {
+ int gtid = __kmpc_global_thread_num(nullptr);
+ int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ flags,
+ /* tdg_id */ 0);
+ if (res) {
+ num_tasks++;
+#pragma omp task
+ func(&num_exec);
+ }
+ __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0,
+ /* tdg_id */ 0);
+ }
+
+ assert(num_tasks == NT);
+ assert(num_exec == NT);
+
+ std::cout << "Passed" << std::endl;
+ return 0;
+}
+// CHECK: Passed
>From 4b59915fd69769e2dd805f381f9786fd4beab6aa Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Wed, 25 Mar 2026 16:55:42 -0500
Subject: [PATCH 25/28] [OpenMP] OpenMP 6.0 "taskgraph" support, runtime parts
---
openmp/runtime/src/kmp.h | 279 ++-
openmp/runtime/src/kmp_debug.h | 14 +
openmp/runtime/src/kmp_global.cpp | 12 +-
openmp/runtime/src/kmp_settings.cpp | 34 +-
openmp/runtime/src/kmp_taskdeps.cpp | 3262 ++++++++++++++++++++++++---
openmp/runtime/src/kmp_taskdeps.h | 48 +-
openmp/runtime/src/kmp_tasking.cpp | 1405 +++++++-----
7 files changed, 4116 insertions(+), 938 deletions(-)
diff --git a/openmp/runtime/src/kmp.h b/openmp/runtime/src/kmp.h
index 73ad70444ec22..7c0a7ad58d861 100644
--- a/openmp/runtime/src/kmp.h
+++ b/openmp/runtime/src/kmp.h
@@ -2510,6 +2510,11 @@ typedef struct kmp_task { /* GEH: Shouldn't this be aligned somehow? */
@}
*/
+typedef struct kmp_taskgraph_reduce_input_data {
+ void *reduce_data;
+ kmp_int32 reduce_num_data;
+} kmp_taskgraph_reduce_input_data_t;
+
typedef struct kmp_taskgroup {
std::atomic<kmp_int32> count; // number of allocated and incomplete tasks
std::atomic<kmp_int32>
@@ -2519,6 +2524,15 @@ typedef struct kmp_taskgroup {
void *reduce_data; // reduction related info
kmp_int32 reduce_num_data; // number of data items to reduce
uintptr_t *gomp_data; // gomp reduction data
+ struct {
+ // Points to taskgraph that tasks in this taskgroup are being recorded to.
+ std::atomic<struct kmp_taskgraph_record *> recording;
+ // Temporary holding place for input data for reductions for this taskgroup
+ // during taskgraph recording. This is passed over to the first
+ // kmp_taskgraph_node we encounter inside the taskgroup. We'll have to
+ // watch out for potential race conditions here.
+ kmp_taskgraph_reduce_input_data_t *reduce_input;
+ } taskgraph;
} kmp_taskgroup_t;
// forward declarations
@@ -2570,6 +2584,12 @@ struct kmp_depnode_list {
// Max number of mutexinoutset dependencies per node
#define MAX_MTX_DEPS 4
+struct kmp_taskgraph_node;
+struct kmp_taskgraph_region;
+struct kmp_taskgraph_record;
+
+struct kmp_bitset;
+
typedef struct kmp_base_depnode {
kmp_depnode_list_t *successors; /* used under lock */
kmp_task_t *task; /* non-NULL if depnode is active, used under lock */
@@ -2581,6 +2601,7 @@ typedef struct kmp_base_depnode {
#endif
std::atomic<kmp_int32> npredecessors;
std::atomic<kmp_int32> nrefs;
+ struct kmp_bitset *set_membership;
} kmp_base_depnode_t;
union KMP_ALIGN_CACHE kmp_depnode {
@@ -2595,7 +2616,10 @@ struct kmp_dephash_entry {
kmp_depnode_list_t *last_set;
kmp_depnode_list_t *prev_set;
kmp_uint8 last_flag;
- kmp_lock_t *mtx_lock; /* is referenced by depnodes w/mutexinoutset dep */
+ union {
+ kmp_lock_t *mtx_lock; /* is referenced by depnodes w/mutexinoutset dep */
+ kmp_int32 set_num;
+ };
kmp_dephash_entry_t *next_in_bucket;
};
@@ -2632,74 +2656,139 @@ typedef struct {
} kmp_event_t;
#if OMP_TASKGRAPH_EXPERIMENTAL
-// Initial number of allocated nodes while recording
-#define INIT_MAPSIZE 50
-
-typedef struct kmp_taskgraph_flags { /*This needs to be exactly 32 bits */
- unsigned nowait : 1;
- unsigned re_record : 1;
- unsigned graph_reset : 1; /* 1==discard taskgraph record, 0==use taskgraph
- record */
- unsigned reserved : 29;
-} kmp_taskgraph_flags_t;
-
-/// Represents a TDG node
-typedef struct kmp_node_info {
- kmp_task_t *task; // Pointer to the actual task
- kmp_int32 *successors; // Array of the succesors ids
- kmp_int32 nsuccessors; // Number of succesors of the node
- std::atomic<kmp_int32>
- npredecessors_counter; // Number of predessors on the fly
- kmp_int32 npredecessors; // Total number of predecessors
- kmp_int32 successors_size; // Number of allocated succesors ids
- kmp_taskdata_t *parent_task; // Parent implicit task
-} kmp_node_info_t;
-
-/// Represent a TDG's current status
-typedef enum kmp_tdg_status {
+
+/// Represent a task dependency graph's current status
+typedef enum kmp_taskgraph_status {
KMP_TDG_NONE = 0,
KMP_TDG_RECORDING = 1,
KMP_TDG_READY = 2
-} kmp_tdg_status_t;
-
-/// Structure that contains a TDG
-typedef struct kmp_tdg_info {
- kmp_int32 tdg_id; // Unique idenfifier of the TDG
- kmp_taskgraph_flags_t tdg_flags; // Flags related to a TDG
- kmp_int32 map_size; // Number of allocated TDG nodes
- kmp_int32 num_roots; // Number of roots tasks int the TDG
- kmp_int32 *root_tasks; // Array of tasks identifiers that are roots
- kmp_node_info_t *record_map; // Array of TDG nodes
- kmp_tdg_status_t tdg_status =
- KMP_TDG_NONE; // Status of the TDG (recording, ready...)
- std::atomic<kmp_int32> num_tasks; // Number of TDG nodes
- std::atomic<kmp_int32> tdg_task_id_next; // Task id of next node
- kmp_bootstrap_lock_t
- graph_lock; // Protect graph attributes when updated via taskloop_recur
- // Taskloop reduction related
- void *rec_taskred_data; // Data to pass to __kmpc_task_reduction_init or
- // __kmpc_taskred_init
- kmp_int32 rec_num_taskred;
-} kmp_tdg_info_t;
-
-extern int __kmp_tdg_dot;
-extern kmp_int32 __kmp_max_tdgs;
-extern kmp_tdg_info_t **__kmp_global_tdgs;
-extern kmp_tdg_info_t *__kmp_curr_tdg;
-extern kmp_int32 __kmp_successors_size;
-extern std::atomic<kmp_int32> __kmp_tdg_task_id;
-extern kmp_int32 __kmp_num_tdg;
+} kmp_taskgraph_status_t;
+
+enum kmp_taskgraph_mark {
+ TASKGRAPH_UNMARKED,
+ TASKGRAPH_TEMP_MARK,
+ TASKGRAPH_PERMANENT_MARK,
+ TASKGRAPH_COMBINED,
+ TASKGRAPH_DELETED
+};
+
+typedef struct kmp_taskgraph_region_dep {
+ struct kmp_taskgraph_region *region;
+ struct kmp_taskgraph_region_dep *next;
+} kmp_taskgraph_region_dep_t;
+
+typedef struct kmp_taskgraph_node {
+ kmp_task_t *task;
+ bool taskloop_task;
+ kmp_taskgraph_reduce_input_data_t *reduce_input;
+ union {
+ // Valid when KMP_TDG_RECORDING in parent taskgraph record.
+ struct {
+ kmp_depend_info_t *dep_list;
+ kmp_int32 ndeps;
+ // This is a control dependency. If not -1, it is the index of the
+ // taskgraph node which succeeds this one in an array of taskgraph nodes.
+ kmp_int32 cfg_successor;
+ } unresolved;
+
+ // Valid when KMP_TDG_READY in parent taskgraph record.
+ struct {
+ struct kmp_taskgraph_region *last_region;
+ kmp_int32 count;
+ } resolved;
+ } u;
+} kmp_taskgraph_node_t;
+
+enum kmp_taskgraph_region_type {
+ TASKGRAPH_REGION_ENTRY,
+ TASKGRAPH_REGION_EXIT,
+ TASKGRAPH_REGION_NODE,
+ TASKGRAPH_REGION_WAIT,
+ TASKGRAPH_REGION_PARALLEL,
+ TASKGRAPH_REGION_EXCLUSIVE,
+ TASKGRAPH_REGION_SEQUENTIAL,
+ TASKGRAPH_REGION_IRREDUCIBLE
+};
+
+typedef struct kmp_taskgraph_region {
+ struct kmp_taskgraph_record *owner;
+ // Initially, the lexical "next" region (which doesn't have to be a
+ // successor). Subsequently, a pointer to the next item in the worklist.
+ struct kmp_taskgraph_region *next;
+ // The parent taskgraph for this one. Initially nullptr.
+ struct kmp_taskgraph_region *parent;
+ kmp_taskgraph_region_dep_t *successors;
+ kmp_taskgraph_region_dep_t *predecessors;
+ // Only valid while building the exec descr structure. This could probably
+ // share storage with one of the other fields if we wanted to save space.
+ struct kmp_taskgraph_exec_descr *exec_descr;
+ // The next allocated block.
+ struct kmp_taskgraph_region *alloc_chain;
+ struct kmp_bitset *mutexset;
+ struct kmp_taskgraph_region *mutexset_parent;
+ // Pointer to reduction input data for the region. We only expect to see
+ // this on TASKGRAPH_REGION_PARALLEL regions.
+ kmp_taskgraph_reduce_input_data_t *reduce_input;
+ enum kmp_taskgraph_region_type type;
+ enum kmp_taskgraph_mark mark;
+ kmp_int32 timestamp;
+ kmp_int32 level;
+ union {
+ struct {
+ kmp_taskgraph_node_t *node;
+ struct kmp_taskgraph_region *next_instance;
+ } task;
+ struct {
+ struct kmp_taskgraph_region **children;
+ kmp_int32 num_children;
+ } inner;
+ };
+} kmp_taskgraph_region_t;
+
+typedef struct kmp_taskgraph_record {
+ std::atomic<kmp_taskgraph_status_t> status = KMP_TDG_NONE;
+ kmp_int32 gtid = 0;
+ kmp_int32 graph_id = 0;
+ // A lock that protects the record_map and num_tasks fields from being
+ // modified by multiple threads.
+ // For now, we also use this whilst the taskgraph is being replayed.
+ // This should be replaced with an invocation counter when we implement
+ // concurrent replay of the taskgraph from different threads.
+ kmp_lock_t map_lock;
+ kmp_taskgraph_node_t *record_map = nullptr;
+ kmp_int32 num_tasks = 0;
+ kmp_int32 nodes_allocated = 0;
+ kmp_taskgraph_region_t *root;
+ kmp_taskgraph_region_t *alloc_root;
+ kmp_taskgraph_region_dep_t *recycled_deps;
+ kmp_int32 num_mutexes;
+ struct kmp_taskgraph_exec_descr *exec_descrs;
+ kmp_size_t exec_descr_size;
+ kmp_lock_t replay_lock;
+ // We need a taskgroup structure to keep track of recorded tasks. This is
+ // set to TRUE if the user requested "nogroup" on the taskgraph directive
+ // (then we can avoid blocking at the end of the taskgraph region on replay,
+ // at least).
+ bool nogroup_taskgroup;
+ struct kmp_taskgraph_record *next = nullptr;
+} kmp_taskgraph_record_t;
+
+typedef struct kmp_taskgraph_exec_descr {
+ std::atomic<kmp_int32> npredecessors;
+ std::atomic<kmp_int32> nblocks;
+ kmp_taskgraph_region_t *region;
+ struct kmp_taskgraph_exec_descr *sibling;
+ struct kmp_taskgraph_exec_descr *predecessor_chain;
+ struct kmp_taskgraph_exec_descr *successor;
+ struct kmp_taskgraph_exec_descr *next_instance;
+} kmp_taskgraph_exec_descr_t;
+
#endif
typedef struct kmp_tasking_flags { /* Total struct must be exactly 32 bits */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
/* Same fields as in the #else branch, but in reverse order */
-#if OMP_TASKGRAPH_EXPERIMENTAL
- unsigned reserved31 : 4;
- unsigned onced : 1;
-#else
unsigned reserved31 : 5;
-#endif
unsigned hidden_helper : 1;
unsigned target : 1;
unsigned native : 1;
@@ -2755,19 +2844,16 @@ typedef struct kmp_tasking_flags { /* Total struct must be exactly 32 bits */
unsigned native : 1; /* 1==gcc-compiled task, 0==intel */
unsigned target : 1;
unsigned hidden_helper : 1; /* 1 == hidden helper task */
-#if OMP_TASKGRAPH_EXPERIMENTAL
- unsigned onced : 1; /* 1==ran once already, 0==never ran, record & replay purposes */
- unsigned reserved31 : 4; /* reserved for library use */
-#else
unsigned reserved31 : 5; /* reserved for library use */
#endif
-#endif
} kmp_tasking_flags_t;
typedef struct kmp_target_data {
void *async_handle; // libomptarget async handle for task completion query
} kmp_target_data_t;
+struct kmp_taskgraph_exec_descr;
+
struct kmp_taskdata { /* aligned during dynamic allocation */
kmp_int32 td_task_id; /* id, assigned by debugger */
kmp_tasking_flags_t td_flags; /* task flags */
@@ -2811,9 +2897,9 @@ struct kmp_taskdata { /* aligned during dynamic allocation */
ompt_task_info_t ompt_task_info;
#endif
#if OMP_TASKGRAPH_EXPERIMENTAL
- bool is_taskgraph = 0; // whether the task is within a TDG
- kmp_tdg_info_t *tdg; // used to associate task with a TDG
- kmp_int32 td_tdg_task_id; // local task id in its TDG
+ // Whether the task is within a task dependency graph.
+ struct kmp_taskgraph_record *owning_taskgraph = nullptr;
+ struct kmp_taskgraph_exec_descr *exec_descr = nullptr;
#endif
kmp_target_data_t td_target_data;
}; // struct kmp_taskdata
@@ -3041,6 +3127,7 @@ typedef struct KMP_ALIGN_CACHE kmp_base_info {
kmp_uint8 th_task_state; // alternating 0/1 for task team identification
kmp_uint32 th_reap_state; // Non-zero indicates thread is not
// tasking, thus safe to reap
+ //kmp_taskgraph_record_t *th_taskgraph_recording;
/* More stuff for keeping track of active/sleeping threads (this part is
written by the worker thread) */
@@ -3326,6 +3413,7 @@ extern int kmp_c_debug;
extern int kmp_d_debug;
extern int kmp_e_debug;
extern int kmp_f_debug;
+extern int kmp_g_debug;
#endif /* KMP_DEBUG */
/* For debug information logging using rotating buffer */
@@ -4320,6 +4408,15 @@ KMP_EXPORT void __kmpc_omp_taskwait_deps_51(ident_t *loc_ref, kmp_int32 gtid,
extern kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
bool serialize_immediate);
+extern kmp_int32 __kmp_build_taskgraph(kmp_int32 gtid,
+ kmp_taskdata_t *current_taskdata,
+ kmp_taskgraph_record_t *taskgraph);
+
+extern void __kmp_replay_taskgraph(kmp_int32 gtid,
+ kmp_taskdata_t *current_taskdata,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_uint32 graph_id);
+
KMP_EXPORT kmp_int32 __kmpc_cancel(ident_t *loc_ref, kmp_int32 gtid,
kmp_int32 cncl_kind);
KMP_EXPORT kmp_int32 __kmpc_cancellationpoint(ident_t *loc_ref, kmp_int32 gtid,
@@ -4387,24 +4484,40 @@ KMP_EXPORT void __kmpc_init_lock_with_hint(ident_t *loc, kmp_int32 gtid,
KMP_EXPORT void __kmpc_init_nest_lock_with_hint(ident_t *loc, kmp_int32 gtid,
void **user_lock,
uintptr_t hint);
-
#if OMP_TASKGRAPH_EXPERIMENTAL
-// Taskgraph's Record & Replay mechanism
-// __kmp_tdg_is_recording: check whether a given TDG is recording
-// status: the tdg's current status
-static inline bool __kmp_tdg_is_recording(kmp_tdg_status_t status) {
- return status == KMP_TDG_RECORDING;
-}
-
-KMP_EXPORT kmp_int32 __kmpc_start_record_task(ident_t *loc, kmp_int32 gtid,
- kmp_int32 input_flags,
- kmp_int32 tdg_id);
-KMP_EXPORT void __kmpc_end_record_task(ident_t *loc, kmp_int32 gtid,
- kmp_int32 input_flags, kmp_int32 tdg_id);
KMP_EXPORT void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
- kmp_int32 input_flags, kmp_uint32 tdg_id,
- kmp_uint32 graph_id, void (*entry)(void *),
+ std::atomic<void*> *tdg_handle,
+ kmp_uint32 graph_id, kmp_int32 graph_reset,
+ kmp_int32 nogroup, void (*entry)(void *),
void *args);
+KMP_EXPORT kmp_uint32 __kmpc_taskgraph_task(ident_t *loc_ref, kmp_int32 gtid,
+ kmp_task_t *new_task,
+ kmp_int32 flags,
+ size_t sizeof_kmp_task_t,
+ void* shareds,
+ size_t sizeof_shareds,
+ kmp_int32 ndeps,
+ kmp_depend_info_t *dep_list);
+KMP_EXPORT kmp_uint32 __kmpc_taskgraph_taskloop(ident_t *loc_ref,
+ kmp_int32 gtid,
+ kmp_task_t *new_task,
+ kmp_int32 flags,
+ size_t sizeof_kmp_task_t,
+ void *shareds,
+ size_t sizeof_shareds,
+ kmp_int32 if_val,
+ kmp_uint64 *lb, kmp_uint64 *ub,
+ kmp_int64 st, kmp_int32 nogroup,
+ kmp_int32 sched,
+ kmp_uint64 grainsize,
+ kmp_int32 modifier,
+ void *task_dup);
+KMP_EXPORT void __kmpc_taskgraph_taskwait(ident_t *loc_ref, kmp_int32 gtid,
+ kmp_int32 ndeps,
+ kmp_depend_info_t *dep_list,
+ kmp_int32 has_no_wait);
+KMP_EXPORT void* __kmpc_taskgraph_taskred_init(kmp_int32 gtid, kmp_int32 num,
+ void *data);
#endif
/* Interface to fast scalable reduce methods routines */
diff --git a/openmp/runtime/src/kmp_debug.h b/openmp/runtime/src/kmp_debug.h
index 08d52cc04a108..5a317d5fa67a5 100644
--- a/openmp/runtime/src/kmp_debug.h
+++ b/openmp/runtime/src/kmp_debug.h
@@ -76,6 +76,7 @@ extern int kmp_c_debug;
extern int kmp_d_debug;
extern int kmp_e_debug;
extern int kmp_f_debug;
+extern int kmp_g_debug;
extern int kmp_diag;
#define KA_TRACE(d, x) \
@@ -102,6 +103,10 @@ extern int kmp_diag;
if (kmp_f_debug >= d) { \
__kmp_debug_printf x; \
}
+#define KG_TRACE(d, x) \
+ if (kmp_g_debug >= d) { \
+ __kmp_debug_printf x; \
+ }
#define K_DIAG(d, x) \
{ \
if (kmp_diag == d) { \
@@ -151,6 +156,13 @@ extern int kmp_diag;
(x); \
__kmp_enable(ks); \
}
+#define KG_DUMP(d, x) \
+ if (kmp_g_debug >= d) { \
+ int ks; \
+ __kmp_disable(&ks); \
+ (x); \
+ __kmp_enable(ks); \
+ }
#else
@@ -160,6 +172,7 @@ extern int kmp_diag;
#define KD_TRACE(d, x) /* nothing to do */
#define KE_TRACE(d, x) /* nothing to do */
#define KF_TRACE(d, x) /* nothing to do */
+#define KG_TRACE(d, x) /* nothing to do */
#define K_DIAG(d, x) \
{} /* nothing to do */
@@ -169,6 +182,7 @@ extern int kmp_diag;
#define KD_DUMP(d, x) /* nothing to do */
#define KE_DUMP(d, x) /* nothing to do */
#define KF_DUMP(d, x) /* nothing to do */
+#define KG_DUMP(d, x) /* nothing to do */
#endif // KMP_DEBUG
diff --git a/openmp/runtime/src/kmp_global.cpp b/openmp/runtime/src/kmp_global.cpp
index bd089d6f0bc3f..083f223e15d65 100644
--- a/openmp/runtime/src/kmp_global.cpp
+++ b/openmp/runtime/src/kmp_global.cpp
@@ -375,6 +375,7 @@ int kmp_c_debug = 0;
int kmp_d_debug = 0;
int kmp_e_debug = 0;
int kmp_f_debug = 0;
+int kmp_g_debug = 0;
int kmp_diag = 0;
#endif
@@ -553,16 +554,5 @@ int __kmp_nesting_mode = 0;
int __kmp_nesting_mode_nlevels = 1;
int *__kmp_nesting_nth_level;
-#if OMP_TASKGRAPH_EXPERIMENTAL
-// TDG record & replay
-int __kmp_tdg_dot = 0;
-kmp_int32 __kmp_max_tdgs = 100;
-kmp_tdg_info_t **__kmp_global_tdgs = NULL;
-kmp_tdg_info_t *__kmp_curr_tdg = NULL; // Current TDG being recorded or executed
-kmp_int32 __kmp_num_tdg = 0;
-kmp_int32 __kmp_successors_size = 10; // Initial succesor size list for
- // recording
-std::atomic<kmp_int32> __kmp_tdg_task_id = 0;
-#endif
// end of file //
diff --git a/openmp/runtime/src/kmp_settings.cpp b/openmp/runtime/src/kmp_settings.cpp
index 66ef6f8097dce..569bbe7aeaba3 100644
--- a/openmp/runtime/src/kmp_settings.cpp
+++ b/openmp/runtime/src/kmp_settings.cpp
@@ -1266,28 +1266,6 @@ static void __kmp_stg_parse_num_threads(char const *name, char const *value,
K_DIAG(1, ("__kmp_dflt_team_nth == %d\n", __kmp_dflt_team_nth));
} // __kmp_stg_parse_num_threads
-#if OMP_TASKGRAPH_EXPERIMENTAL
-static void __kmp_stg_parse_max_tdgs(char const *name, char const *value,
- void *data) {
- __kmp_stg_parse_int(name, value, 0, INT_MAX, &__kmp_max_tdgs);
-} // __kmp_stg_parse_max_tdgs
-
-static void __kmp_std_print_max_tdgs(kmp_str_buf_t *buffer, char const *name,
- void *data) {
- __kmp_stg_print_int(buffer, name, __kmp_max_tdgs);
-} // __kmp_std_print_max_tdgs
-
-static void __kmp_stg_parse_tdg_dot(char const *name, char const *value,
- void *data) {
- __kmp_stg_parse_bool(name, value, &__kmp_tdg_dot);
-} // __kmp_stg_parse_tdg_dot
-
-static void __kmp_stg_print_tdg_dot(kmp_str_buf_t *buffer, char const *name,
- void *data) {
- __kmp_stg_print_bool(buffer, name, __kmp_tdg_dot);
-} // __kmp_stg_print_tdg_dot
-#endif
-
static void __kmp_stg_parse_num_hidden_helper_threads(char const *name,
char const *value,
void *data) {
@@ -1579,6 +1557,7 @@ KMP_STG_X_DEBUG(c)
KMP_STG_X_DEBUG(d)
KMP_STG_X_DEBUG(e)
KMP_STG_X_DEBUG(f)
+KMP_STG_X_DEBUG(g)
#undef KMP_STG_X_DEBUG
@@ -1604,6 +1583,9 @@ static void __kmp_stg_parse_debug(char const *name, char const *value,
if (kmp_f_debug < debug) {
kmp_f_debug = debug;
}
+ if (kmp_g_debug < debug) {
+ kmp_g_debug = debug;
+ }
} // __kmp_stg_parse_debug
static void __kmp_stg_parse_debug_buf(char const *name, char const *value,
@@ -5590,6 +5572,8 @@ static kmp_setting_t __kmp_stg_table[] = {
0},
{"KMP_F_DEBUG", __kmp_stg_parse_f_debug, __kmp_stg_print_f_debug, NULL, 0,
0},
+ {"KMP_G_DEBUG", __kmp_stg_parse_g_debug, __kmp_stg_print_g_debug, NULL, 0,
+ 0},
{"KMP_DEBUG", __kmp_stg_parse_debug, NULL, /* no print */ NULL, 0, 0},
{"KMP_DEBUG_BUF", __kmp_stg_parse_debug_buf, __kmp_stg_print_debug_buf,
NULL, 0, 0},
@@ -5742,12 +5726,6 @@ static kmp_setting_t __kmp_stg_table[] = {
{"LIBOMP_NUM_HIDDEN_HELPER_THREADS",
__kmp_stg_parse_num_hidden_helper_threads,
__kmp_stg_print_num_hidden_helper_threads, NULL, 0, 0},
-#if OMP_TASKGRAPH_EXPERIMENTAL
- {"KMP_MAX_TDGS", __kmp_stg_parse_max_tdgs, __kmp_std_print_max_tdgs, NULL,
- 0, 0},
- {"KMP_TDG_DOT", __kmp_stg_parse_tdg_dot, __kmp_stg_print_tdg_dot, NULL, 0,
- 0},
-#endif
#if OMPT_SUPPORT
{"OMP_TOOL", __kmp_stg_parse_omp_tool, __kmp_stg_print_omp_tool, NULL, 0,
diff --git a/openmp/runtime/src/kmp_taskdeps.cpp b/openmp/runtime/src/kmp_taskdeps.cpp
index b34191036c528..1f28f747c0a80 100644
--- a/openmp/runtime/src/kmp_taskdeps.cpp
+++ b/openmp/runtime/src/kmp_taskdeps.cpp
@@ -20,6 +20,11 @@
#include "ompt-specific.h"
#endif
+#include <bit>
+#include <cstdlib>
+#include <algorithm>
+#include <cinttypes>
+
// TODO: Improve memory allocation? keep a list of pre-allocated structures?
// allocate in blocks? re-use list finished list entries?
// TODO: don't use atomic ref counters for stack-allocated nodes.
@@ -33,6 +38,14 @@
static std::atomic<kmp_int32> kmp_node_id_seed = 0;
#endif
+#undef DEBUG_TASKGRAPH
+
+#ifdef DEBUG_TASKGRAPH
+#define TGDBG(...) fprintf(stderr, __VA_ARGS__)
+#else
+#define TGDBG(...)
+#endif
+
static void __kmp_init_node(kmp_depnode_t *node, bool on_stack) {
node->dn.successors = NULL;
node->dn.task = NULL; // will point to the right task
@@ -49,6 +62,7 @@ static void __kmp_init_node(kmp_depnode_t *node, bool on_stack) {
#ifdef KMP_SUPPORT_GRAPH_OUTPUT
node->dn.id = KMP_ATOMIC_INC(&kmp_node_id_seed);
#endif
+ node->dn.set_membership = nullptr;
#if USE_ITT_BUILD && USE_ITT_NOTIFY
__itt_sync_create(node, "OMP task dep node", NULL, 0);
#endif
@@ -160,7 +174,8 @@ static kmp_dephash_t *__kmp_dephash_create(kmp_info_t *thread,
static kmp_dephash_entry *__kmp_dephash_find(kmp_info_t *thread,
kmp_dephash_t **hash,
- kmp_intptr_t addr) {
+ kmp_intptr_t addr,
+ bool taskgraph_p) {
kmp_dephash_t *h = *hash;
if (h->nelements != 0 && h->nconflicts / h->size >= 1) {
*hash = __kmp_dephash_extend(thread, h);
@@ -190,7 +205,10 @@ static kmp_dephash_entry *__kmp_dephash_find(kmp_info_t *thread,
entry->last_set = NULL;
entry->prev_set = NULL;
entry->last_flag = 0;
- entry->mtx_lock = NULL;
+ if (taskgraph_p)
+ entry->set_num = -1;
+ else
+ entry->mtx_lock = NULL;
entry->next_in_bucket = h->buckets[bucket];
h->buckets[bucket] = entry;
h->nelements++;
@@ -200,6 +218,7 @@ static kmp_dephash_entry *__kmp_dephash_find(kmp_info_t *thread,
return entry;
}
+template <bool refcounting>
static kmp_depnode_list_t *__kmp_add_node(kmp_info_t *thread,
kmp_depnode_list_t *list,
kmp_depnode_t *node) {
@@ -213,7 +232,11 @@ static kmp_depnode_list_t *__kmp_add_node(kmp_info_t *thread,
thread, sizeof(kmp_depnode_list_t));
#endif
- new_head->node = __kmp_node_ref(node);
+ if (refcounting) {
+ new_head->node = __kmp_node_ref(node);
+ } else {
+ new_head->node = node;
+ }
new_head->next = list;
return new_head;
@@ -222,54 +245,6 @@ static kmp_depnode_list_t *__kmp_add_node(kmp_info_t *thread,
static inline void __kmp_track_dependence(kmp_int32 gtid, kmp_depnode_t *source,
kmp_depnode_t *sink,
kmp_task_t *sink_task) {
-#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_taskdata_t *task_source = KMP_TASK_TO_TASKDATA(source->dn.task);
- kmp_taskdata_t *task_sink = KMP_TASK_TO_TASKDATA(sink_task);
- kmp_tdg_info_t *tdg = task_source->tdg;
- if (source->dn.task && sink_task) {
- // Not supporting dependency between two tasks that one is within the TDG
- // and the other is not
- KMP_ASSERT(task_source->is_taskgraph == task_sink->is_taskgraph);
- }
- if (task_sink->is_taskgraph &&
- __kmp_tdg_is_recording(task_sink->tdg->tdg_status)) {
- kmp_node_info_t *source_info =
- &task_sink->tdg->record_map[task_source->td_tdg_task_id];
- bool exists = false;
- for (int i = 0; i < source_info->nsuccessors; i++) {
- if (source_info->successors[i] == task_sink->td_tdg_task_id) {
- exists = true;
- break;
- }
- }
- if (!exists) {
- __kmp_acquire_bootstrap_lock(&tdg->graph_lock);
- if (source_info->nsuccessors >= source_info->successors_size) {
- kmp_uint old_size = source_info->successors_size;
- source_info->successors_size = old_size == 0
- ? __kmp_successors_size
- : 2 * source_info->successors_size;
- kmp_int32 *old_succ_ids = source_info->successors;
- kmp_int32 *new_succ_ids = (kmp_int32 *)__kmp_allocate(
- source_info->successors_size * sizeof(kmp_int32));
- if (old_succ_ids) {
- KMP_MEMCPY(new_succ_ids, old_succ_ids, old_size * sizeof(kmp_int32));
- __kmp_free(old_succ_ids);
- }
- source_info->successors = new_succ_ids;
- }
-
- source_info->successors[source_info->nsuccessors] =
- task_sink->td_tdg_task_id;
- source_info->nsuccessors++;
-
- kmp_node_info_t *sink_info =
- &(task_sink->tdg->record_map[task_sink->td_tdg_task_id]);
- sink_info->npredecessors++;
- __kmp_release_bootstrap_lock(&tdg->graph_lock);
- }
- }
-#endif
#ifdef KMP_SUPPORT_GRAPH_OUTPUT
kmp_taskdata_t *task_source = KMP_TASK_TO_TASKDATA(source->dn.task);
// do not use sink->dn.task as that is only filled after the dependences
@@ -318,25 +293,13 @@ __kmp_depnode_link_successor(kmp_int32 gtid, kmp_info_t *thread,
// link node as successor of list elements
for (kmp_depnode_list_t *p = plist; p; p = p->next) {
kmp_depnode_t *dep = p->node;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_status tdg_status = KMP_TDG_NONE;
- if (task) {
- kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(task);
- if (td->is_taskgraph)
- tdg_status = KMP_TASK_TO_TASKDATA(task)->tdg->tdg_status;
- if (__kmp_tdg_is_recording(tdg_status))
- __kmp_track_dependence(gtid, dep, node, task);
- }
-#endif
if (dep->dn.task) {
KMP_ACQUIRE_DEPNODE(gtid, dep);
if (dep->dn.task) {
if (!dep->dn.successors || dep->dn.successors->node != node) {
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (!(__kmp_tdg_is_recording(tdg_status)) && task)
-#endif
- __kmp_track_dependence(gtid, dep, node, task);
- dep->dn.successors = __kmp_add_node(thread, dep->dn.successors, node);
+ __kmp_track_dependence(gtid, dep, node, task);
+ dep->dn.successors =
+ __kmp_add_node<true>(thread, dep->dn.successors, node);
KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "
"%p\n",
gtid, KMP_TASK_TO_TASKDATA(dep->dn.task),
@@ -359,44 +322,18 @@ static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,
if (!sink)
return 0;
kmp_int32 npredecessors = 0;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_status tdg_status = KMP_TDG_NONE;
- kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(task);
- if (task) {
- if (td->is_taskgraph)
- tdg_status = KMP_TASK_TO_TASKDATA(task)->tdg->tdg_status;
- if (__kmp_tdg_is_recording(tdg_status) && sink->dn.task)
- __kmp_track_dependence(gtid, sink, source, task);
- }
-#endif
if (sink->dn.task) {
// synchronously add source to sink' list of successors
KMP_ACQUIRE_DEPNODE(gtid, sink);
if (sink->dn.task) {
if (!sink->dn.successors || sink->dn.successors->node != source) {
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (!(__kmp_tdg_is_recording(tdg_status)) && task)
-#endif
- __kmp_track_dependence(gtid, sink, source, task);
- sink->dn.successors = __kmp_add_node(thread, sink->dn.successors, source);
+ __kmp_track_dependence(gtid, sink, source, task);
+ sink->dn.successors = __kmp_add_node<true>(thread, sink->dn.successors, source);
KA_TRACE(40, ("__kmp_process_deps: T#%d adding dependence from %p to "
"%p\n",
gtid, KMP_TASK_TO_TASKDATA(sink->dn.task),
KMP_TASK_TO_TASKDATA(task)));
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (__kmp_tdg_is_recording(tdg_status)) {
- kmp_taskdata_t *tdd = KMP_TASK_TO_TASKDATA(sink->dn.task);
- if (tdd->is_taskgraph) {
- if (tdd->td_flags.onced)
- // decrement npredecessors if sink->dn.task belongs to a taskgraph
- // and
- // 1) the task is reset to its initial state (by kmp_free_task) or
- // 2) the task is complete but not yet reset
- npredecessors--;
- }
- }
-#endif
- npredecessors++;
+ npredecessors++;
}
}
KMP_RELEASE_DEPNODE(gtid, sink);
@@ -404,21 +341,251 @@ static inline kmp_int32 __kmp_depnode_link_successor(kmp_int32 gtid,
return npredecessors;
}
+kmp_taskgraph_region_dep_t *
+__kmp_region_deplist_add(kmp_info_t *thread,
+                         kmp_taskgraph_region_dep_t **recycled_deps,
+                         kmp_taskgraph_region_t *region,
+                         kmp_taskgraph_region_dep_t *list) {
+  kmp_taskgraph_region_dep_t *head;
+  if (*recycled_deps) {
+    head = *recycled_deps;
+    *recycled_deps = (*recycled_deps)->next;
+  } else {
+    head = (kmp_taskgraph_region_dep_t *)__kmp_fast_allocate(
+        thread, sizeof(kmp_taskgraph_region_dep_t));
+  }
+ head->region = region;
+ head->next = list;
+ return head;
+}
+
+kmp_taskgraph_region_t *
+__kmp_region_worklist_reverse(kmp_taskgraph_region_t *list) {
+ kmp_taskgraph_region_t *last = nullptr;
+ while (list) {
+ kmp_taskgraph_region_t *next = list->next;
+ list->next = last;
+ last = list;
+ list = next;
+ }
+ return last;
+}
+
+static kmp_depnode_t *__kmp_find_in_depnode_list(kmp_depnode_t *node,
+                                                 kmp_depnode_list_t *list) {
+ for (; list; list = list->next)
+ if (list->node == node)
+ return list->node;
+ return nullptr;
+}
+
+// A trivial fixed-size bitset implementation.
+
+typedef struct kmp_bitset {
+ kmp_uint64 *bits;
+ kmp_size_t bitsize;
+ kmp_size_t num_chunks;
+} kmp_bitset_t;
+
+static kmp_bitset_t *
+__kmp_bitset_alloc(kmp_info_t *thread, kmp_size_t bitsize) {
+ kmp_size_t bytesize = (bitsize + 7) / 8;
+  kmp_size_t num_chunks =
+      (bytesize + sizeof(kmp_uint64) - 1) / sizeof(kmp_uint64);
+  kmp_bitset_t *bitset = (kmp_bitset_t *)__kmp_fast_allocate(
+      thread, sizeof(kmp_bitset_t) + sizeof(kmp_uint64) * num_chunks);
+  bitset->bits = (kmp_uint64 *)&bitset[1];
+ memset(bitset->bits, 0, sizeof(kmp_uint64) * num_chunks);
+ bitset->bitsize = bitsize;
+ bitset->num_chunks = num_chunks;
+ return bitset;
+}
+
+static void
+__kmp_bitset_free(kmp_info_t *thread, kmp_bitset_t *bitset) {
+ __kmp_fast_free(thread, bitset);
+}
+
+static void
+__kmp_bitset_set(kmp_bitset_t *bitset, kmp_size_t bitnum) {
+ kmp_size_t chunk = bitnum / (8 * sizeof(kmp_uint64));
+ if (bitnum < bitset->bitsize)
+ bitset->bits[chunk] |= (kmp_uint64)1 << (bitnum & 63);
+}
+
+static void
+__kmp_bitset_clearall(kmp_bitset_t *bitset) {
+ if (bitset)
+    memset(bitset->bits, 0, sizeof(kmp_uint64) * bitset->num_chunks);
+}
+
+static void
+__kmp_bitset_setall(kmp_bitset_t *bitset) {
+  for (kmp_size_t chunk = 0; chunk + 1 < bitset->num_chunks; chunk++)
+    bitset->bits[chunk] = ~(kmp_uint64)0;
+  kmp_int32 last_chunk_numbits = bitset->bitsize & 63;
+  // A full final chunk (bitsize a multiple of 64) must be all-ones too.
+  kmp_uint64 last_chunk_bits =
+      last_chunk_numbits ? ~((~(kmp_uint64)0) << last_chunk_numbits)
+                         : ~(kmp_uint64)0;
+  if (bitset->num_chunks > 0)
+    bitset->bits[bitset->num_chunks - 1] = last_chunk_bits;
+}
+
+static void
+__kmp_bitset_copy(kmp_bitset_t *dst, const kmp_bitset_t *src) {
+  KMP_DEBUG_ASSERT(dst->num_chunks == src->num_chunks);
+  KMP_DEBUG_ASSERT(dst->bitsize == src->bitsize);
+ memcpy(dst->bits, src->bits, sizeof(kmp_uint64) * dst->num_chunks);
+}
+
+/// Return TRUE if \c b is a subset of \c a.
+
+static bool
+__kmp_bitset_subset_p(const kmp_bitset_t *a, const kmp_bitset_t *b) {
+ if (!b)
+ return true;
+ kmp_size_t chunk_max = std::max(a->num_chunks, b->num_chunks);
+ for (kmp_size_t chunk = 0; chunk < chunk_max; chunk++) {
+ kmp_uint64 a_bits = chunk < a->num_chunks ? a->bits[chunk] : 0;
+ kmp_uint64 b_bits = chunk < b->num_chunks ? b->bits[chunk] : 0;
+ if ((a_bits & b_bits) != b_bits)
+ return false;
+ }
+ return true;
+}
+
+static void
+__kmp_bitset_and(kmp_bitset_t *a, kmp_bitset_t *b, kmp_bitset_t *c) {
+ kmp_size_t chunk_max = std::max(b->num_chunks, c->num_chunks);
+ for (kmp_size_t chunk = 0; chunk < chunk_max; chunk++) {
+ kmp_uint64 b_bits = chunk < b->num_chunks ? b->bits[chunk] : 0;
+ kmp_uint64 c_bits = chunk < c->num_chunks ? c->bits[chunk] : 0;
+ a->bits[chunk] = b_bits & c_bits;
+ }
+}
+
+static void
+__kmp_bitset_and_not(kmp_bitset_t *a, kmp_bitset_t *b, kmp_bitset_t *c) {
+ if (!c)
+ __kmp_bitset_copy(a, b);
+ else {
+ kmp_size_t chunk_max = std::max(b->num_chunks, c->num_chunks);
+ for (kmp_size_t chunk = 0; chunk < chunk_max; chunk++) {
+ kmp_uint64 b_bits = chunk < b->num_chunks ? b->bits[chunk] : 0;
+ kmp_uint64 c_bits = chunk < c->num_chunks ? c->bits[chunk] : 0;
+ a->bits[chunk] = b_bits & ~c_bits;
+ }
+ }
+}
+
+static void
+__kmp_bitset_or(kmp_bitset_t *a, kmp_bitset_t *b, kmp_bitset_t *c) {
+ if (!b && !c)
+ __kmp_bitset_clearall(a);
+ else if (!b)
+ __kmp_bitset_copy(a, c);
+ else if (!c)
+ __kmp_bitset_copy(a, b);
+ else {
+ kmp_size_t chunk_max = std::max(b->num_chunks, c->num_chunks);
+ for (kmp_size_t chunk = 0; chunk < chunk_max; chunk++) {
+ kmp_uint64 b_bits = chunk < b->num_chunks ? b->bits[chunk] : 0;
+ kmp_uint64 c_bits = chunk < c->num_chunks ? c->bits[chunk] : 0;
+ a->bits[chunk] = b_bits | c_bits;
+ }
+ }
+}
+
+static bool
+__kmp_bitset_empty_p(kmp_bitset_t *bitset) {
+ if (!bitset)
+ return true;
+ for (kmp_size_t chunk = 0; chunk < bitset->num_chunks; chunk++) {
+ if (bitset->bits[chunk] != 0)
+ return false;
+ }
+ return true;
+}
+
+/// Test two bitsets for equality. Note that any unused bits at the end of the
+/// last chunk are kept as zero.
+
+static bool
+__kmp_bitset_equal(kmp_bitset_t *a, kmp_bitset_t *b) {
+ if (!b)
+ return __kmp_bitset_empty_p(a);
+ kmp_size_t chunk_max = std::max(a->num_chunks, b->num_chunks);
+ for (kmp_size_t chunk = 0; chunk < chunk_max; chunk++) {
+ kmp_uint64 a_bits = chunk < a->num_chunks ? a->bits[chunk] : 0;
+ kmp_uint64 b_bits = chunk < b->num_chunks ? b->bits[chunk] : 0;
+ if (a_bits != b_bits)
+ return false;
+ }
+ return true;
+}
+
+static bool
+__kmp_bitset_intersect_p(kmp_bitset_t *a, kmp_bitset_t *b) {
+ if (!a || !b)
+ return false;
+ kmp_size_t chunk_max = std::max(a->num_chunks, b->num_chunks);
+ for (kmp_size_t chunk = 0; chunk < chunk_max; chunk++) {
+ kmp_uint64 a_bits = chunk < a->num_chunks ? a->bits[chunk] : 0;
+ kmp_uint64 b_bits = chunk < b->num_chunks ? b->bits[chunk] : 0;
+ if ((a_bits & b_bits) != 0)
+ return true;
+ }
+ return false;
+}
+
+static kmp_int32
+__kmp_bitset_popcount(kmp_bitset_t *bitset) {
+  if (!bitset)
+    return 0;
+  kmp_int32 accum = 0;
+  for (kmp_size_t c = 0; c < bitset->num_chunks; c++)
+    accum += std::popcount(bitset->bits[c]);
+  return accum;
+}
+
+static kmp_int32 __kmp_taskgraph_add_dep(kmp_info_t *thread,
+ kmp_depnode_t *node,
+ kmp_depnode_list_t *plist) {
+ kmp_int32 npredecessors = 0;
+ for (; plist; plist = plist->next) {
+ kmp_depnode_t *dep = plist->node;
+    if (!dep->dn.successors ||
+        !__kmp_find_in_depnode_list(node, dep->dn.successors)) {
+ dep->dn.successors =
+ __kmp_add_node<false>(thread, dep->dn.successors, node);
+ npredecessors++;
+ }
+ }
+ return npredecessors;
+}
+
+static kmp_int32 __kmp_taskgraph_add_dep(kmp_info_t *thread,
+ kmp_depnode_t *source,
+ kmp_depnode_t *sink) {
+ if (!sink)
+ return 0;
+ kmp_int32 npredecessors = 0;
+ if (!sink->dn.successors || sink->dn.successors->node != source) {
+ if (!__kmp_find_in_depnode_list(source, sink->dn.successors)) {
+ sink->dn.successors = __kmp_add_node<false>(thread, sink->dn.successors,
+ source);
+ npredecessors++;
+ }
+ }
+ return npredecessors;
+}
+
+template <typename T>
static inline kmp_int32
__kmp_process_dep_all(kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t *h,
bool dep_barrier, kmp_task_t *task) {
- KA_TRACE(30, ("__kmp_process_dep_all: T#%d processing dep_all, "
- "dep_barrier = %d\n",
+ KA_TRACE(30, ("__kmp_process_dep_all<%s>: T#%d processing dep_all, "
+ "dep_barrier = %d\n", T::name,
gtid, dep_barrier));
kmp_info_t *thread = __kmp_threads[gtid];
kmp_int32 npredecessors = 0;
// process previous omp_all_memory node if any
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, h->last_all);
- __kmp_node_deref(thread, h->last_all);
+ npredecessors += T::link_successor(gtid, thread, task, node, h->last_all);
+ T::deref(thread, h->last_all);
if (!dep_barrier) {
- h->last_all = __kmp_node_ref(node);
+ h->last_all = T::ref(node);
} else {
// if this is a sync point in the serial sequence, then the previous
// outputs are guaranteed to be completed after the execution of this
@@ -437,38 +604,37 @@ __kmp_process_dep_all(kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t *h,
kmp_depnode_list_t *last_set = info->last_set;
kmp_depnode_list_t *prev_set = info->prev_set;
if (last_set) {
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, last_set);
- __kmp_depnode_list_free(thread, last_set);
- __kmp_depnode_list_free(thread, prev_set);
+ npredecessors += T::link_successor(gtid, thread, task, node, last_set);
+ __kmp_depnode_list_free<T::rc>(thread, last_set);
+ __kmp_depnode_list_free<T::rc>(thread, prev_set);
info->last_set = NULL;
info->prev_set = NULL;
info->last_flag = 0; // no sets in this dephash entry
} else {
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, last_out);
+ npredecessors += T::link_successor(gtid, thread, task, node, last_out);
}
- __kmp_node_deref(thread, last_out);
+ T::deref(thread, last_out);
if (!dep_barrier) {
- info->last_out = __kmp_node_ref(node);
+ info->last_out = T::ref(node);
} else {
info->last_out = NULL;
}
}
}
- KA_TRACE(30, ("__kmp_process_dep_all: T#%d found %d predecessors\n", gtid,
- npredecessors));
+ KA_TRACE(30, ("__kmp_process_dep_all<%s>: T#%d found %d predecessors\n",
+ T::name, gtid, npredecessors));
return npredecessors;
}
-template <bool filter>
+template <typename T>
static inline kmp_int32
__kmp_process_deps(kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t **hash,
bool dep_barrier, kmp_int32 ndeps,
- kmp_depend_info_t *dep_list, kmp_task_t *task) {
- KA_TRACE(30, ("__kmp_process_deps<%d>: T#%d processing %d dependences : "
- "dep_barrier = %d\n",
- filter, gtid, ndeps, dep_barrier));
+ kmp_depend_info_t *dep_list, kmp_task_t *task,
+ kmp_int32 &next_mutex_set, bool filter = true) {
+ KA_TRACE(30, ("__kmp_process_deps<%s>: T#%d processing %d dependences : "
+ "dep_barrier = %d, filter = %d\n", T::name,
+ gtid, ndeps, dep_barrier, filter));
kmp_info_t *thread = __kmp_threads[gtid];
kmp_int32 npredecessors = 0;
@@ -479,27 +645,25 @@ __kmp_process_deps(kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t **hash,
continue; // skip filtered entries
kmp_dephash_entry_t *info =
- __kmp_dephash_find(thread, hash, dep->base_addr);
+ __kmp_dephash_find(thread, hash, dep->base_addr, !T::rc);
kmp_depnode_t *last_out = info->last_out;
kmp_depnode_list_t *last_set = info->last_set;
kmp_depnode_list_t *prev_set = info->prev_set;
if (dep->flags.out) { // out or inout --> clean lists if any
if (last_set) {
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, last_set);
- __kmp_depnode_list_free(thread, last_set);
- __kmp_depnode_list_free(thread, prev_set);
+ npredecessors += T::link_successor(gtid, thread, task, node, last_set);
+ __kmp_depnode_list_free<T::rc>(thread, last_set);
+ __kmp_depnode_list_free<T::rc>(thread, prev_set);
info->last_set = NULL;
info->prev_set = NULL;
info->last_flag = 0; // no sets in this dephash entry
} else {
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, last_out);
+ npredecessors += T::link_successor(gtid, thread, task, node, last_out);
}
- __kmp_node_deref(thread, last_out);
+ T::deref(thread, last_out);
if (!dep_barrier) {
- info->last_out = __kmp_node_ref(node);
+ info->last_out = T::ref(node);
} else {
// if this is a sync point in the serial sequence, then the previous
// outputs are guaranteed to be completed after the execution of this
@@ -510,27 +674,24 @@ __kmp_process_deps(kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t **hash,
if (info->last_flag == 0 || info->last_flag == dep->flag) {
// last_set either didn't exist or of same dep kind
// link node as successor of the last_out if any
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, last_out);
+ npredecessors += T::link_successor(gtid, thread, task, node, last_out);
// link node as successor of all nodes in the prev_set if any
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, prev_set);
+ npredecessors += T::link_successor(gtid, thread, task, node, prev_set);
if (dep_barrier) {
// clean last_out and prev_set if any; don't touch last_set
- __kmp_node_deref(thread, last_out);
+ T::deref(thread, last_out);
info->last_out = NULL;
- __kmp_depnode_list_free(thread, prev_set);
+ __kmp_depnode_list_free<T::rc>(thread, prev_set);
info->prev_set = NULL;
}
} else { // last_set is of different dep kind, make it prev_set
// link node as successor of all nodes in the last_set
- npredecessors +=
- __kmp_depnode_link_successor(gtid, thread, task, node, last_set);
+ npredecessors += T::link_successor(gtid, thread, task, node, last_set);
// clean last_out if any
- __kmp_node_deref(thread, last_out);
+ T::deref(thread, last_out);
info->last_out = NULL;
// clean prev_set if any
- __kmp_depnode_list_free(thread, prev_set);
+ __kmp_depnode_list_free<T::rc>(thread, prev_set);
if (!dep_barrier) {
// move last_set to prev_set, new last_set will be allocated
info->prev_set = last_set;
@@ -544,62 +705,130 @@ __kmp_process_deps(kmp_int32 gtid, kmp_depnode_t *node, kmp_dephash_t **hash,
// 0 if last_set is empty, unchanged otherwise
if (!dep_barrier) {
info->last_flag = dep->flag; // store dep kind of the last_set
- info->last_set = __kmp_add_node(thread, info->last_set, node);
+ info->last_set = __kmp_add_node<T::rc>(thread, info->last_set, node);
}
// check if we are processing MTX dependency
if (dep->flag == KMP_DEP_MTX) {
- if (info->mtx_lock == NULL) {
- info->mtx_lock = (kmp_lock_t *)__kmp_allocate(sizeof(kmp_lock_t));
- __kmp_init_lock(info->mtx_lock);
- }
- KMP_DEBUG_ASSERT(node->dn.mtx_num_locks < MAX_MTX_DEPS);
- kmp_int32 m;
- // Save lock in node's array
- for (m = 0; m < MAX_MTX_DEPS; ++m) {
- // sort pointers in decreasing order to avoid potential livelock
- if (node->dn.mtx_locks[m] < info->mtx_lock) {
- KMP_DEBUG_ASSERT(!node->dn.mtx_locks[node->dn.mtx_num_locks]);
- for (int n = node->dn.mtx_num_locks; n > m; --n) {
- // shift right all lesser non-NULL pointers
- KMP_DEBUG_ASSERT(node->dn.mtx_locks[n - 1] != NULL);
- node->dn.mtx_locks[n] = node->dn.mtx_locks[n - 1];
- }
- node->dn.mtx_locks[m] = info->mtx_lock;
- break;
- }
- }
- KMP_DEBUG_ASSERT(m < MAX_MTX_DEPS); // must break from loop
- node->dn.mtx_num_locks++;
+ T::mutex_dep(thread, info, node, next_mutex_set);
}
}
}
- KA_TRACE(30, ("__kmp_process_deps<%d>: T#%d found %d predecessors\n", filter,
- gtid, npredecessors));
+ KA_TRACE(30, ("__kmp_process_deps<%s>: T#%d found %d predecessors (filter: %d)\n",
+ T::name, gtid, npredecessors, filter));
return npredecessors;
}
-#define NO_DEP_BARRIER (false)
-#define DEP_BARRIER (true)
+struct normal_deps {
+ static constexpr char name[] = "normal";
+ static constexpr bool rc = true;
+ static kmp_int32 link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task, kmp_depnode_t *source,
+ kmp_depnode_t *sink);
+ static kmp_int32 link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task, kmp_depnode_t *node,
+ kmp_depnode_list_t *plist);
+ static kmp_depnode_t *ref(kmp_depnode_t *node);
+ static void deref(kmp_info_t *thread, kmp_depnode_t *node);
+ static void mutex_dep(kmp_info_t *thread, kmp_dephash_entry_t *info,
+ kmp_depnode_t *node, kmp_int32 &next_mutex_set);
+};
+
+kmp_int32 normal_deps::link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task, kmp_depnode_t *source,
+ kmp_depnode_t *sink) {
+ return __kmp_depnode_link_successor(gtid, thread, task, source, sink);
+}
-// returns true if the task has any outstanding dependence
-static bool __kmp_check_deps(kmp_int32 gtid, kmp_depnode_t *node,
- kmp_task_t *task, kmp_dephash_t **hash,
- bool dep_barrier, kmp_int32 ndeps,
- kmp_depend_info_t *dep_list,
- kmp_int32 ndeps_noalias,
- kmp_depend_info_t *noalias_dep_list) {
- int i, n_mtxs = 0, dep_all = 0;
-#if KMP_DEBUG
- kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
-#endif
- KA_TRACE(20, ("__kmp_check_deps: T#%d checking dependences for task %p : %d "
- "possibly aliased dependences, %d non-aliased dependences : "
- "dep_barrier=%d .\n",
- gtid, taskdata, ndeps, ndeps_noalias, dep_barrier));
+kmp_int32 normal_deps::link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task, kmp_depnode_t *node,
+ kmp_depnode_list_t *plist) {
+ return __kmp_depnode_link_successor(gtid, thread, task, node, plist);
+}
+
+kmp_depnode_t *normal_deps::ref(kmp_depnode_t *node) {
+ return __kmp_node_ref(node);
+}
+
+void normal_deps::deref(kmp_info_t *thread, kmp_depnode_t *node) {
+ __kmp_node_deref(thread, node);
+}
+
+void normal_deps::mutex_dep(kmp_info_t *thread, kmp_dephash_entry_t *info,
+ kmp_depnode_t *node, kmp_int32 &next_mutex_set) {
+ if (info->mtx_lock == NULL) {
+ info->mtx_lock = (kmp_lock_t *)__kmp_allocate(sizeof(kmp_lock_t));
+ __kmp_init_lock(info->mtx_lock);
+ }
+ KMP_DEBUG_ASSERT(node->dn.mtx_num_locks < MAX_MTX_DEPS);
+ kmp_int32 m;
+ // Save lock in node's array
+ for (m = 0; m < MAX_MTX_DEPS; ++m) {
+ // sort pointers in decreasing order to avoid potential livelock
+ if (node->dn.mtx_locks[m] < info->mtx_lock) {
+ KMP_DEBUG_ASSERT(!node->dn.mtx_locks[node->dn.mtx_num_locks]);
+ for (int n = node->dn.mtx_num_locks; n > m; --n) {
+ // shift right all lesser non-NULL pointers
+ KMP_DEBUG_ASSERT(node->dn.mtx_locks[n - 1] != NULL);
+ node->dn.mtx_locks[n] = node->dn.mtx_locks[n - 1];
+ }
+ node->dn.mtx_locks[m] = info->mtx_lock;
+ break;
+ }
+ }
+ KMP_DEBUG_ASSERT(m < MAX_MTX_DEPS); // must break from loop
+ node->dn.mtx_num_locks++;
+}
+
+struct taskgraph_deps {
+ static constexpr char name[] = "taskgraph";
+ static constexpr bool rc = false;
+ static kmp_int32 link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task, kmp_depnode_t *source,
+ kmp_depnode_t *sink);
+ static kmp_int32 link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task, kmp_depnode_t *node,
+ kmp_depnode_list_t *plist);
+ static kmp_depnode_t *ref(kmp_depnode_t *node) { return node; }
+ static void deref(kmp_info_t *thread, kmp_depnode_t *node) { }
+ static void mutex_dep(kmp_info_t *thread, kmp_dephash_entry_t *info,
+ kmp_depnode_t *node, kmp_int32 &next_mutex_set);
+};
+
+kmp_int32 taskgraph_deps::link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task,
+ kmp_depnode_t *source,
+ kmp_depnode_t *sink) {
+ return __kmp_taskgraph_add_dep(thread, source, sink);
+}
+
+kmp_int32 taskgraph_deps::link_successor(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_task_t *task, kmp_depnode_t *node,
+ kmp_depnode_list_t *plist) {
+ return __kmp_taskgraph_add_dep(thread, node, plist);
+}
+
+void taskgraph_deps::mutex_dep(kmp_info_t *thread, kmp_dephash_entry_t *info,
+ kmp_depnode_t *node, kmp_int32 &next_mutex_set) {
+ if (info->set_num == -1) {
+ info->set_num = next_mutex_set++;
+ }
+ if (!node->dn.set_membership) {
+ node->dn.set_membership = __kmp_bitset_alloc(thread, 64);
+ }
+ __kmp_bitset_set(node->dn.set_membership, info->set_num);
+}
+
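The `normal_deps` and `taskgraph_deps` structs above act as compile-time policies: `__kmp_process_deps<T>` calls `T::ref`, `T::deref`, and `T::link_successor`, so the taskgraph-recording instantiation drops refcounting entirely with no runtime branch. A minimal sketch of this static-policy pattern (illustrative names, not the actual runtime types):

```cpp
#include <cassert>
#include <cstdio>

// Each policy supplies a name for trace output and a ref() behavior.
struct refcounted {
  static constexpr char name[] = "refcounted";
  static int ref(int node) { return node + 1; } // stand-in for a refcount bump
};

struct plain {
  static constexpr char name[] = "plain";
  static int ref(int node) { return node; } // no-op, like taskgraph_deps::ref
};

// The templated worker calls through the policy; each instantiation inlines
// its policy's behavior and can interpolate the policy's name into traces.
template <typename T> int process(int node) {
  std::printf("process<%s>: node %d\n", T::name, node);
  return T::ref(node);
}
```

Instantiating `process<refcounted>` and `process<plain>` produces two specialized functions, mirroring how one templated dependence-processing routine serves both the normal and recording paths.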
+/// Search for aliased (same base address) dependencies in \c dep_list, and
+/// nullify duplicates. Return TRUE if we have an 'all' dependency, FALSE
+/// otherwise. Return number of mutex dependencies in *N_MTXS.
+static bool __kmp_filter_aliased_deps(kmp_int32 ndeps,
+ kmp_depend_info_t *dep_list,
+ kmp_task_t *task, int *n_mtxs) {
+ *n_mtxs = 0;
// Filter deps in dep_list
// TODO: Different algorithm for large dep_list ( > 10 ? )
- for (i = 0; i < ndeps; i++) {
+ for (int i = 0; i < ndeps; i++) {
if (dep_list[i].base_addr != 0 &&
dep_list[i].base_addr != (kmp_intptr_t)KMP_SIZE_T_MAX) {
KMP_DEBUG_ASSERT(
@@ -617,8 +846,8 @@ static bool __kmp_check_deps(kmp_int32 gtid, kmp_depnode_t *node,
}
if (dep_list[i].flag == KMP_DEP_MTX) {
// limit number of mtx deps to MAX_MTX_DEPS per node
- if (n_mtxs < MAX_MTX_DEPS && task != NULL) {
- ++n_mtxs;
+ if (*n_mtxs < MAX_MTX_DEPS && task != NULL) {
+ ++(*n_mtxs);
} else {
dep_list[i].flag = KMP_DEP_OUT; // downgrade mutexinoutset to inout
}
@@ -628,118 +857,2605 @@ static bool __kmp_check_deps(kmp_int32 gtid, kmp_depnode_t *node,
// omp_all_memory dependence can be marked by compiler by either
// (addr=0 && flag=0x80) (flag KMP_DEP_ALL), or (addr=-1).
// omp_all_memory overrides all other dependences if any
- dep_all = 1;
+ return true;
+ }
+ }
+ return false;
+}
+
+// Round up a size to a power of two specified by val: Used to insert padding
+// between structures co-allocated using a single malloc() call
+// FIXME: We copy+pasted this, put it somewhere else instead.
+static size_t __kmp_round_up_to_val(size_t size, size_t val) {
+ if (size & (val - 1)) {
+ size &= ~(val - 1);
+ if (size <= KMP_SIZE_T_MAX - val) {
+ size += val; // Round up if there is no overflow.
+ }
+ }
+ return size;
+} // __kmp_round_up_to_val
+
+// FIXME: C++-ify this.
+static kmp_taskgraph_region_t *
+__kmp_taskgraph_region_alloc(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_node_t *node,
+ kmp_taskgraph_region_t *parent) {
+ kmp_taskgraph_region_t *region =
+ (kmp_taskgraph_region_t *)__kmp_fast_allocate(thread,
+ sizeof(kmp_taskgraph_region_t));
+ region->owner = taskgraph;
+ region->type = node ? TASKGRAPH_REGION_NODE : TASKGRAPH_REGION_WAIT;
+ region->task.node = node;
+ region->task.next_instance = region;
+ region->mark = TASKGRAPH_UNMARKED;
+ region->level = -1;
+ region->timestamp = 0;
+ region->next = nullptr;
+ region->parent = parent;
+ region->predecessors = nullptr;
+ region->successors = nullptr;
+ region->mutexset = nullptr;
+ region->mutexset_parent = nullptr;
+ *alloc_chain = region;
+  alloc_chain = &region->alloc_chain;
+ return region;
+}
+
+// FIXME: This too.
+static kmp_taskgraph_region_t *
+__kmp_taskgraph_region_alloc(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ enum kmp_taskgraph_region_type type,
+ kmp_int32 num_nodes,
+ kmp_taskgraph_region_t *parent) {
+ kmp_size_t size =
+ sizeof(kmp_taskgraph_region_t) +
+ num_nodes * sizeof(kmp_taskgraph_region_t *);
+ size = __kmp_round_up_to_val(size, sizeof(kmp_taskgraph_region_t *));
+ kmp_taskgraph_region_t *region =
+ (kmp_taskgraph_region_t *)__kmp_fast_allocate(thread, size);
+ region->owner = taskgraph;
+ region->type = type;
+  region->inner.children = (kmp_taskgraph_region_t **)&region[1];
+ region->inner.num_children = num_nodes;
+ region->mark = TASKGRAPH_UNMARKED;
+ region->level = -1;
+ region->timestamp = 0;
+ region->next = nullptr;
+ region->parent = parent;
+ region->predecessors = nullptr;
+ region->successors = nullptr;
+ region->mutexset = nullptr;
+ region->mutexset_parent = nullptr;
+ region->reduce_input = nullptr;
+ *alloc_chain = region;
+  alloc_chain = &region->alloc_chain;
+ return region;
+}
+
+// Make a mostly-deep copy of a region: the region itself and its child
+// regions are newly allocated, but task-node pointers are shared.
+static kmp_taskgraph_region_t *
+__kmp_taskgraph_region_clone(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_region_t *from,
+ kmp_taskgraph_region_t *parent,
+ kmp_int32 indent = 0) {
+ kmp_taskgraph_region_t *clone = nullptr;
+ switch (from->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ clone = __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain,
+ nullptr, parent);
+ clone->type = from->type;
+ break;
+ case TASKGRAPH_REGION_NODE:
+ case TASKGRAPH_REGION_WAIT:
+ clone = __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain,
+ from->task.node, parent);
break;
+ default: {
+ clone = __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain,
+ from->type, from->inner.num_children,
+ parent);
+ for (kmp_int32 n = 0; n < from->inner.num_children; n++) {
+ clone->inner.children[n] =
+ __kmp_taskgraph_region_clone(thread, taskgraph, alloc_chain,
+ from->inner.children[n], clone,
+ indent + 2);
+ }
}
}
+ TGDBG("%*scloned region %p from region %p\n", indent, "", clone, from);
+ return clone;
+}
- // doesn't need to be atomic as no other thread is going to be accessing this
- // node just yet.
- // npredecessors is set -1 to ensure that none of the releasing tasks queues
- // this task before we have finished processing all the dependences
- node->dn.npredecessors = -1;
+static kmp_int32
+__kmp_taskgraph_topological_order(kmp_taskgraph_region_t *region,
+ kmp_taskgraph_region_t **order_out,
+ kmp_int32 *outidx) {
+ if (region->mark == TASKGRAPH_PERMANENT_MARK)
+ return region->level;
- // used to pack all npredecessors additions into a single atomic operation at
- // the end
- int npredecessors;
+ assert(region->mark != TASKGRAPH_TEMP_MARK);
- if (!dep_all) { // regular dependences
- npredecessors = __kmp_process_deps<true>(gtid, node, hash, dep_barrier,
- ndeps, dep_list, task);
- npredecessors += __kmp_process_deps<false>(
- gtid, node, hash, dep_barrier, ndeps_noalias, noalias_dep_list, task);
- } else { // omp_all_memory dependence
- npredecessors = __kmp_process_dep_all(gtid, node, *hash, dep_barrier, task);
+ region->mark = TASKGRAPH_TEMP_MARK;
+
+ kmp_int32 max_level = -1;
+ for (kmp_taskgraph_region_dep_t *s = region->predecessors;
+ s;
+ s = s->next) {
+ kmp_int32 pred_level =
+ __kmp_taskgraph_topological_order(s->region, order_out, outidx);
+ max_level = pred_level > max_level ? pred_level : max_level;
}
- node->dn.task = task;
- KMP_MB();
+ region->level = max_level + 1;
+ region->mark = TASKGRAPH_PERMANENT_MARK;
+ order_out[(*outidx)++] = region;
- // Account for our initial fake value
- npredecessors++;
+ return region->level;
+}
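For reference, the level-assignment logic in `__kmp_taskgraph_topological_order` can be sketched as a stand-alone model. This is illustrative only (names like `TopoState` are invented, not part of the patch): each region's level is one more than the maximum level of its predecessors, and regions are emitted in topological order with predecessors first.

```cpp
// Hedged sketch (not the runtime's code): DFS-based topological ordering
// with levels, mirroring the shape of __kmp_taskgraph_topological_order.
// Nodes are small integers; 'preds' maps a node to its predecessors.
#include <algorithm>
#include <cassert>
#include <map>
#include <vector>

enum Mark { UNMARKED, TEMP_MARK, PERM_MARK };

struct TopoState {
  std::map<int, std::vector<int>> preds;
  std::map<int, Mark> mark;   // default-initialized to UNMARKED
  std::map<int, int> level;
  std::vector<int> order;     // topological order, predecessors first

  int visit(int n) {
    if (mark[n] == PERM_MARK)
      return level[n];
    assert(mark[n] != TEMP_MARK && "cycle detected");
    mark[n] = TEMP_MARK;
    int max_level = -1;
    for (int p : preds[n])    // level = 1 + max level of any predecessor
      max_level = std::max(max_level, visit(p));
    level[n] = max_level + 1;
    mark[n] = PERM_MARK;
    order.push_back(n);
    return level[n];
  }
};
```

For a diamond 0→{1,2}→3, node 0 gets level 0, nodes 1 and 2 level 1, and node 3 level 2, with node 3 emitted last.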
- // Update predecessors and obtain current value to check if there are still
- // any outstanding dependences (some tasks may have finished while we
- // processed the dependences)
- npredecessors =
- node->dn.npredecessors.fetch_add(npredecessors) + npredecessors;
+static void
+__kmp_taskgraph_region_chain_clear_marks(kmp_taskgraph_region_t *region) {
+ for (; region; region = region->next)
+ region->mark = TASKGRAPH_UNMARKED;
+}
- KA_TRACE(20, ("__kmp_check_deps: T#%d found %d predecessors for task %p \n",
- gtid, npredecessors, taskdata));
+static void
+__kmp_taskgraph_region_chain_prune(kmp_taskgraph_region_t **region_p) {
+ kmp_taskgraph_region_t *pruned_region = nullptr, *region = *region_p;
+ kmp_taskgraph_region_t **pruned_region_p = &pruned_region;
+
+ TGDBG("pruning worklist...\n");
+
+ // NOTE: Pruning and deletion look the same here with respect to the handling
+ // of the worklist, but deleted nodes are freed from the taskgraph structure
+ // during cleanup, whereas combined nodes are retained.
+ for (; region; region = region->next) {
+ if (region->mark == TASKGRAPH_COMBINED || region->mark == TASKGRAPH_DELETED)
+ *pruned_region_p = region->next;
+ else {
+ *pruned_region_p = region;
+ pruned_region_p = &region->next;
+ }
+ }
- // beyond this point the task could be queued (and executed) by a releasing
- // task...
- return npredecessors > 0 ? true : false;
+ *pruned_region_p = nullptr;
+ *region_p = pruned_region;
}
-/*!
-@ingroup TASKING
-@param loc_ref location of the original task directive
-@param gtid Global Thread ID of encountering thread
-@param new_task task thunk allocated by __kmp_omp_task_alloc() for the ''new
-task''
-@param ndeps Number of depend items with possible aliasing
-@param dep_list List of depend items with possible aliasing
-@param ndeps_noalias Number of depend items with no aliasing
-@param noalias_dep_list List of depend items with no aliasing
+static kmp_int32 __kmp_region_deplist_len(kmp_taskgraph_region_dep_t *list) {
+ kmp_int32 len = 0;
+ for (; list; list = list->next)
+ ++len;
+ return len;
+}
-@return Returns either TASK_CURRENT_NOT_QUEUED if the current task was not
-suspended and queued, or TASK_CURRENT_QUEUED if it was suspended and queued
+static void
+__kmp_region_deplist_free(kmp_info_t *thread,
+ kmp_taskgraph_region_dep_t *list) {
+ while (list) {
+ kmp_taskgraph_region_dep_t *next = list->next;
+ __kmp_fast_free(thread, list);
+ list = next;
+ }
+}
-Schedule a non-thread-switchable task with dependences for execution
-*/
-kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,
- kmp_task_t *new_task, kmp_int32 ndeps,
- kmp_depend_info_t *dep_list,
- kmp_int32 ndeps_noalias,
- kmp_depend_info_t *noalias_dep_list) {
+static void __kmp_region_dep_recycle(kmp_taskgraph_region_dep_t **recycled,
+ kmp_taskgraph_region_dep_t *dep) {
+ dep->next = *recycled;
+ *recycled = dep;
+}
- kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
- KA_TRACE(10, ("__kmpc_omp_task_with_deps(enter): T#%d loc=%p task=%p\n", gtid,
- loc_ref, new_taskdata));
- __kmp_assert_valid_gtid(gtid);
- kmp_info_t *thread = __kmp_threads[gtid];
- kmp_taskdata_t *current_task = thread->th.th_current_task;
+static void __kmp_region_deplist_recycle(kmp_taskgraph_region_dep_t **recycled,
+ kmp_taskgraph_region_dep_t *list) {
+ while (list) {
+ kmp_taskgraph_region_dep_t *next = list->next;
+ __kmp_region_dep_recycle(recycled, list);
+ list = next;
+ }
+}
-#if OMP_TASKGRAPH_EXPERIMENTAL
- // record TDG with deps
- if (new_taskdata->is_taskgraph &&
- __kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
- kmp_tdg_info_t *tdg = new_taskdata->tdg;
- // extend record_map if needed
- __kmp_acquire_bootstrap_lock(&tdg->graph_lock);
- if (new_taskdata->td_tdg_task_id >= tdg->map_size) {
- kmp_uint old_size = tdg->map_size;
- kmp_uint new_size = old_size * 2;
- kmp_node_info_t *old_record = tdg->record_map;
- kmp_node_info_t *new_record =
- (kmp_node_info_t *)__kmp_allocate(new_size * sizeof(kmp_node_info_t));
- KMP_MEMCPY(new_record, tdg->record_map,
- old_size * sizeof(kmp_node_info_t));
- tdg->record_map = new_record;
-
- __kmp_free(old_record);
-
- for (kmp_int i = old_size; i < new_size; i++) {
- new_record[i].task = nullptr;
- new_record[i].parent_task = nullptr;
- new_record[i].successors = nullptr;
- new_record[i].nsuccessors = 0;
- new_record[i].npredecessors = 0;
- new_record[i].successors_size = 0;
- KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
- }
- // update the size at the end, so that we avoid other
- // threads use old_record while map_size is already updated
- tdg->map_size = new_size;
- }
- tdg->record_map[new_taskdata->td_tdg_task_id].task = new_task;
- tdg->record_map[new_taskdata->td_tdg_task_id].parent_task =
- new_taskdata->td_parent;
- KMP_ATOMIC_INC(&tdg->num_tasks);
- __kmp_release_bootstrap_lock(&tdg->graph_lock);
+static bool
+__kmp_taskgraph_collapse_sequence(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_region_t **region_p,
+ kmp_taskgraph_region_t *parent,
+ kmp_int32 &stamp) {
+ kmp_taskgraph_region_t *region = *region_p;
+ kmp_taskgraph_region_t *chain_start = region;
+ kmp_taskgraph_region_t *chain_end = region;
+ kmp_int32 chain_len = 1;
+
+ if (region->type == TASKGRAPH_REGION_ENTRY)
+ return false;
+
+ while (__kmp_region_deplist_len(chain_end->successors) == 1) {
+ kmp_taskgraph_region_t *past_end = chain_end->successors->region;
+ if (__kmp_region_deplist_len(past_end->predecessors) == 1) {
+ if (past_end->type == TASKGRAPH_REGION_EXIT)
+ break;
+ else {
+ chain_end = past_end;
+ ++chain_len;
+ }
+ } else
+ break;
}
-#endif
+
+ if (chain_len <= 1)
+ return false;
+
+ kmp_taskgraph_region_t *seq_region =
+ __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain,
+ TASKGRAPH_REGION_SEQUENTIAL, chain_len,
+ parent);
+ TGDBG("allocated new seq region: %p (length %d)\n", seq_region, chain_len);
+ kmp_taskgraph_region_t **worklist_p = region_p;
+ *worklist_p = seq_region;
+ seq_region->next = chain_start->next;
+ kmp_int32 level = -1;
+ for (kmp_int32 i = 0; i < chain_len; i++) {
+ seq_region->inner.children[i] = chain_start;
+ TGDBG("mark node %p as combined\n", chain_start);
+ chain_start->mark = TASKGRAPH_COMBINED;
+ chain_start->timestamp = stamp;
+ chain_start->parent = seq_region;
+ // The level of the sequence is the level of the first node.
+ if (level == -1)
+ level = chain_start->level;
+
+ if (i < chain_len - 1) {
+ chain_start = chain_start->successors->region;
+ }
+ }
+
+ seq_region->level = level;
+ seq_region->predecessors = seq_region->inner.children[0]->predecessors;
+ seq_region->successors =
+ seq_region->inner.children[chain_len - 1]->successors;
+ seq_region->inner.children[0]->predecessors = nullptr;
+ seq_region->inner.children[chain_len - 1]->successors = nullptr;
+
+ // Update predecessors to point to new seq region.
+ for (kmp_taskgraph_region_dep_t *pred = seq_region->predecessors; pred;
+ pred = pred->next) {
+ for (kmp_taskgraph_region_dep_t *succ = pred->region->successors; succ;
+ succ = succ->next) {
+ if (succ->region == seq_region->inner.children[0]) {
+ succ->region = seq_region;
+ }
+ }
+ }
+
+ // Update successors to point back to new seq region.
+ for (kmp_taskgraph_region_dep_t *succ = seq_region->successors; succ;
+ succ = succ->next) {
+ for (kmp_taskgraph_region_dep_t *pred = succ->region->predecessors; pred;
+ pred = pred->next) {
+ if (pred->region == seq_region->inner.children[chain_len - 1]) {
+ pred->region = seq_region;
+ }
+ }
+ }
+
+ return true;
+}
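The chain-detection step of `__kmp_taskgraph_collapse_sequence` above can be modelled in isolation. This sketch uses illustrative names and plain containers rather than the runtime's intrusive lists: walk forward while each region has exactly one successor and that successor has exactly one predecessor, never absorbing the exit node.

```cpp
// Hedged sketch (illustrative, not the runtime's data structures): find the
// maximal single-entry/single-exit chain starting at 'start', as
// __kmp_taskgraph_collapse_sequence does before folding the chain into one
// sequential region. 'succs'/'preds' give each node's successor/predecessor
// lists; 'exit_node' is never absorbed, matching the REGION_EXIT check.
#include <cassert>
#include <map>
#include <vector>

std::vector<int> find_chain(int start,
                            std::map<int, std::vector<int>> &succs,
                            std::map<int, std::vector<int>> &preds,
                            int exit_node) {
  std::vector<int> chain{start};
  int end = start;
  while (succs[end].size() == 1) {
    int next = succs[end][0];
    // Stop at the exit node, or if 'next' joins another path.
    if (next == exit_node || preds[next].size() != 1)
      break;
    chain.push_back(next);
    end = next;
  }
  return chain; // collapse only when chain.size() > 1
}
```

A straight line 0→1→2→exit yields the chain {0, 1, 2}, which the real code would replace with a single TASKGRAPH_REGION_SEQUENTIAL region.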
+
+static const char*
+__kmp_taskgraph_region_type_name(kmp_taskgraph_region_type type);
+
+static void
+__kmp_taskgraph_region_dfs(kmp_taskgraph_region_t *region,
+ kmp_taskgraph_region_t **order,
+ kmp_int32 &idx, bool use_preds) {
+ if (order) {
+ region->timestamp = --idx;
+ order[idx] = region;
+ }
+ region->mark = TASKGRAPH_TEMP_MARK;
+ for (kmp_taskgraph_region_dep_t *reg = use_preds ? region->predecessors
+ : region->successors; reg;
+ reg = reg->next) {
+ if (reg->region->mark == TASKGRAPH_UNMARKED)
+ __kmp_taskgraph_region_dfs(reg->region, order, idx, use_preds);
+ }
+}
+
+#if defined(DEBUG_TASKGRAPH) && defined(CHECK_WORKLIST)
+
+static void
+__kmp_taskgraph_region_gather_deps(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region,
+ kmp_taskgraph_region_dep_t **deplist,
+ bool &ok) {
+ for (kmp_taskgraph_region_dep_t *dep = *deplist; dep; dep = dep->next) {
+ if (dep->region == region)
+ return;
+ }
+
+ *deplist = __kmp_region_deplist_add(thread, &taskgraph->recycled_deps, region,
+ *deplist);
+
+ for (kmp_taskgraph_region_dep_t *pred = region->predecessors; pred;
+ pred = pred->next) {
+ if (pred->region->mark == TASKGRAPH_DELETED) {
+ fprintf(stderr, "*** Region %p's predecessor %p is a deleted node\n",
+ region, pred->region);
+ ok = false;
+ }
+ __kmp_taskgraph_region_gather_deps(thread, taskgraph, pred->region,
+ deplist, ok);
+ }
+
+ for (kmp_taskgraph_region_dep_t *succ = region->successors; succ;
+ succ = succ->next) {
+ if (succ->region->mark == TASKGRAPH_DELETED) {
+ fprintf(stderr, "*** Region %p's successor %p is a deleted node\n",
+ region, succ->region);
+ ok = false;
+ }
+ __kmp_taskgraph_region_gather_deps(thread, taskgraph, succ->region,
+ deplist, ok);
+ }
+}
+
+static bool
+__kmp_taskgraph_region_worklist_check(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region,
+ const char *where) {
+ kmp_taskgraph_region_dep_t *collected_nodes = nullptr;
+ bool ok = true;
+ __kmp_taskgraph_region_gather_deps(thread, taskgraph, region,
+ &collected_nodes, ok);
+
+ // Check all collected nodes are in the region's worklist.
+ for (kmp_taskgraph_region_dep_t *cn = collected_nodes; cn; cn = cn->next) {
+ bool in_list = false;
+ for (kmp_taskgraph_region_t *r = region; r; r = r->next) {
+ if (r == cn->region) {
+ in_list = true;
+ break;
+ }
+ }
+ if (!in_list) {
+ fprintf(stderr,
+ "*** Region %p is in dependency graph but not worklist (%s)\n",
+ cn->region, where);
+ ok = false;
+ }
+ }
+
+ for (kmp_taskgraph_region_t *r = region; r; r = r->next) {
+ bool in_list = false;
+ for (kmp_taskgraph_region_dep_t *cn = collected_nodes; cn; cn = cn->next) {
+ if (r == cn->region) {
+ in_list = true;
+ break;
+ }
+ }
+ if (!in_list) {
+ fprintf(stderr,
+ "*** Region %p is in worklist but not dependency graph (%s)\n",
+ r, where);
+ ok = false;
+ }
+ }
+
+ __kmp_region_deplist_recycle(&taskgraph->recycled_deps, collected_nodes);
+
+ return ok;
+}
+#else
+static bool
+__kmp_taskgraph_region_worklist_check(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region,
+ const char *where) {
+ return true;
+}
+#endif
+
+static kmp_taskgraph_region_t *
+__kmp_taskgraph_region_dom_intersect(kmp_taskgraph_region_t **order,
+ kmp_taskgraph_region_t **doms,
+ kmp_taskgraph_region_t *b1,
+ kmp_taskgraph_region_t *b2) {
+ kmp_int32 finger1 = b1->timestamp;
+ kmp_int32 finger2 = b2->timestamp;
+ while (finger1 != finger2) {
+ while (finger1 < finger2)
+ finger1 = doms[finger1]->timestamp;
+ while (finger2 < finger1)
+ finger2 = doms[finger2]->timestamp;
+ }
+ return order[finger1];
+}
+
+static void
+__kmp_taskgraph_region_doms(kmp_taskgraph_region_t **order,
+ kmp_taskgraph_region_t **doms,
+ kmp_int32 worklist_length, bool postdom) {
+ bool changed = true;
+ // Set doms[start_node] <- start_node
+ doms[worklist_length - 1] = order[worklist_length - 1];
+ order[worklist_length - 1]->mark = TASKGRAPH_PERMANENT_MARK;
+ while (changed) {
+ changed = false;
+ for (int n = 0; n < worklist_length - 1; n++) {
+ kmp_taskgraph_region_t *b = order[n];
+ kmp_taskgraph_region_t *new_idom = nullptr;
+ for (kmp_taskgraph_region_dep_t *pred = postdom ? b->successors
+ : b->predecessors; pred;
+ pred = pred->next) {
+ if (pred->region->mark == TASKGRAPH_PERMANENT_MARK) {
+ new_idom = pred->region;
+ break;
+ }
+ }
+ for (kmp_taskgraph_region_dep_t *pred = postdom ? b->successors
+ : b->predecessors; pred;
+ pred = pred->next) {
+ if (pred->region == new_idom)
+ continue;
+ if (doms[pred->region->timestamp]) {
+ new_idom =
+ __kmp_taskgraph_region_dom_intersect(order, doms, pred->region,
+ new_idom);
+ }
+ }
+ if (doms[b->timestamp] != new_idom) {
+ doms[b->timestamp] = new_idom;
+ order[b->timestamp]->mark = TASKGRAPH_PERMANENT_MARK;
+ changed = true;
+ }
+ }
+ }
+}
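The two routines above implement the iterative immediate-dominator computation in the style of Cooper, Harvey and Kennedy: the "intersect" step walks two fingers up the current dominator tree until they meet. A compact stand-alone rendition over plain arrays (illustrative only; the entry node carries the highest index, matching the DFS timestamps in the patch):

```cpp
// Hedged sketch of the dominator computation used by
// __kmp_taskgraph_region_doms / __kmp_taskgraph_region_dom_intersect.
// idom[i] is the index of node i's immediate dominator; dominators always
// have higher indices here, so the fingers advance while strictly smaller.
#include <cassert>
#include <vector>

std::vector<int>
immediate_dominators(int n, const std::vector<std::vector<int>> &preds) {
  std::vector<int> idom(n, -1);
  idom[n - 1] = n - 1; // the entry node dominates itself
  bool changed = true;
  while (changed) {
    changed = false;
    for (int b = n - 2; b >= 0; b--) { // all nodes except the entry
      int new_idom = -1;
      for (int p : preds[b]) {
        if (idom[p] == -1)
          continue; // predecessor not processed yet
        if (new_idom == -1) {
          new_idom = p;
          continue;
        }
        // 'intersect': walk two fingers up the dominator tree until they
        // meet at the common dominator.
        int f1 = p, f2 = new_idom;
        while (f1 != f2) {
          while (f1 < f2) f1 = idom[f1];
          while (f2 < f1) f2 = idom[f2];
        }
        new_idom = f1;
      }
      if (new_idom != -1 && idom[b] != new_idom) {
        idom[b] = new_idom;
        changed = true;
      }
    }
  }
  return idom;
}
```

For a diamond with entry 3 and preds {0:{1,2}, 1:{3}, 2:{3}}, every node's immediate dominator is the entry node 3 (the join node 0 is dominated by neither branch).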
+
+static bool
+__kmp_taskgraph_region_mutex_p(kmp_taskgraph_region_t *reg) {
+ if (reg->type == TASKGRAPH_REGION_NODE)
+ return reg->mutexset != nullptr;
+ return false;
+}
+
+// This function collapses graph regions with forms like this:
+//
+// 1. A(pp) 2. A 3. A(pp)
+// / \ / \ / \
+// B C B(pp) E(pp) B(pp) E
+// \ / / \ / \ / \ /
+// D(*) C D F G C D /
+// \ \ / / \ | /
+// `---H(*)--' F(*)
+//
+// We look for a node with more than one predecessor (*), where each of those
+// predecessors has a single successor and a single predecessor (pp). We group
+// nodes by which pp (predecessor-predecessor) they have: for (1), nodes B & C
+// share a pp; for (2), C & D share a pp, and F & G share a pp; for (3), C & D
+// share a pp, and E has a separate pp.
+//
+// We choose the pp with the highest level ("furthest down the graph"), and
+// collapse the subgraph into a parallel region.
+
+static bool
+__kmp_taskgraph_collapse_par_exclusive(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_region_t **region_p,
+ kmp_taskgraph_region_t *parent,
+ kmp_int32 &stamp) {
+ kmp_taskgraph_region_t *region = *region_p;
+ kmp_int32 num_predecessors = __kmp_region_deplist_len(region->predecessors);
+
+ TGDBG("predecessors %d, successors %d\n",
+ __kmp_region_deplist_len(region->predecessors),
+ __kmp_region_deplist_len(region->successors));
+
+ if (num_predecessors <= 1)
+ return false;
+
+ TGDBG("found multiple predecessors, creating parallel/unordered region\n");
+ kmp_taskgraph_region_dep_t *pred_preds = nullptr;
+ kmp_int32 highest_level = -1;
+
+ for (kmp_taskgraph_region_dep_t *pred = region->predecessors; pred;
+ pred = pred->next) {
+ TGDBG("consider predecessor: %p\n", pred->region);
+ TGDBG("-- successors %d, predecessors %d\n",
+ __kmp_region_deplist_len(pred->region->successors),
+ __kmp_region_deplist_len(pred->region->predecessors));
+ if (highest_level == -1 || pred->region->level > highest_level)
+ highest_level = pred->region->level;
+ kmp_taskgraph_region_t *pred_region = pred->region;
+ if (__kmp_region_deplist_len(pred_region->successors) != 1)
+ continue;
+ if (__kmp_region_deplist_len(pred_region->predecessors) != 1)
+ continue;
+ bool in_list = false;
+ TGDBG("pp region: %p (%s)\n", pred_region->predecessors->region,
+ __kmp_taskgraph_region_type_name(pred_region->predecessors->region->type));
+ kmp_taskgraph_region_t *pp_region = pred_region->predecessors->region;
+ for (kmp_taskgraph_region_dep_t *pp = pred_preds; pp; pp = pp->next) {
+ if (pp->region == pp_region) {
+ in_list = true;
+ break;
+ }
+ }
+ if (!in_list) {
+ pred_preds = __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ pp_region, pred_preds);
+ TGDBG("add %p to list: len(pred_preds)=%d\n", pp_region,
+ __kmp_region_deplist_len(pred_preds));
+ }
+ }
+
+ kmp_int32 num_pps = __kmp_region_deplist_len(pred_preds);
+ if (num_pps == 0) {
+ TGDBG("no collapsible regions, bailing out\n");
+ return false;
+ }
+ TGDBG("found %d predecessor-predecessors\n", num_pps);
+ TGDBG("highest pred level: %d\n", highest_level);
+
+ kmp_int32 pp_idx = 0;
+
+ bool changed = false;
+
+ for (kmp_taskgraph_region_dep_t *pp = pred_preds; pp; pp = pp->next) {
+ kmp_taskgraph_region_dep_t *par_succs = nullptr;
+ kmp_taskgraph_region_dep_t *par_preds = nullptr;
+ kmp_int32 preds_for_pp = 0;
+ bool any_mutex_p = false;
+ for (kmp_taskgraph_region_dep_t *pred = region->predecessors; pred;
+ pred = pred->next) {
+ kmp_taskgraph_region_t *pred_region = pred->region;
+ if (!pred_region->predecessors)
+ continue;
+ if (pred_region->level < highest_level)
+ continue;
+ if (__kmp_region_deplist_len(pred_region->predecessors) != 1 ||
+ __kmp_region_deplist_len(pred_region->successors) != 1)
+ continue;
+ TGDBG("counting pred region: %p (%s)\n", pred_region,
+ __kmp_taskgraph_region_type_name(pred_region->type));
+ if (pred_region->predecessors->region == pp->region) {
+ ++preds_for_pp;
+ if (__kmp_taskgraph_region_mutex_p(pred_region))
+ any_mutex_p = true;
+ }
+ }
+ TGDBG("found %d preds for pp region %p\n", preds_for_pp, pp->region);
+ if (preds_for_pp < 2)
+ continue;
+ kmp_taskgraph_region_type region_type =
+ any_mutex_p ? TASKGRAPH_REGION_EXCLUSIVE : TASKGRAPH_REGION_PARALLEL;
+ kmp_taskgraph_region_t *par_region =
+ __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain, region_type,
+ preds_for_pp, parent);
+ changed = true;
+ TGDBG("allocated %s region: %p\n",
+ region_type == TASKGRAPH_REGION_EXCLUSIVE ? "exclusive" : "parallel",
+ par_region);
+ kmp_taskgraph_region_dep_t *pred = region->predecessors;
+ kmp_int32 level = -1;
+ bool found_reduction_data = false;
+ for (kmp_int32 i = 0; pred; pred = pred->next) {
+ kmp_taskgraph_region_t *pred_region = pred->region;
+ TGDBG("considering pred region: %p\n", pred_region);
+ if (!pred_region->predecessors) {
+ TGDBG("bailing (no predecessors)\n");
+ continue;
+ }
+ if (pred_region->predecessors->region != pp->region) {
+ TGDBG("bailing (wrong pp region)\n");
+ continue;
+ }
+ if (__kmp_region_deplist_len(pred_region->predecessors) != 1 ||
+ __kmp_region_deplist_len(pred_region->successors) != 1) {
+ TGDBG("bailing (non-unit pred/succ list length)\n");
+ continue;
+ }
+ TGDBG("process region %p (%d/%d), level %d\n", pred->region, i+1,
+ preds_for_pp, pred_region->level);
+ par_region->inner.children[i] = pred_region;
+ pred_region->mark = TASKGRAPH_COMBINED;
+ pred_region->timestamp = stamp;
+ pred_region->parent = par_region;
+
+ // Reduction handling. The reduction input data is now attached to one
+ // of the tasks participating in the reduction. Move it to the enclosing
+ // parallel region instead.
+ if (pred_region->type == TASKGRAPH_REGION_NODE &&
+ pred_region->task.node->reduce_input) {
+ // We should only be doing this once per par region.
+ assert(!par_region->reduce_input);
+ par_region->reduce_input = pred_region->task.node->reduce_input;
+ pred_region->task.node->reduce_input = nullptr;
+ found_reduction_data = true;
+ }
+
+ // We expect all the predecessor regions to be at the same level.
+ if (level == -1)
+ level = pred_region->level;
+ else
+ assert(level == pred_region->level);
+ if (!par_succs) {
+ // Copy one list of predecessors/successors for the predecessor region.
+ // We know these are of length one by checks above. We'll re-use them
+ // for the created parallel region.
+ par_preds = pred_region->predecessors;
+ par_succs = pred_region->successors;
+ pred_region->predecessors = nullptr;
+ pred_region->successors = nullptr;
+ }
+ i++;
+ }
+ par_region->level = level;
+ par_region->predecessors = par_preds;
+ par_region->successors = par_succs;
+
+ if (region->type == TASKGRAPH_REGION_WAIT &&
+ !found_reduction_data) {
+ // If we have no reduction data, we will not create a taskgroup for this
+ // parallel region at replay time, so we don't need to terminate/discard
+ // that region when we're done. Clear the taskloop_task flag.
+ region->task.node->taskloop_task = false;
+ }
+
+ // Add the new parallel region to the worklist. (FIXME: We're reprocessing
+ // the 'region' node here -- we don't need to do that if it's fully
+ // consumed.)
+ par_region->next = region->next;
+ region->next = par_region;
+ }
+
+#ifdef DEBUG_TASKGRAPH
+ TGDBG("before pred fixup:\n");
+ for (kmp_taskgraph_region_dep_t *pred = region->predecessors; pred;
+ pred = pred->next) {
+ TGDBG("region %p, pred region: %p\n", region, pred->region);
+ }
+#endif
+
+ // Now, fix up predecessor list for 'region', and successor lists for each
+ // predecessor-predecessor.
+ kmp_taskgraph_region_dep_t **dep_p = &region->predecessors;
+ while (*dep_p) {
+ kmp_taskgraph_region_dep_t *dep = *dep_p;
+ if (dep->region->mark == TASKGRAPH_COMBINED) {
+ if (!dep->region->successors) {
+ dep->region = dep->region->parent;
+ dep_p = &dep->next;
+ } else {
+ kmp_taskgraph_region_dep_t *next = dep->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, dep);
+ *dep_p = next;
+ }
+ } else {
+ dep_p = &dep->next;
+ }
+ }
+
+#ifdef DEBUG_TASKGRAPH
+ TGDBG("after pred fixup:\n");
+ for (kmp_taskgraph_region_dep_t *pred = region->predecessors; pred;
+ pred = pred->next) {
+ TGDBG("region %p, pred region: %p\n", region, pred->region);
+ }
+#endif
+
+ for (kmp_taskgraph_region_dep_t *pp = pred_preds; pp; pp = pp->next) {
+ kmp_taskgraph_region_t *pp_region = pp->region;
+ dep_p = &pp_region->successors;
+ while (*dep_p) {
+ kmp_taskgraph_region_dep_t *dep = *dep_p;
+ if (dep->region->mark == TASKGRAPH_COMBINED) {
+ if (!dep->region->predecessors) {
+ dep->region = dep->region->parent;
+ dep_p = &dep->next;
+ } else {
+ kmp_taskgraph_region_dep_t *next = dep->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, dep);
+ *dep_p = next;
+ }
+ } else {
+ dep_p = &dep->next;
+ }
+ }
+ }
+
+ return changed;
+}
+
+static void
+__kmp_taskgraph_region_dot(kmp_taskgraph_region_t *region, const char *name) {
+ fprintf(stderr, "digraph %s {\n", name);
+ for (kmp_taskgraph_region_t *r = region; r; r = r->next) {
+ if (r->mark == TASKGRAPH_DELETED) {
+ fprintf(stderr, "\"%p\" [shape=box, label=\"%p(%s) (deleted)\"]\n", r, r,
+ __kmp_taskgraph_region_type_name(r->type));
+ } else if (r->level == -1) {
+ fprintf(stderr, "\"%p\" [shape=box, label=\"%p(%s) (new)\"]\n", r, r,
+ __kmp_taskgraph_region_type_name(r->type));
+ } else {
+ fprintf(stderr, "\"%p\" [shape=box, label=\"%p(%s)\"]\n", r, r,
+ __kmp_taskgraph_region_type_name(r->type));
+ }
+ for (kmp_taskgraph_region_dep_t *succ = r->successors; succ;
+ succ = succ->next) {
+ fprintf(stderr, " \"%p\" -> \"%p\" [color=green]\n", r, succ->region);
+ }
+ for (kmp_taskgraph_region_dep_t *pred = r->predecessors; pred;
+ pred = pred->next) {
+ fprintf(stderr, " \"%p\" -> \"%p\" [color=red, constraint=false]\n", r,
+ pred->region);
+ }
+ }
+ fprintf(stderr, "}\n");
+}
+
+static kmp_int32
+__kmp_taskgraph_count_edges_to_dominator(kmp_taskgraph_region_t *reg,
+ kmp_taskgraph_region_t *dom) {
+ kmp_int32 count = __kmp_region_deplist_len(reg->successors) - 1;
+
+ for (kmp_taskgraph_region_dep_t *pred = reg->predecessors; pred;
+ pred = pred->next) {
+ if (pred->region == dom)
+ count++;
+ else
+ count += __kmp_taskgraph_count_edges_to_dominator(pred->region, dom) + 1;
+ }
+ count--;
+
+ return count;
+}
+
+/// Extract/clone a subgraph of the dependency graph, and rewrite predecessor
+/// and successor edges to point to the new cloned part.
+//
+// The function conceptually starts at the bottom (a list of predecessors
+// with some particular dominator) and works up towards the entry point,
+// stopping when it hits the aforementioned dominator.
+//
+// Say we have an irreducible graph like this (each letter represents a region,
+// which could be a single task node or an already-processed nested region):
+//
+// <S> (S->A, S->B)
+// _/ \_
+// / \
+// A B (A->C, A->D, B->F, B->G)
+// / \ / \
+// C D F G
+// |\ \/ /|
+// | \ /\ / | (C->H, C->I, D->I, F->H, G->H, G->J)
+// | \ / __|__/ |
+// \ /\_/_ / |
+// H__/ I J
+// \__ | ___/ (H->E, I->E, J->E)
+// \ | /
+// <E>
+//
+// We pick the exit node E which has more than one predecessor: H, I and J.
+// In this case, H is immediately dominated by the start node, S.
+// The 'preds_with_dom' list initially contains the node H.
+// We clone the region H then call ourselves with its cloned predecessors,
+// until we hit the dominator 'region_dom'. After rewriting the original
+// subgraph's (entering) predecessors and (leaving) successors, we obtain a
+// graph like this:
+//
+// __ <S>__ (S->A', S->B', S->A, S->B)
+// _/ / \ \___
+// / / \ \
+// A' B' A B (A'->C', B'->F', B'->G', A->C, A->D, B->F, B->G)
+// / / \ / \ / \
+// C' F' G' C D F* G (C'->H', F'->H', G'->H', C->I, D->I, G->J)
+// \_ | _/ \ / /
+// H' I J (H'->E, I->E, J->E)
+// \ | /
+// \___ / __/
+// <E>
+//
+// The new cloned subgraph formed from nodes H', C', F', G', A', B' replaces
+// the original predecessor of E, H. Some nodes are now unreachable (F, marked
+// with *), and can be deleted. The start node S now has successors A, B, and
+// the new clones A' and B'.
+//
+// In this way, irreducible graphs are turned into reducible graphs. A
+// critical point is what it means to clone a task node in this way: that is
+// discussed in the commentary of __kmp_taskgraph_rewrite_irreducible.
+
+static void
+__kmp_taskgraph_clone_subgraph(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_region_t *cloned_nodes[],
+ kmp_taskgraph_region_t *orig_region,
+ kmp_taskgraph_region_t *doms[],
+ kmp_taskgraph_region_dep_t *preds_with_dom,
+ kmp_taskgraph_region_t *region_dom,
+ kmp_taskgraph_region_t ***added_worklist) {
+ for (kmp_taskgraph_region_dep_t *pred = preds_with_dom; pred;
+ pred = pred->next) {
+ kmp_taskgraph_region_t *pred_region = pred->region;
+ if (pred_region == region_dom) {
+ // NOTE: Adding the new subgraph entry point as a new successor for the
+ // dominating block is done in the successor-adding post-pass.
+ pred->region = region_dom;
+ } else {
+ // If we've already processed this predecessor, move on.
+ if (cloned_nodes[pred_region->timestamp]) {
+ pred->region = cloned_nodes[pred_region->timestamp];
+ continue;
+ }
+ kmp_taskgraph_region_t *cloned_region =
+ __kmp_taskgraph_region_clone(thread, taskgraph, alloc_chain,
+ pred_region, nullptr);
+ cloned_nodes[pred_region->timestamp] = cloned_region;
+
+ **added_worklist = cloned_region;
+ *added_worklist = &cloned_region->next;
+
+ pred->region = cloned_region;
+ // Now make a copy of the predecessor list and call ourselves recursively.
+ kmp_taskgraph_region_dep_t *cloned_preds = nullptr;
+ for (kmp_taskgraph_region_dep_t *p = pred_region->predecessors; p;
+ p = p->next) {
+ cloned_preds =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ p->region, cloned_preds);
+ }
+ cloned_region->predecessors = cloned_preds;
+ // Note pred_region is the original predecessor region here, not the
+ // newly-cloned one.
+ __kmp_taskgraph_clone_subgraph(thread, taskgraph, alloc_chain,
+ cloned_nodes, pred_region, doms,
+ cloned_preds, region_dom, added_worklist);
+ }
+ }
+}
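The upward cloning step of `__kmp_taskgraph_clone_subgraph` can be modelled compactly. This sketch uses strings and maps in place of the runtime's structures (all names are illustrative): starting from a node's predecessor list, each predecessor is cloned exactly once, the walk stops at the dominating node (which stays shared), and already-cloned nodes are reused so diamonds in the original remain diamonds in the clone.

```cpp
// Hedged sketch (illustrative names and data layout): clone the subgraph
// reached by walking predecessor edges upward until 'dom' is hit, as
// __kmp_taskgraph_clone_subgraph does. 'clones' memoizes original -> clone.
#include <cassert>
#include <map>
#include <string>
#include <vector>

using Preds = std::map<std::string, std::vector<std::string>>;

std::string clone_node(const std::string &node, Preds &preds,
                       std::map<std::string, std::string> &clones,
                       const std::string &dom);

std::vector<std::string>
clone_pred_list(const std::vector<std::string> &plist, Preds &preds,
                std::map<std::string, std::string> &clones,
                const std::string &dom) {
  std::vector<std::string> out;
  for (const std::string &p : plist)
    out.push_back(p == dom ? dom // the dominator itself stays shared
                           : clone_node(p, preds, clones, dom));
  return out;
}

std::string clone_node(const std::string &node, Preds &preds,
                       std::map<std::string, std::string> &clones,
                       const std::string &dom) {
  auto it = clones.find(node);
  if (it != clones.end())
    return it->second; // already cloned: reuse the copy
  std::string c = node + "'";
  clones[node] = c;
  preds[c] = clone_pred_list(preds[node], preds, clones, dom);
  return c;
}
```

Applied to the H/C/F/G/A/B/S shape from the commentary above (dom = S), the clone of H gets predecessors C', F', G', and F' and G' share the single clone B', just as in the second diagram.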
+
+/// This function uses several strategies to turn an irreducible taskgraph
+/// into a reducible taskgraph.
+//
+// 1. If a node C depends on node B and also node A which dominates C,
+// and if B is also dominated by C, then the dependency of C on A can be
+// dropped. That is, we know B must execute after A, so we can say
+// execution must proceed A->B->C, and we don't also need to specify the
+// transitive A->C dependency directly.
+//
+// A A
+// / \ |
+// B ) -> B
+// \ / |
+// C C
+//
+// 2. Two nodes with the same set of predecessors and successors are turned
+// into a parallel region. This graph form can arise from use of
+// "inoutset" dependencies.
+//
+// A B C A B C
+// / \/ \/ \ | | |
+// /__/\ /\__\ | | |
+// D____X____E (A+B+C->D & A+B+C->E) -> par(D,E)
+// \ / |
+// '--F--' F
+//
+// 3. We find a node R with more than one predecessor, and group those
+// predecessors by their immediate dominators. There are two subcases.
+//
+// 3a. If there is more than one group of predecessors (more than one
+// dominator), we pick the dominator with the highest topological-sort
+// level, and we clone the subgraph from that dominator to R.
+//
+// 3b. If all predecessors share a single dominator, we instead pick the
+// predecessor with the highest incoming/outgoing edge count, and we clone
+// the subgraph from that predecessor to the dominator.
+//
+// For details of how the subgraph cloning works, see the commentary for
+// __kmp_taskgraph_clone_subgraph.
+//
+// In this way, irreducible edges are gradually "teased apart", and the graph
+// thus becomes reducible.
+//
+// Cloning the subgraph means that task nodes can appear more than once in the
+// taskgraph (multiple "instantiations"). The way this should be handled is
+// left to later stages of execution, allowing for runtime or API-specific
+// techniques to be used.
+//
+// Say the resulting graph clones a node N into N1 and N2. Now:
+//
+// - All of N1's predecessors and all of N2's predecessors must execute before
+// either N1 or N2 execute.
+// - Only N1 or N2 should execute, not both.
+// - All of N1's, and all of N2's, successors should execute after either N1
+// or N2 executes.
+//
+// For host execution, this is handled by __kmp_exec_descr_link_instances, etc.
+
+static bool
+__kmp_taskgraph_rewrite_irreducible(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **alloc_chain,
+ kmp_taskgraph_region_t **region_p,
+ kmp_taskgraph_region_t *exitregion) {
+ kmp_taskgraph_region_t *entryregion = *region_p;
+ bool changed = false;
+
+ kmp_int32 worklist_length = 0;
+ for (kmp_taskgraph_region_t *r = entryregion; r; r = r->next) {
+ // Deleted regions stay deleted. (We could actually remove these from
+ // the worklist here, I think.)
+ if (r->mark == TASKGRAPH_DELETED)
+ continue;
+ r->mark = TASKGRAPH_UNMARKED;
+ worklist_length++;
+ }
+
+#ifdef DEBUG_TASKGRAPH
+ TGDBG("worklist length: %d\n", worklist_length);
+
+ __kmp_taskgraph_region_dot(entryregion, "PredsAndSuccs");
+#endif
+
+ kmp_taskgraph_region_t **order =
+ (kmp_taskgraph_region_t **)__kmp_fast_allocate(thread,
+ worklist_length * sizeof(kmp_taskgraph_region_t *));
+ kmp_taskgraph_region_t **doms =
+ (kmp_taskgraph_region_t **)__kmp_fast_allocate(thread,
+ worklist_length * sizeof(kmp_taskgraph_region_t *));
+ memset(doms, 0, worklist_length * sizeof(kmp_taskgraph_region_t *));
+ kmp_int32 cursor = worklist_length;
+ assert(entryregion->type == TASKGRAPH_REGION_ENTRY);
+ __kmp_taskgraph_region_dfs(entryregion, order, cursor, false);
+ assert(cursor == 0);
+ __kmp_taskgraph_region_doms(order, doms, worklist_length, false);
+
+#ifdef DEBUG_TASKGRAPH
+ fprintf(stderr, "digraph {\n");
+ for (kmp_int32 i = 0; i < worklist_length; i++) {
+ kmp_taskgraph_region_t *b = order[i];
+ for (kmp_taskgraph_region_dep_t *succ = b->successors; succ;
+ succ = succ->next) {
+ fprintf(stderr, " \"%d\" -> \"%d\"\n", b->timestamp,
+ succ->region->timestamp);
+ }
+ fprintf(stderr, " \"%d\" -> \"%d\" [color=green, constraint=false]\n",
+ b->timestamp, doms[b->timestamp]->timestamp);
+ }
+ fprintf(stderr, "}\n");
+#endif
+
+ // Irreducible regions are handled by duplicating regions, and those new
+ // regions need adding to the worklist. The added_worklist variable stores
+ // the head of the new work to be added.
+ kmp_taskgraph_region_t *added_worklist = nullptr;
+ kmp_taskgraph_region_t **added_worklist_p = &added_worklist;
+
+ bool dropped_preds_p = false;
+
+ for (kmp_int32 i = 0; i < worklist_length; i++) {
+ kmp_taskgraph_region_t *region = order[i];
+ if (__kmp_region_deplist_len(region->predecessors) < 2)
+ continue;
+ TGDBG("checking region %p for redundant predecessors\n", region);
+ kmp_taskgraph_region_dep_t **predp = &region->predecessors;
+ while (*predp) {
+ kmp_taskgraph_region_dep_t *pred = *predp;
+
+ bool passes_pred = false;
+ for (kmp_taskgraph_region_dep_t *rest = region->predecessors; rest;
+ rest = rest->next) {
+ if (rest->region == pred->region)
+ continue;
+ kmp_taskgraph_region_t *dom = doms[rest->region->timestamp];
+ TGDBG("pred region: %p, next: %p\n", pred->region, rest->region);
+ while (true) {
+ TGDBG("check against dom: %p\n", dom);
+ if (dom == pred->region) {
+ passes_pred = true;
+ break;
+ } else if (dom == doms[dom->timestamp]) {
+ break;
+ } else {
+ dom = doms[dom->timestamp];
+ }
+ }
+ if (passes_pred)
+ break;
+ }
+
+ if (passes_pred) {
+ // We can drop this predecessor.
+ TGDBG("dropping pred %p from region %p, dom %p\n",
+ pred->region, region, doms[pred->region->timestamp]);
+ kmp_taskgraph_region_dep_t *next = pred->next;
+ kmp_taskgraph_region_dep_t **succp = &pred->region->successors;
+ while (*succp) {
+ kmp_taskgraph_region_dep_t *succ = *succp;
+ if (succ->region == region) {
+ kmp_taskgraph_region_dep_t *nexts = succ->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, succ);
+ *succp = nexts;
+ } else {
+ succp = &succ->next;
+ }
+ }
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, pred);
+ *predp = next;
+ dropped_preds_p = true;
+ } else {
+ predp = &pred->next;
+ }
+ }
+ }
+
+ if (dropped_preds_p) {
+ __kmp_fast_free(thread, order);
+ __kmp_fast_free(thread, doms);
+ return true;
+ }
+
+ kmp_bitset_t **pred_bitsets = nullptr;
+ kmp_bitset_t **succ_bitsets = nullptr;
+
+ bool regions_combined_p = false;
+
+ for (kmp_int32 i = 0; i < worklist_length; i++) {
+ kmp_taskgraph_region_t *region = order[i];
+ struct {
+ kmp_taskgraph_region_t *dom;
+ kmp_int32 count;
+ } dom_groups[worklist_length];
+ kmp_int32 num_groups = 0;
+ kmp_int32 npreds = __kmp_region_deplist_len(region->predecessors);
+ if (npreds >= 2) {
+ kmp_taskgraph_region_dep_t *pred;
+ for (pred = region->predecessors; pred; pred = pred->next) {
+ kmp_taskgraph_region_t *pred_region = pred->region;
+ kmp_taskgraph_region_t *this_dom = doms[pred_region->timestamp];
+#ifdef DEBUG_TASKGRAPH
+ kmp_int32 edges_to_dom =
+ __kmp_taskgraph_count_edges_to_dominator(pred_region, this_dom);
+ TGDBG("this pred: %p, edges_to_dom=%d\n", pred_region, edges_to_dom);
+#endif
+ bool found = false;
+ for (kmp_int32 grp = 0; grp < num_groups; grp++) {
+ if (dom_groups[grp].dom == this_dom) {
+ dom_groups[grp].count++;
+ found = true;
+ break;
+ }
+ }
+ if (!found) {
+ dom_groups[num_groups].dom = this_dom;
+ dom_groups[num_groups].count = 1;
+ num_groups++;
+ }
+ }
+
+ if (num_groups == 1 && region->mark != TASKGRAPH_COMBINED) {
+ TGDBG("region %p: all predecessors have a single dominator\n", region);
+
+ if (!pred_bitsets) {
+ pred_bitsets = (kmp_bitset_t **) __kmp_fast_allocate(thread,
+ sizeof(kmp_bitset_t *) * worklist_length);
+ succ_bitsets = (kmp_bitset_t **) __kmp_fast_allocate(thread,
+ sizeof(kmp_bitset_t *) * worklist_length);
+
+ for (kmp_int32 k = 0; k < worklist_length; k++) {
+ pred_bitsets[k] = __kmp_bitset_alloc(thread, worklist_length);
+ succ_bitsets[k] = __kmp_bitset_alloc(thread, worklist_length);
+ }
+
+ for (kmp_int32 j = 0; j < worklist_length; j++) {
+ kmp_taskgraph_region_t *reg = order[j];
+
+ for (pred = reg->predecessors; pred; pred = pred->next) {
+ __kmp_bitset_set(pred_bitsets[reg->timestamp],
+ pred->region->timestamp);
+ }
+
+ for (kmp_taskgraph_region_dep_t *succ = reg->successors; succ;
+ succ = succ->next) {
+ __kmp_bitset_set(succ_bitsets[reg->timestamp],
+ succ->region->timestamp);
+ }
+ }
+ }
+
+ kmp_taskgraph_region_dep_t *equal_deps_chain = nullptr;
+
+ kmp_int32 same_preds_and_succs = 1;
+ bool any_mutex_p = __kmp_taskgraph_region_mutex_p(region);
+ // FIXME: We might be able to do a bit better than this by hashing.
+ for (kmp_int32 j = i + 1; j < worklist_length; j++) {
+ if (order[j]->mark != TASKGRAPH_COMBINED &&
+ __kmp_bitset_equal(pred_bitsets[j], pred_bitsets[i]) &&
+ __kmp_bitset_equal(succ_bitsets[j], succ_bitsets[i])) {
+ TGDBG("regions %p and %p share all predecessors/successors\n",
+ order[i], order[j]);
+ same_preds_and_succs++;
+ equal_deps_chain =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ order[j], equal_deps_chain);
+ if (__kmp_taskgraph_region_mutex_p(order[j]))
+ any_mutex_p = true;
+ }
+ }
+ if (same_preds_and_succs > 1) {
+ kmp_taskgraph_region_type region_type =
+ any_mutex_p ? TASKGRAPH_REGION_EXCLUSIVE
+ : TASKGRAPH_REGION_PARALLEL;
+ kmp_taskgraph_region_t *par_region =
+ __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain,
+ region_type, same_preds_and_succs,
+ nullptr);
+ par_region->inner.children[0] = region;
+ region->mark = TASKGRAPH_COMBINED;
+ region->parent = par_region;
+ for (kmp_int32 j = 1; j < same_preds_and_succs; j++) {
+ kmp_taskgraph_region_dep_t *next = equal_deps_chain->next;
+ par_region->inner.children[j] = equal_deps_chain->region;
+ equal_deps_chain->region->mark = TASKGRAPH_COMBINED;
+ equal_deps_chain->region->parent = par_region;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps,
+ equal_deps_chain);
+ equal_deps_chain = next;
+ }
+ par_region->predecessors =
+ par_region->inner.children[0]->predecessors;
+ par_region->inner.children[0]->predecessors = nullptr;
+ par_region->successors = par_region->inner.children[0]->successors;
+ par_region->inner.children[0]->successors = nullptr;
+
+ // Redirect incoming deps to point to new parallel region.
+ for (pred = par_region->predecessors; pred; pred = pred->next) {
+ kmp_taskgraph_region_t *pred_region = pred->region;
+ kmp_taskgraph_region_dep_t **succp = &pred_region->successors;
+ while (*succp) {
+ kmp_taskgraph_region_dep_t *succ = *succp;
+ if (succ->region == par_region->inner.children[0]) {
+ succ->region = par_region;
+ succp = &succ->next;
+ } else {
+ bool found = false;
+ for (kmp_int32 j = 1; j < same_preds_and_succs; j++) {
+ if (succ->region == par_region->inner.children[j]) {
+ found = true;
+ break;
+ }
+ }
+ if (found) {
+ kmp_taskgraph_region_dep_t *next = succ->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, succ);
+ *succp = next;
+ } else {
+ succp = &succ->next;
+ }
+ }
+ }
+ }
+
+ for (kmp_taskgraph_region_dep_t *succ = par_region->successors; succ;
+ succ = succ->next) {
+ kmp_taskgraph_region_t *succ_region = succ->region;
+ kmp_taskgraph_region_dep_t **predp = &succ_region->predecessors;
+ while (*predp) {
+ kmp_taskgraph_region_dep_t *pred = *predp;
+ if (pred->region == par_region->inner.children[0]) {
+ pred->region = par_region;
+ predp = &pred->next;
+ } else {
+ bool found = false;
+ for (kmp_int32 j = 1; j < same_preds_and_succs; j++) {
+ if (pred->region == par_region->inner.children[j]) {
+ found = true;
+ break;
+ }
+ }
+ if (found) {
+ kmp_taskgraph_region_dep_t *next = pred->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, pred);
+ *predp = next;
+ } else {
+ predp = &pred->next;
+ }
+ }
+ }
+ }
+
+ par_region->next = region->next;
+ region->next = par_region;
+
+ regions_combined_p = true;
+ }
+ }
+
+ if (regions_combined_p)
+ continue;
+
+ assert(num_groups >= 1);
+
+ TGDBG("should split region %p (%d)\n", region, region->timestamp);
+ TGDBG("clone graph to dominator: %p (%d, %s)\n",
+ doms[region->timestamp],
+ doms[region->timestamp]->timestamp,
+ __kmp_taskgraph_region_type_name(doms[region->timestamp]->type));
+ kmp_taskgraph_region_t *region_dom = doms[region->timestamp];
+ kmp_int32 grp = -1;
+ kmp_int32 highest_dom = -1;
+ // Choose a dominator. We pick one with the highest level, i.e.
+ // with the largest chain of dependents. Anything we pick should
+ // be irreducible, because we've already tried the serial-parallel
+ // decomposition.
+ for (kmp_int32 j = 0; j < num_groups; j++) {
+ if (dom_groups[j].dom->level > highest_dom) {
+ grp = j;
+ highest_dom = dom_groups[j].dom->level;
+ }
+ }
+
+ // Separate out the predecessors with this dominator (identified by
+ // grp).
+ kmp_taskgraph_region_dep_t *preds_with_dom = nullptr;
+ kmp_taskgraph_region_dep_t **pwd_tail = &preds_with_dom;
+ kmp_taskgraph_region_dep_t **pred_cursor = &region->predecessors;
+ TGDBG("before splitting we have %d preds\n",
+ __kmp_region_deplist_len(region->predecessors));
+ while (*pred_cursor) {
+ kmp_taskgraph_region_dep_t *this_pred = *pred_cursor;
+ kmp_taskgraph_region_t *dom = doms[this_pred->region->timestamp];
+ if (dom == dom_groups[grp].dom) {
+ *pwd_tail = this_pred;
+ pwd_tail = &this_pred->next;
+ *pred_cursor = this_pred->next;
+ } else {
+ pred_cursor = &this_pred->next;
+ }
+ }
+ // Finish list.
+ *pwd_tail = nullptr;
+
+ if (!region->predecessors) {
+ kmp_int32 highest = -1;
+ kmp_taskgraph_region_dep_t **use_pred = nullptr;
+ // This can only happen if...
+ assert(num_groups == 1);
+ region->predecessors = preds_with_dom;
+ for (kmp_taskgraph_region_dep_t **rp = &region->predecessors; *rp;
+ rp = &(*rp)->next) {
+ kmp_int32 count =
+ __kmp_taskgraph_count_edges_to_dominator((*rp)->region,
+ dom_groups[grp].dom);
+ TGDBG("for pred %p, outgoing edges to dom = %d\n", (*rp)->region,
+ count);
+ if (count > highest) {
+ highest = count;
+ use_pred = rp;
+ }
+ }
+ TGDBG("using pred %p\n", (*use_pred)->region);
+ // Pick the single predecessor with the largest outgoing edge
+ // count (the "most complicated" predecessor).
+ preds_with_dom = *use_pred;
+ *use_pred = (*use_pred)->next;
+ preds_with_dom->next = nullptr;
+ }
+
+ kmp_taskgraph_region_dep_t *unlinked_successors = nullptr;
+
+ // Unlink successors for preds_with_dom nodes, and record where they
+ // came from.
+ for (pred = preds_with_dom; pred; pred = pred->next) {
+ kmp_taskgraph_region_dep_t **succp = &pred->region->successors;
+ while (*succp) {
+ kmp_taskgraph_region_dep_t *succ = *succp;
+ kmp_taskgraph_region_t *succ_region = succ->region;
+ if (succ_region == region) {
+ kmp_taskgraph_region_dep_t *next = succ->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, succ);
+ TGDBG("unlinking successor %p -> %p\n", pred->region, region);
+ unlinked_successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ pred->region, unlinked_successors);
+ *succp = next;
+ } else {
+ succp = &succ->next;
+ }
+ }
+ }
+
+ TGDBG("after splitting, # preds_with_dom=%d, others %d\n",
+ __kmp_region_deplist_len(preds_with_dom),
+ __kmp_region_deplist_len(region->predecessors));
+ *pwd_tail = nullptr;
+ kmp_taskgraph_region_t *cloned_nodes[worklist_length];
+ memset(cloned_nodes, 0,
+ sizeof(kmp_taskgraph_region_t *) * worklist_length);
+ __kmp_taskgraph_clone_subgraph(thread, taskgraph, alloc_chain,
+ cloned_nodes, region, doms, preds_with_dom,
+ region_dom, &added_worklist_p);
+ // Now fill in the successors for the cloned regions.
+ for (kmp_int32 n = 0; n < worklist_length; n++) {
+ kmp_taskgraph_region_t *cloned_region = cloned_nodes[n];
+ if (!cloned_region)
+ continue;
+ for (kmp_taskgraph_region_dep_t *pred = cloned_region->predecessors;
+ pred; pred = pred->next) {
+ kmp_taskgraph_region_t *pred_region = pred->region;
+ pred_region->successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ cloned_region, pred_region->successors);
+ }
+ }
+
+#ifdef DEBUG_TASKGRAPH
+ TGDBG("before appending:\n");
+ for (pred = region->predecessors; pred; pred = pred->next) {
+ TGDBG("region %p, pred: %p\n", region, pred);
+ }
+#endif
+
+ // Re-attach redirected predecessor list to region's predecessors.
+ pred = region->predecessors;
+ if (pred) {
+ while (pred && pred->next)
+ pred = pred->next;
+ pred->next = preds_with_dom;
+ } else {
+ region->predecessors = preds_with_dom;
+ }
+
+#ifdef DEBUG_TASKGRAPH
+ TGDBG("after appending:\n");
+ for (pred = region->predecessors; pred; pred = pred->next) {
+ TGDBG("region %p, pred: %p\n", region, pred);
+ }
+#endif
+
+ // Redirect the unlinked successors from the region's original
+ // predecessors so that the new (cloned) predecessors still point to
+ // the region.
+ for (kmp_taskgraph_region_dep_t *succ = unlinked_successors; succ;) {
+ kmp_taskgraph_region_t *cloned_reg =
+ cloned_nodes[succ->region->timestamp];
+ kmp_taskgraph_region_dep_t *next = succ->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, succ);
+ TGDBG("add successor to cloned region: %p -> %p\n", cloned_reg, region);
+ cloned_reg->successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps, region,
+ cloned_reg->successors);
+ succ = next;
+ }
+
+ // Cloning the subgraph invalidates cached data such as the timestamp
+ // fields, so we only do one round of transformation per call. We
+ // could possibly do more if we were careful.
+
+ changed = true;
+ }
+ if (changed)
+ break;
+ }
+
+ if (regions_combined_p)
+ changed = true;
+
+ if (pred_bitsets) {
+ for (kmp_int32 j = 0; j < worklist_length; j++) {
+ __kmp_bitset_free(thread, pred_bitsets[j]);
+ __kmp_bitset_free(thread, succ_bitsets[j]);
+ }
+ __kmp_fast_free(thread, pred_bitsets);
+ __kmp_fast_free(thread, succ_bitsets);
+ }
+
+ *added_worklist_p = nullptr;
+ added_worklist = __kmp_region_worklist_reverse(added_worklist);
+
+ kmp_taskgraph_region_t *last = exitregion;
+ assert(last);
+ while (last->next)
+ last = last->next;
+ last->next = added_worklist;
+
+ TGDBG("starting trim dead edges...\n");
+
+ for (kmp_taskgraph_region_t *r = entryregion; r; r = r->next) {
+ r->mark = TASKGRAPH_UNMARKED;
+ }
+
+ // Remove any regions which are now unreachable by DFS from the exit
+ // region, and any connected dependency edges.
+ int idx = 0;
+ __kmp_taskgraph_region_dfs(exitregion, nullptr, idx, true);
+ for (kmp_taskgraph_region_t *r = entryregion; r; r = r->next) {
+ if (r->mark == TASKGRAPH_UNMARKED) {
+ r->mark = TASKGRAPH_DELETED;
+
+ __kmp_region_deplist_recycle(&taskgraph->recycled_deps, r->successors);
+ r->successors = nullptr;
+
+ // Delete predecessors for deleted nodes (and corresponding
+ // successors).
+ kmp_taskgraph_region_dep_t **predp = &r->predecessors;
+ while (*predp) {
+ kmp_taskgraph_region_dep_t *pred = *predp;
+ if (pred->region->mark != TASKGRAPH_UNMARKED) {
+ kmp_taskgraph_region_dep_t **succp = &pred->region->successors;
+ while (*succp) {
+ kmp_taskgraph_region_dep_t *succ = *succp;
+ if (succ->region == r) {
+ kmp_taskgraph_region_dep_t *next = succ->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, succ);
+ *succp = next;
+ } else {
+ succp = &succ->next;
+ }
+ }
+ }
+ kmp_taskgraph_region_dep_t *next = pred->next;
+ __kmp_region_dep_recycle(&taskgraph->recycled_deps, pred);
+ *predp = next;
+ }
+ }
+ }
+
+ TGDBG("done trimming dead edges.\n");
+
+ __kmp_taskgraph_region_chain_prune(&entryregion);
+ __kmp_taskgraph_region_worklist_check(thread, taskgraph, entryregion,
+ "after irreducible handling");
+
+ worklist_length = 0;
+ for (kmp_taskgraph_region_t *r = entryregion; r; r = r->next) {
+ r->mark = TASKGRAPH_UNMARKED;
+ worklist_length++;
+ }
+
+ // Recalculate topological sort
+ kmp_int32 max_level = -1;
+ kmp_taskgraph_region_t *r = entryregion;
+ kmp_int32 outidx = 0;
+ kmp_taskgraph_region_t *order_out[worklist_length];
+ for (kmp_int32 i = 0; i < worklist_length; i++, r = r->next) {
+ if (r->mark == TASKGRAPH_UNMARKED) {
+ kmp_int32 level =
+ __kmp_taskgraph_topological_order(r, order_out, &outidx);
+ max_level = level > max_level ? level : max_level;
+ }
+ }
+
+ // Re-sort worklist wrt. topological order calculated above.
+ kmp_taskgraph_region_t **relink = &entryregion;
+ for (kmp_int32 i = 0; i < worklist_length; i++) {
+ *relink = order_out[i];
+ relink = &order_out[i]->next;
+ }
+ *relink = nullptr;
+
+#ifdef DEBUG_TASKGRAPH
+ __kmp_taskgraph_region_dot(entryregion, "PredsAndSuccsAfter");
+#endif
+
+ *region_p = entryregion;
+
+ __kmp_fast_free(thread, order);
+ __kmp_fast_free(thread, doms);
+
+ return changed;
+}
+
+/// Build a nested region structure out of a recorded taskgraph.
+//
+// The algorithm proceeds by alternating two phases until a single top-level
+// node is reached. Briefly, and glossing over some details:
+//
+// 1. Serial-parallel decomposition. Chains of single-successor,
+// single-predecessor nodes are collapsed into a "sequential" region, and
+// nodes with >1 predecessor, where each predecessor has a single
+// predecessor and a single successor, are collapsed into "parallel" regions.
+//
+// 2. Irreducible-graph processing. Several techniques are used to turn graphs
+// not handled by step (1) into graphs that can be handled by that step.
+//
+// Notably, simple graphs that can be handled entirely by step (1) avoid doing
+// much of the heavier processing involved in step (2), so the common case
+// should be relatively fast.
+
+static kmp_taskgraph_region_t *
+__kmp_taskgraph_build_regions(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_region_t *entryregion,
+ kmp_taskgraph_region_t *exitregion) {
+ bool changed;
+ kmp_int32 phase = 0;
+
+#ifdef DEBUG_TASKGRAPH
+ __kmp_taskgraph_region_dot(entryregion, "InitialPredsAndSuccs");
+#endif
+
+ __kmp_taskgraph_region_chain_clear_marks(entryregion);
+
+ while (true) {
+ do {
+ changed = false;
+ TGDBG("starting seq pass\n");
+ for (kmp_taskgraph_region_t **seq_head = &entryregion; *seq_head;
+ seq_head = &(*seq_head)->next) {
+ TGDBG("consider %s region: %p\n",
+ __kmp_taskgraph_region_type_name((*seq_head)->type), *seq_head);
+ if ((*seq_head)->mark == TASKGRAPH_COMBINED) {
+ TGDBG("already combined\n");
+ continue;
+ }
+ changed |=
+ __kmp_taskgraph_collapse_sequence(thread, taskgraph, alloc_chain, seq_head,
+ /*parent=*/nullptr, phase);
+ TGDBG("changed: %s\n", changed ? "true" : "false");
+ }
+ ++phase;
+ __kmp_taskgraph_region_chain_prune(&entryregion);
+ __kmp_taskgraph_region_worklist_check(thread, taskgraph, entryregion,
+ "after seq collapse");
+ TGDBG("starting par/unordered pass\n");
+ for (kmp_taskgraph_region_t **par_head = &entryregion; *par_head;
+ par_head = &(*par_head)->next) {
+ TGDBG("consider %s region: %p\n",
+ __kmp_taskgraph_region_type_name((*par_head)->type), *par_head);
+ if ((*par_head)->mark == TASKGRAPH_COMBINED) {
+ TGDBG("already combined\n");
+ continue;
+ }
+ changed |=
+ __kmp_taskgraph_collapse_par_exclusive(thread, taskgraph, alloc_chain,
+ par_head, /*parent=*/nullptr,
+ phase);
+ TGDBG("changed: %s\n", changed ? "true" : "false");
+ }
+ ++phase;
+ __kmp_taskgraph_region_chain_prune(&entryregion);
+ __kmp_taskgraph_region_worklist_check(thread, taskgraph, entryregion,
+ "after par collapse");
+ } while (changed);
+
+ if (entryregion->type == TASKGRAPH_REGION_ENTRY) {
+ if (__kmp_region_deplist_len(entryregion->successors) == 1) {
+ kmp_taskgraph_region_t *one_region = entryregion->successors->region;
+ if (__kmp_region_deplist_len(one_region->successors) == 1) {
+ kmp_taskgraph_region_t *maybe_exit = one_region->successors->region;
+ if (maybe_exit->type == TASKGRAPH_REGION_EXIT)
+ return one_region;
+ }
+ }
+ } else {
+ fprintf(stderr, "FIXME: Expected entry region!\n");
+ return entryregion;
+ }
+
+ TGDBG("attempting to collapse irreducible regions\n");
+
+ changed |=
+ __kmp_taskgraph_rewrite_irreducible(thread, taskgraph, alloc_chain,
+ &entryregion, exitregion);
+
+ if (!changed) {
+ fprintf(stderr, "FIXME: Failed to transform irreducible graph\n");
+ return entryregion;
+ }
+ }
+
+ return entryregion;
+}
+
+static void
+__kmp_taskgraph_count_nodes(kmp_taskgraph_region_t *region) {
+ switch (region->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ return;
+ case TASKGRAPH_REGION_NODE:
+ case TASKGRAPH_REGION_WAIT: {
+ TGDBG("process region %p\n", region);
+ region->task.node->u.resolved.count++;
+ kmp_taskgraph_region_t *last_region =
+ region->task.node->u.resolved.last_region;
+ TGDBG("last region: %p\n", last_region);
+ if (last_region) {
+ kmp_taskgraph_region_t *next = last_region->task.next_instance;
+ TGDBG("next: %p\n", next);
+ last_region->task.next_instance = region;
+ region->task.next_instance = next;
+ }
+ region->task.node->u.resolved.last_region = region;
+ return;
+ }
+ default:
+ for (kmp_int32 n = 0; n < region->inner.num_children; n++) {
+ __kmp_taskgraph_count_nodes(region->inner.children[n]);
+ }
+ }
+}
+
+static void
+__kmp_taskgraph_gather_mutex_sets(kmp_info_t *thread,
+ kmp_taskgraph_region_t *region,
+ const kmp_bitset_t *held) {
+ switch (region->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ case TASKGRAPH_REGION_WAIT:
+ return;
+ case TASKGRAPH_REGION_NODE: {
+#ifdef DEBUG_TASKGRAPH
+ if (region->mutexset && __kmp_bitset_subset_p(held, region->mutexset)) {
+ TGDBG("node is mutually exclusive with held: 0x%llx <: 0x%llx\n",
+ (unsigned long long)region->mutexset->bits[0],
+ (unsigned long long)held->bits[0]);
+ }
+#endif
+ return;
+ }
+ case TASKGRAPH_REGION_SEQUENTIAL: {
+ kmp_bitset_t *seq_held = __kmp_bitset_alloc(thread, held->bitsize);
+ __kmp_bitset_clearall(seq_held);
+ for (kmp_int32 child = 0; child < region->inner.num_children; child++) {
+ __kmp_taskgraph_gather_mutex_sets(thread, region->inner.children[child],
+ held);
+ if (region->inner.children[child]->mutexset)
+ __kmp_bitset_or(seq_held, seq_held,
+ region->inner.children[child]->mutexset);
+ }
+ region->mutexset = seq_held;
+ return;
+ }
+ case TASKGRAPH_REGION_PARALLEL:
+ case TASKGRAPH_REGION_EXCLUSIVE: {
+ kmp_bitset_t *par_held = __kmp_bitset_alloc(thread, held->bitsize);
+ kmp_bitset_t *conflicts = __kmp_bitset_alloc(thread, held->bitsize);
+ while (true) {
+ __kmp_bitset_clearall(par_held);
+ for (kmp_int32 child = 0; child < region->inner.num_children; child++) {
+ __kmp_bitset_clearall(conflicts);
+ for (kmp_int32 other = 0; other < region->inner.num_children; other++) {
+ if (other != child) {
+ if (!region->inner.children[other]->mutexset)
+ __kmp_taskgraph_gather_mutex_sets(thread,
+ region->inner.children[other],
+ held);
+ if (region->inner.children[other]->mutexset)
+ __kmp_bitset_or(conflicts, conflicts,
+ region->inner.children[other]->mutexset);
+ }
+ }
+ __kmp_taskgraph_gather_mutex_sets(thread,
+ region->inner.children[child],
+ conflicts);
+ if (region->inner.children[child]->mutexset)
+ __kmp_bitset_or(par_held, par_held,
+ region->inner.children[child]->mutexset);
+ }
+ if (!region->mutexset) {
+ // Keep a separate copy: par_held is cleared and reused on the
+ // next iteration, so region->mutexset must not alias it.
+ region->mutexset = __kmp_bitset_alloc(thread, held->bitsize);
+ __kmp_bitset_copy(region->mutexset, par_held);
+ } else if (__kmp_bitset_equal(region->mutexset, par_held)) {
+ TGDBG("par mutexes stabilized, exiting loop\n");
+ break;
+ } else {
+ TGDBG("par mutexes not stable, iterating\n");
+ __kmp_bitset_copy(region->mutexset, par_held);
+ }
+ }
+ __kmp_bitset_free(thread, par_held);
+ __kmp_bitset_free(thread, conflicts);
+ return;
+ }
+ }
+}
+
+static int
+__kmp_popcount_cmp(const void *a, const void *b) {
+ const kmp_taskgraph_region_t *reg_a = *(kmp_taskgraph_region_t **) a;
+ const kmp_taskgraph_region_t *reg_b = *(kmp_taskgraph_region_t **) b;
+ kmp_int32 popc_a = 0, popc_b = 0;
+ if (reg_a->mutexset)
+ popc_a = __kmp_bitset_popcount(reg_a->mutexset);
+ if (reg_b->mutexset)
+ popc_b = __kmp_bitset_popcount(reg_b->mutexset);
+ if (popc_a > popc_b)
+ return -1;
+ else if (popc_a < popc_b)
+ return 1;
+ return 0;
+}
+
+/// Find "mutexinoutset" regions that can be represented without explicit
+// mutexes, i.e. using "TASKGRAPH_REGION_EXCLUSIVE".
+
+static void
+__kmp_taskgraph_find_exclusive_regions(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_region_t **region_p) {
+ kmp_taskgraph_region_t *region = *region_p;
+ switch (region->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ case TASKGRAPH_REGION_NODE:
+ case TASKGRAPH_REGION_WAIT:
+ break;
+ case TASKGRAPH_REGION_SEQUENTIAL:
+ case TASKGRAPH_REGION_PARALLEL: {
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ __kmp_taskgraph_find_exclusive_regions(thread, taskgraph, alloc_chain,
+ &region->inner.children[c]);
+ }
+ break;
+ }
+ case TASKGRAPH_REGION_EXCLUSIVE: {
+ qsort(region->inner.children, region->inner.num_children,
+ sizeof(kmp_taskgraph_region_t *), __kmp_popcount_cmp);
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ TGDBG("building tree: region mutexset = 0x%llx\n",
+ (unsigned long long)(region->inner.children[c]->mutexset
+ ? region->inner.children[c]->mutexset->bits[0] : 0));
+ region->inner.children[c]->mark = TASKGRAPH_UNMARKED;
+ }
+ kmp_bitset_t *conflicts =
+ __kmp_bitset_alloc(thread, region->mutexset->bitsize);
+ kmp_bitset_t *subsets_cover =
+ __kmp_bitset_alloc(thread, region->mutexset->bitsize);
+ __kmp_bitset_copy(conflicts, region->mutexset);
+ bool irregular = false;
+ kmp_int32 combined_children = 0;
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ kmp_bitset_t *candidate = region->inner.children[c]->mutexset;
+ if (__kmp_bitset_empty_p(candidate))
+ continue;
+ __kmp_bitset_clearall(subsets_cover);
+ bool found_subset = false;
+ bool other_overlaps = false;
+ for (kmp_int32 d = c + 1; d < region->inner.num_children; d++) {
+ // This could test for a subset in some cases, but that adds
+ // complication for later processing. Maybe revisit later if it
+ // seems worthwhile.
+ // E.g. if we have deps like this:
+ //
+ // #pragma omp task depend(mutexinoutset: deps[0], deps[1]) { /*a*/ }
+ // #pragma omp task depend(mutexinoutset: deps[0]) { /*b*/ }
+ // #pragma omp task depend(mutexinoutset: deps[1]) { /*c*/ }
+ //
+ // This could be represented as:
+ //
+ // exclusive {
+ // node: a
+ // parallel {
+ // node: b
+ // node: c
+ // }
+ // }
+ //
+ // We're not doing that yet though.
+ if (__kmp_bitset_equal(candidate,
+ region->inner.children[d]->mutexset)) {
+ found_subset = true;
+ __kmp_bitset_or(subsets_cover, subsets_cover,
+ region->inner.children[d]->mutexset);
+ } else if (__kmp_bitset_intersect_p(
+ candidate, region->inner.children[d]->mutexset)) {
+ other_overlaps = true;
+ break;
+ }
+ }
+ if (!found_subset || other_overlaps)
+ continue;
+ if (!__kmp_bitset_equal(subsets_cover, candidate)) {
+ TGDBG("subsets cover: 0x%llx, candidate: 0x%llx\n",
+ (unsigned long long)subsets_cover->bits[0],
+ (unsigned long long)candidate->bits[0]);
+ irregular = true;
+ break;
+ }
+ for (kmp_int32 d = c + 1; d < region->inner.num_children; d++) {
+ if (region->inner.children[d]->mutexset_parent)
+ continue;
+ // As above wrt. subsets.
+ if (__kmp_bitset_equal(candidate,
+ region->inner.children[d]->mutexset)) {
+ TGDBG("set index %d's parent to index %d\n", d, c);
+ region->inner.children[d]->mutexset_parent =
+ region->inner.children[c];
+ combined_children++;
+ __kmp_bitset_and_not(conflicts, conflicts, candidate);
+ }
+ }
+ }
+ TGDBG("irregular: %s\n", irregular ? "true" : "false");
+ TGDBG("final conflicts: 0x%llx\n",
+ (unsigned long long)conflicts->bits[0]);
+ __kmp_bitset_free(thread, subsets_cover);
+ region->type = TASKGRAPH_REGION_PARALLEL;
+ if (!irregular && __kmp_bitset_empty_p(conflicts)) {
+ TGDBG("transforming exclusive region %p\n", region);
+ TGDBG("orig region children: %d\n", region->inner.num_children);
+ TGDBG("combined children: %d\n", combined_children);
+ if (region->inner.num_children == combined_children + 1) {
+ region->type = TASKGRAPH_REGION_EXCLUSIVE;
+ } else {
+ kmp_taskgraph_region_t *new_par =
+ __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain,
+ TASKGRAPH_REGION_PARALLEL,
+ region->inner.num_children -
+ combined_children,
+ nullptr);
+ for (kmp_int32 c = region->inner.num_children - 1; c >= 0; c--) {
+ kmp_taskgraph_region_t *child = region->inner.children[c];
+ // Make mutex set into a circular list.
+ if (child->mutexset_parent && child->mark != TASKGRAPH_TEMP_MARK) {
+ if (!child->mutexset_parent->mutexset_parent) {
+ // child <-> parent
+ child->mutexset_parent->mutexset_parent = child;
+ child->mutexset_parent->mark = TASKGRAPH_TEMP_MARK;
+ } else {
+ kmp_taskgraph_region_t *parent = child->mutexset_parent;
+ child->mutexset_parent = parent->mutexset_parent;
+ parent->mutexset_parent = child;
+ parent->mark = TASKGRAPH_TEMP_MARK;
+ }
+ }
+ }
+ kmp_int32 idx = 0;
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ kmp_taskgraph_region_t *child = region->inner.children[c];
+ TGDBG("process child: %p\n", child);
+ if (child->mutexset_parent && child->mark != TASKGRAPH_COMBINED) {
+ kmp_int32 elems = 0;
+ kmp_taskgraph_region_t *next = child;
+ do {
+ elems++;
+ next = next->mutexset_parent;
+ } while (next != child);
+ TGDBG("make exclusive region with %d children\n", elems);
+ kmp_taskgraph_region_t *excl_region =
+ __kmp_taskgraph_region_alloc(thread, taskgraph, alloc_chain,
+ TASKGRAPH_REGION_EXCLUSIVE, elems,
+ nullptr);
+ kmp_int32 excl_child = 0;
+ next = child;
+ do {
+ excl_region->inner.children[excl_child++] = next;
+ next->mark = TASKGRAPH_COMBINED;
+ next = next->mutexset_parent;
+ } while (next != child);
+ assert(excl_child == excl_region->inner.num_children);
+ new_par->inner.children[idx++] = excl_region;
+ } else if (!child->mutexset_parent) {
+ new_par->inner.children[idx++] = child;
+ }
+ }
+ TGDBG("idx=%d, supposed to be %d\n", idx,
+ new_par->inner.num_children);
+ assert(idx == new_par->inner.num_children);
+ *region_p = new_par;
+ region->mark = TASKGRAPH_DELETED;
+ }
+ }
+ __kmp_bitset_free(thread, conflicts);
+ break;
+ }
+ default:
+ assert(false && "unreachable");
+ }
+}
+
+/// Strip mutex sets from taskgraph region, except those needed at runtime.
+
+static kmp_int32
+__kmp_taskgraph_strip_mutex_sets(kmp_info_t *thread,
+ kmp_taskgraph_region_t *region,
+ bool in_exclusive = false) {
+ kmp_int32 mutexes_needed = 0;
+ switch (region->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ case TASKGRAPH_REGION_WAIT:
+ assert(!region->mutexset);
+ break;
+ case TASKGRAPH_REGION_NODE:
+ if (region->mutexset) {
+ if (in_exclusive) {
+ __kmp_bitset_free(thread, region->mutexset);
+ region->mutexset = nullptr;
+ } else {
+ // FIXME: This might be pessimistic -- the remaining mutex sets might
+ // have holes or duplicates. We could compact them.
+ kmp_int32 m = region->mutexset->bitsize;
+ mutexes_needed = std::max(mutexes_needed, m);
+ }
+ }
+ break;
+ case TASKGRAPH_REGION_EXCLUSIVE: {
+ if (region->mutexset) {
+ __kmp_bitset_free(thread, region->mutexset);
+ region->mutexset = nullptr;
+ }
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ kmp_int32 m =
+ __kmp_taskgraph_strip_mutex_sets(thread, region->inner.children[c],
+ true);
+ mutexes_needed = std::max(mutexes_needed, m);
+ }
+ break;
+ }
+ default: {
+ if (region->mutexset) {
+ __kmp_bitset_free(thread, region->mutexset);
+ region->mutexset = nullptr;
+ }
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ kmp_int32 m =
+ __kmp_taskgraph_strip_mutex_sets(thread, region->inner.children[c],
+ in_exclusive);
+ mutexes_needed = std::max(mutexes_needed, m);
+ }
+ }
+ }
+ return mutexes_needed;
+}
+
+static void
+__kmp_taskgraph_exclusive_regions(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t **&alloc_chain,
+ kmp_taskgraph_region_t **region_p,
+ kmp_int32 max_mutex) {
+ kmp_bitset_t *top = __kmp_bitset_alloc(thread, max_mutex);
+ __kmp_bitset_clearall(top);
+ __kmp_taskgraph_gather_mutex_sets(thread, *region_p, top);
+ __kmp_taskgraph_find_exclusive_regions(thread, taskgraph, alloc_chain,
+ region_p);
+ kmp_int32 num_mutexes = __kmp_taskgraph_strip_mutex_sets(thread, *region_p);
+ taskgraph->num_mutexes = num_mutexes;
+}
+
+static const char*
+__kmp_taskgraph_region_type_name(kmp_taskgraph_region_type type) {
+ switch (type) {
+ case TASKGRAPH_REGION_ENTRY: return "entry";
+ case TASKGRAPH_REGION_EXIT: return "exit";
+ case TASKGRAPH_REGION_NODE: return "node";
+ case TASKGRAPH_REGION_WAIT: return "wait";
+ case TASKGRAPH_REGION_PARALLEL: return "parallel";
+ case TASKGRAPH_REGION_EXCLUSIVE: return "exclusive";
+ case TASKGRAPH_REGION_SEQUENTIAL: return "sequential";
+ case TASKGRAPH_REGION_IRREDUCIBLE: return "irreducible";
+ default: return "<unknown>";
+ }
+}
+
+#if defined(KMP_DEBUG) || defined(DEBUG_TASKGRAPH)
+static void
+__kmp_dump_taskgraph_regions(FILE *f, kmp_taskgraph_region_t *region,
+ int indent = 0) {
+ switch (region->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ fprintf(f, "%*s%s node\n", indent, "",
+ __kmp_taskgraph_region_type_name(region->type));
+ break;
+ case TASKGRAPH_REGION_NODE:
+ case TASKGRAPH_REGION_WAIT: {
+ char set_membership[40];
+ if (region->mutexset)
+ sprintf(set_membership, " [sets: 0x%llx]",
+ (unsigned long long) region->mutexset->bits[0]);
+ else
+ strcpy(set_membership, "");
+ if (region->task.node->u.resolved.count > 1)
+ fprintf(f, "%*s%s: %p (* %d)%s\n", indent, "",
+ __kmp_taskgraph_region_type_name(region->type),
+ region->task.node, region->task.node->u.resolved.count,
+ set_membership);
+ else
+ fprintf(f, "%*s%s: %p%s\n", indent, "",
+ __kmp_taskgraph_region_type_name(region->type),
+ region->task.node, set_membership);
+ break;
+ }
+ default: {
+ char set_membership[40];
+ if (region->mutexset)
+ sprintf(set_membership, " [sets: 0x%llx]",
+ (unsigned long long) region->mutexset->bits[0]);
+ else
+ strcpy(set_membership, "");
+ fprintf(f, "%*s%s%s {\n", indent, "",
+ __kmp_taskgraph_region_type_name(region->type), set_membership);
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ __kmp_dump_taskgraph_regions(f, region->inner.children[c], indent + 2);
+ }
+ fprintf(f, "%*s}\n", indent, "");
+ }
+ }
+}
+#endif
+
+#ifdef DEBUG_TASKGRAPH
+
+static kmp_taskgraph_region_dep_t *
+__kmp_dump_find_parent_regions(kmp_info_t *thd, kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region, int numregions,
+ kmp_taskgraph_region_dep_t *list = nullptr) {
+ for (int r = 0; r < numregions; r++) {
+ if (!region[r].parent)
+ continue;
+ bool in_list = false;
+ for (kmp_taskgraph_region_dep_t *dep = list; dep; dep = dep->next) {
+ if (dep->region == region[r].parent) {
+ in_list = true;
+ break;
+ }
+ }
+ if (!in_list) {
+ list = __kmp_region_deplist_add(thd, &taskgraph->recycled_deps,
+ region[r].parent, list);
+ list = __kmp_dump_find_parent_regions(thd, taskgraph, region[r].parent,
+ 1, list);
+ }
+ }
+ return list;
+}
+
+static void
+__kmp_dump_raw_taskgraph_regions(FILE *f, kmp_info_t *thd,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region,
+ int numregions, int indent = 0) {
+ kmp_taskgraph_region_dep_t *parentlist = nullptr;
+ kmp_taskgraph_region_dep_t *printedlist = nullptr;
+ for (int r = 0; r < numregions; r++) {
+ int children = 0;
+ if (region[r].type == TASKGRAPH_REGION_PARALLEL ||
+ region[r].type == TASKGRAPH_REGION_SEQUENTIAL ||
+ region[r].type == TASKGRAPH_REGION_EXCLUSIVE ||
+ region[r].type == TASKGRAPH_REGION_IRREDUCIBLE)
+ children = region[r].inner.num_children;
+ fprintf(f,
+ "%*sregion %d (%p): %s%s (%d children) parent %p succs %d preds %d\n",
+ indent, "", r, &region[r],
+ __kmp_taskgraph_region_type_name(region[r].type),
+ region[r].mark == TASKGRAPH_COMBINED ? " (combined)" : "",
+ children, region[r].parent,
+ __kmp_region_deplist_len(region[r].successors),
+ __kmp_region_deplist_len(region[r].predecessors));
+ if (children > 0) {
+ for (int c = 0; c < children; c++)
+ __kmp_dump_raw_taskgraph_regions(f, thd, taskgraph,
+ region[r].inner.children[c], 1,
+ indent + 2);
+ }
+ }
+ if (indent == 0) {
+ parentlist = __kmp_dump_find_parent_regions(thd, taskgraph, region,
+ numregions);
+ fprintf(f, "%*sfound %d parent region(s):\n", indent, "",
+ __kmp_region_deplist_len(parentlist));
+ for (kmp_taskgraph_region_dep_t *p = parentlist; p; p = p->next) {
+ __kmp_dump_raw_taskgraph_regions(f, thd, taskgraph, p->region, 1,
+ indent + 2);
+ }
+ __kmp_region_deplist_recycle(&taskgraph->recycled_deps, parentlist);
+ }
+}
+#endif
+
+#define NO_DEP_BARRIER (false)
+#define DEP_BARRIER (true)
+
+/// Build a nested region structure from a "raw" recorded taskgraph, and mark
+/// the taskgraph ready for replay.
+//
+// The input to this function consists of tasks with *data* dependencies
+// between them. The output of the function is a nested tree structure: the
+// dependencies between tasks implicitly become *control* dependencies. In
+// the common case, these ought to map straightforwardly to hardware-provided
+// execution primitives (e.g. on a GPU), or to runtime-provided primitives (for
+// the CPU).
+//
+// Here is an example taskgraph:
+//
+// #pragma omp taskgraph
+// {
+// #pragma omp task depend(out: deps[2])
+// { }
+// #pragma omp task depend(out: deps[0], deps[1])
+// { }
+// #pragma omp task depend(inout: deps[0])
+// { }
+// #pragma omp task depend(inout: deps[1])
+// { }
+// #pragma omp task depend(inout: deps[2])
+// { }
+// #pragma omp task depend(in: deps[0], deps[1], deps[2])
+// { }
+// }
+//
+// This dependency graph is "reducible", and the resulting tree looks like this:
+//
+// sequential {
+// parallel {
+// sequential {
+// node: 0x588aa11021b0
+// node: 0x588aa1102250
+// }
+// sequential {
+// node: 0x588aa11021d8
+// parallel {
+// node: 0x588aa1102228
+// node: 0x588aa1102200
+// }
+// }
+// }
+// node: 0x588aa1102278
+// }
+//
+// Each node represents a task, and the containing parallel and sequential
+// regions represent sub-regions that can be executed in parallel, or
+// one-at-a-time, in order.
+//
+// In some cases, the data-dependency graph may not be trivially reducible to
+// parallel and sequential regions. In this case, several techniques are used
+// to produce a reducible graph from an irreducible graph (see
+// __kmp_taskgraph_rewrite_irreducible).
+//
+// For example in this graph:
+//
+// #pragma omp taskgraph
+// {
+// #pragma omp task depend(out: deps[0], deps[1])
+// { }
+// #pragma omp task depend(out: deps[2], deps[3])
+// { }
+// #pragma omp task depend(inout: deps[0])
+// { }
+// #pragma omp task depend(inout: deps[1])
+// { }
+// #pragma omp task depend(inout: deps[2])
+// { }
+// #pragma omp task depend(inout: deps[3])
+// { }
+// #pragma omp task depend(in: deps[0], deps[1], deps[2], deps[3])
+// { }
+// #pragma omp task depend(in: deps[1], deps[2])
+// { }
+// }
+//
+// The final two tasks overlap data dependencies in such a way that the
+// resulting dependency graph cannot be trivially decomposed to parallel and
+// sequential regions. In this case, the graph is handled by duplicating task
+// nodes so they appear in more than one place in the resulting nested region
+// structure:
+//
+// parallel {
+// sequential {
+// parallel {
+// sequential {
+// node: 0x61bfca8ecfd8 (* 2)
+// node: 0x61bfca8ed050 (* 2)
+// }
+// sequential {
+// node: 0x61bfca8ed000 (* 2)
+// node: 0x61bfca8ed078 (* 2)
+// }
+// }
+// node: 0x61bfca8ed0f0
+// }
+// sequential {
+// parallel {
+// sequential {
+// node: 0x61bfca8ed000 (* 2)
+// parallel {
+// node: 0x61bfca8ed0a0
+// node: 0x61bfca8ed078 (* 2)
+// }
+// }
+// sequential {
+// node: 0x61bfca8ecfd8 (* 2)
+// parallel {
+// node: 0x61bfca8ed050 (* 2)
+// node: 0x61bfca8ed028
+// }
+// }
+// }
+// node: 0x61bfca8ed0c8
+// }
+// }
+//
+// The "(* 2)" markers show the number of places in the region structure
+// where the task node is "instantiated". Care must be taken at replay time
+// that all nodes preceding a multiply-instantiated node execute before that
+// node, and that all nodes succeeding each "instantiation point" execute
+// only after the task has run.
+//
+// The final region type is "exclusive", which arises from "mutexinoutset"
+// dependencies that can be abstracted away (this is not possible in all
+// cases; when it isn't, explicit mutexes are still used).
+//
+// An example of this:
+//
+// #pragma omp taskgraph
+// {
+// #pragma omp task depend(mutexinoutset: deps[0])
+// { }
+// #pragma omp task depend(mutexinoutset: deps[1])
+// { }
+// #pragma omp task depend(mutexinoutset: deps[0])
+// { }
+// #pragma omp task depend(mutexinoutset: deps[1])
+// { }
+// #pragma omp task depend(mutexinoutset: deps[0])
+// { }
+// #pragma omp task depend(mutexinoutset: deps[1])
+// { }
+// #pragma omp task depend(mutexinoutset: deps[0])
+// { }
+// #pragma omp task depend(mutexinoutset: deps[1])
+// { }
+// }
+//
+// Results in this structure:
+//
+// parallel {
+// exclusive {
+// node: 0x5c0c5c571120
+// node: 0x5c0c5c5710d0
+// node: 0x5c0c5c571080
+// node: 0x5c0c5c571030
+// }
+// exclusive {
+// node: 0x5c0c5c5710f8
+// node: 0x5c0c5c5710a8
+// node: 0x5c0c5c571058
+// node: 0x5c0c5c571008
+// }
+// }
+//
+// An "exclusive" region executes each of its child regions (task nodes in
+// this case) in some unspecified order, one at a time relative to the other
+// regions in the structure. E.g. a GPU implementation could try to
+// dynamically schedule tasks such that they fit instantaneously-available
+// execution resources.
+//
+// In cases where mutexes cannot be abstracted, each affected task node is
+// annotated with a set of mutexes that must be held while executing the task.
+// (Shown with [sets: 0xN] in dump output).
+
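As a standalone illustration of the decomposition sketched in the comment above (using simplified, illustrative types rather than the runtime's `kmp_taskgraph_region_t`), a nested region tree for the first (reducible) example can be walked to produce one legal serial schedule: sequential children must run in order, while parallel children may run in any order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Simplified, illustrative mirror of the nested region tree described above
// (not the runtime's kmp_taskgraph_region_t): a region is either a single
// task node or a sequential/parallel group of child regions.
enum class RegionType { Node, Sequential, Parallel };

struct Region {
  RegionType type;
  int task_id;                  // valid for Node regions only
  std::vector<Region> children; // valid for Sequential/Parallel regions
};

// Produce one legal serial schedule: sequential children run in order;
// parallel children may run in any order (left-to-right is one valid choice).
static void schedule(const Region &r, std::vector<int> &out) {
  if (r.type == RegionType::Node) {
    out.push_back(r.task_id);
    return;
  }
  for (const Region &c : r.children)
    schedule(c, out);
}

// Tree shape from the first (reducible) example, with tasks numbered 0..5 in
// source order:
//   sequential { parallel { sequential { 0, 4 }
//                           sequential { 1, parallel { 3, 2 } } }
//                5 }
static Region example_tree() {
  Region t0{RegionType::Node, 0, {}}, t1{RegionType::Node, 1, {}};
  Region t2{RegionType::Node, 2, {}}, t3{RegionType::Node, 3, {}};
  Region t4{RegionType::Node, 4, {}}, t5{RegionType::Node, 5, {}};
  Region chain2{RegionType::Sequential, -1, {t0, t4}};
  Region par01{RegionType::Parallel, -1, {t3, t2}};
  Region chain01{RegionType::Sequential, -1, {t1, par01}};
  Region par{RegionType::Parallel, -1, {chain2, chain01}};
  return Region{RegionType::Sequential, -1, {par, t5}};
}
```

Any interleaving the replay engine chooses must respect the same ordering constraints that this serial walk does: producers before consumers within each sequential region, and the final "in" task after everything else.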
+kmp_int32
+__kmp_build_taskgraph(kmp_int32 gtid, kmp_taskdata_t *current_taskdata,
+ kmp_taskgraph_record_t *taskgraph) {
+ kmp_int32 numnodes = taskgraph->num_tasks;
+ kmp_int32 numregions = numnodes + 2;
+ kmp_taskgraph_node_t *nodes = taskgraph->record_map;
+ kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_dephash_t *hash = __kmp_dephash_create(thread, current_taskdata);
+ bool dep_barrier = false;
+
+ kmp_depnode_t *all_depnodes =
+ (kmp_depnode_t *)__kmp_thread_malloc(thread,
+ numregions * sizeof(kmp_depnode_t));
+
+ kmp_int32 next_mutex_set = 0;
+
+ for (kmp_int32 i = 0; i < numnodes; i++) {
+ int n_mtxs = 0;
+ bool dep_all;
+
+ dep_all = __kmp_filter_aliased_deps(nodes[i].u.unresolved.ndeps,
+ nodes[i].u.unresolved.dep_list,
+ nodes[i].task, &n_mtxs);
+ kmp_depnode_t *node = &all_depnodes[i];
+ __kmp_init_node(node, /*on_stack=*/false);
+ node->dn.task = nodes[i].task;
+ dep_barrier = !nodes[i].task && nodes[i].taskloop_task;
+ if (!dep_all) {
+ __kmp_process_deps<taskgraph_deps>(gtid, node, &hash, dep_barrier,
+ nodes[i].u.unresolved.ndeps,
+ nodes[i].u.unresolved.dep_list,
+ nodes[i].task, next_mutex_set);
+ } else {
+ __kmp_process_dep_all<taskgraph_deps>(gtid, node, hash, dep_barrier,
+ nodes[i].task);
+ }
+ }
+
+ kmp_taskgraph_region_t *order_out[numregions];
+ kmp_int32 outidx = 0;
+
+ kmp_taskgraph_region_t *initial_regions =
+ (kmp_taskgraph_region_t *)__kmp_fast_allocate(thread,
+ sizeof(kmp_taskgraph_region_t) * numregions);
+ // FIXME: Something like 'placement new' here?
+ memset(initial_regions, 0, sizeof(kmp_taskgraph_region_t) * numregions);
+
+ kmp_taskgraph_region_t *cfg_barrier = nullptr;
+
+ for (kmp_int32 i = 0; i < numnodes; i++) {
+ initial_regions[i].type = nodes[i].task ? TASKGRAPH_REGION_NODE
+ : TASKGRAPH_REGION_WAIT;
+ initial_regions[i].task.node = &nodes[i];
+ initial_regions[i].task.next_instance = &initial_regions[i];
+ initial_regions[i].parent = nullptr;
+ if (i < numnodes - 1) {
+ initial_regions[i].next = &initial_regions[i + 1];
+ } else {
+ initial_regions[i].next = nullptr;
+ }
+ kmp_depnode_t *depnode = &all_depnodes[i];
+ initial_regions[i].mutexset = depnode->dn.set_membership;
+ for (kmp_depnode_list_t *succ = depnode->dn.successors;
+ succ;
+ succ = succ->next) {
+ kmp_int32 succ_idx = succ->node - all_depnodes;
+ kmp_taskgraph_region_t *tg_succ = &initial_regions[succ_idx];
+ tg_succ->predecessors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ &initial_regions[i], tg_succ->predecessors);
+ initial_regions[i].successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps, tg_succ,
+ initial_regions[i].successors);
+ }
+ // Handle control flow dependencies. If a node (e.g. a taskloop task) has
+ // a wait after it corresponding to the end of an implicit taskgroup, join
+ // the task to the wait. The wait then becomes a barrier; any tasks after
+ // it will depend on the barrier.
+ if (nodes[i].u.unresolved.cfg_successor != -1) {
+ kmp_int32 cfg_succ = nodes[i].u.unresolved.cfg_successor;
+ initial_regions[i].successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ &initial_regions[cfg_succ],
+ initial_regions[i].successors);
+ initial_regions[cfg_succ].predecessors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ &initial_regions[i],
+ initial_regions[cfg_succ].predecessors);
+ }
+ if (nodes[i].taskloop_task && !nodes[i].task) {
+ cfg_barrier = &initial_regions[i];
+ } else if (cfg_barrier) {
+ cfg_barrier->successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ &initial_regions[i], cfg_barrier->successors);
+ initial_regions[i].predecessors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ cfg_barrier, initial_regions[i].predecessors);
+ }
+ }
+
+ __kmp_dephash_free<false>(thread, hash);
+ __kmp_thread_free(thread, all_depnodes);
+
+ // We're done with the "unresolved" data now. Initialise node count.
+ for (kmp_int32 i = 0; i < numnodes; i++) {
+ __kmp_thread_free(thread, nodes[i].u.unresolved.dep_list);
+ nodes[i].u.resolved.last_region = nullptr;
+ nodes[i].u.resolved.count = 0;
+ }
+
+ // Use these indices for the virtual entry and exit regions
+ kmp_int32 entryregion = numnodes, exitregion = numnodes + 1;
+
+ // Set entry/exit node types, and add to worklist
+ initial_regions[entryregion].type = TASKGRAPH_REGION_ENTRY;
+ initial_regions[entryregion].next = &initial_regions[0];
+ initial_regions[exitregion].type = TASKGRAPH_REGION_EXIT;
+ initial_regions[numnodes - 1].next = &initial_regions[exitregion];
+
+ // Join entry and exit nodes up to the graph
+ for (kmp_int32 i = 0; i < numnodes; i++) {
+ kmp_taskgraph_region_t *region = &initial_regions[i];
+ kmp_int32 npreds = __kmp_region_deplist_len(region->predecessors);
+ kmp_int32 nsuccs = __kmp_region_deplist_len(region->successors);
+ if (npreds == 0) {
+ initial_regions[entryregion].successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps, region,
+ initial_regions[entryregion].successors);
+ region->predecessors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ &initial_regions[entryregion],
+ region->predecessors);
+ }
+ if (nsuccs == 0) {
+ initial_regions[exitregion].predecessors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps, region,
+ initial_regions[exitregion].predecessors);
+ region->successors =
+ __kmp_region_deplist_add(thread, &taskgraph->recycled_deps,
+ &initial_regions[exitregion],
+ region->successors);
+ }
+ region->owner = taskgraph;
+ }
+
+ kmp_int32 max_level = -1;
+
+ for (kmp_int32 i = 0; i < numregions; i++)
+ initial_regions[i].timestamp = i;
+
+ for (kmp_int32 i = 0; i < numregions; i++) {
+ if (initial_regions[i].mark == TASKGRAPH_UNMARKED) {
+ kmp_int32 level =
+ __kmp_taskgraph_topological_order(&initial_regions[i], order_out,
+ &outidx);
+ max_level = level > max_level ? level : max_level;
+ }
+ }
+
+ assert(outidx == numregions);
+
+#ifdef DEBUG_TASKGRAPH
+ fprintf(stderr, "topological order (max level: %d):\n", max_level);
+
+ for (kmp_int32 i = 0; i < outidx; i++) {
+ fprintf(stderr, "node %d (region %p), level %d\n", order_out[i]->timestamp,
+ order_out[i], order_out[i]->level);
+ }
+#endif
+
+ kmp_taskgraph_region_t **alloc_chain = &initial_regions[0].alloc_chain;
+
+ kmp_taskgraph_region_t *root_region =
+ __kmp_taskgraph_build_regions(thread, taskgraph, alloc_chain,
+ &initial_regions[entryregion],
+ &initial_regions[exitregion]);
+
+ __kmp_taskgraph_count_nodes(root_region);
+
+ __kmp_taskgraph_exclusive_regions(thread, taskgraph, alloc_chain,
+ &root_region, next_mutex_set);
+
+ *alloc_chain = nullptr;
+
+ taskgraph->root = root_region;
+ taskgraph->alloc_root = initial_regions[0].alloc_chain;
+
+ // Free dependency lists and deleted regions.
+ kmp_taskgraph_region_t **regp = &taskgraph->alloc_root;
+ while (*regp) {
+ kmp_taskgraph_region_t *reg = *regp;
+ __kmp_region_deplist_free(thread, reg->predecessors);
+ __kmp_region_deplist_free(thread, reg->successors);
+ reg->predecessors = nullptr;
+ reg->successors = nullptr;
+ if (reg->mark == TASKGRAPH_DELETED) {
+ kmp_taskgraph_region_t *chain_next = reg->alloc_chain;
+ TGDBG("deleted region from alloc chain: %p\n", reg);
+ __kmp_fast_free(thread, reg);
+ *regp = chain_next;
+ } else {
+ regp = &reg->alloc_chain;
+ }
+ }
+ // Free recycled dep list. We could pass this along to the next invocation
+ // of this function instead, but we don't do that yet (ownership/thread
+ // safety needs careful consideration if we do that).
+ for (kmp_taskgraph_region_dep_t *dep = taskgraph->recycled_deps; dep;) {
+ kmp_taskgraph_region_dep_t *next = dep->next;
+ TGDBG("free dep from recycled list\n");
+ __kmp_fast_free(thread, dep);
+ dep = next;
+ }
+ taskgraph->recycled_deps = nullptr;
+
+ KG_TRACE(10, ("Processed taskgraph %p (graph_id %" PRIx64 "):\n", taskgraph,
+ taskgraph->graph_id));
+ KG_DUMP(10, __kmp_dump_taskgraph_regions(stderr, root_region));
+
+ #ifdef DEBUG_TASKGRAPH
+ //__kmp_dump_taskgraph_regions(stderr, root_region);
+ //__kmp_dump_raw_taskgraph_regions(stderr, thread, taskgraph,
+ // &initial_regions[0], numregions);
+ #endif
+
+ KMP_ATOMIC_ST_REL(&taskgraph->status, KMP_TDG_READY);
+
+ return 0;
+}
+
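The alloc-chain cleanup loop in `__kmp_build_taskgraph` uses the pointer-to-pointer idiom to unlink `TASKGRAPH_DELETED` regions without special-casing the list head. A minimal sketch of the same idiom, with illustrative types rather than the runtime's:

```cpp
#include <cassert>

// Illustrative intrusive singly-linked list node (not the runtime's
// kmp_taskgraph_region_t).
struct Link {
  bool deleted;
  Link *next;
};

// Remove all deleted links in place. 'head' initially points at the caller's
// head pointer; afterwards it points at the 'next' field of the last kept
// node, so head-removal needs no special case. In the runtime, the unlinked
// node would be freed at the point marked below.
static void strip_deleted(Link **head) {
  while (*head) {
    Link *l = *head;
    if (l->deleted)
      *head = l->next; // unlink (runtime frees the region here)
    else
      head = &l->next; // advance to the previous kept node's next field
  }
}
```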
+// returns true if the task has any outstanding dependence
+static bool __kmp_check_deps(kmp_int32 gtid, kmp_depnode_t *node,
+ kmp_task_t *task, kmp_dephash_t **hash,
+ bool dep_barrier, kmp_int32 ndeps,
+ kmp_depend_info_t *dep_list,
+ kmp_int32 ndeps_noalias,
+ kmp_depend_info_t *noalias_dep_list) {
+ int n_mtxs = 0, dep_all = 0;
+#if KMP_DEBUG
+ kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
+#endif
+ KA_TRACE(20, ("__kmp_check_deps: T#%d checking dependences for task %p : %d "
+ "possibly aliased dependences, %d non-aliased dependences : "
+ "dep_barrier=%d .\n",
+ gtid, taskdata, ndeps, ndeps_noalias, dep_barrier));
+
+ dep_all = __kmp_filter_aliased_deps(ndeps, dep_list, task, &n_mtxs);
+
+ // doesn't need to be atomic as no other thread is going to be accessing this
+ // node just yet.
+ // npredecessors is set -1 to ensure that none of the releasing tasks queues
+ // this task before we have finished processing all the dependences
+ node->dn.npredecessors = -1;
+
+ // used to pack all npredecessors additions into a single atomic operation at
+ // the end
+ int npredecessors;
+ kmp_int32 next_mutex = 0;
+
+ if (!dep_all) { // regular dependences
+ npredecessors =
+ __kmp_process_deps<normal_deps>(gtid, node, hash, dep_barrier,
+ ndeps, dep_list, task, next_mutex);
+ npredecessors +=
+ __kmp_process_deps<normal_deps>(gtid, node, hash, dep_barrier,
+ ndeps_noalias, noalias_dep_list, task,
+ next_mutex, false);
+ } else { // omp_all_memory dependence
+ npredecessors =
+ __kmp_process_dep_all<normal_deps>(gtid, node, *hash, dep_barrier, task);
+ }
+
+ node->dn.task = task;
+ KMP_MB();
+
+ // Account for our initial fake value
+ npredecessors++;
+
+ // Update predecessors and obtain current value to check if there are still
+ // any outstanding dependences (some tasks may have finished while we
+ // processed the dependences)
+ npredecessors =
+ node->dn.npredecessors.fetch_add(npredecessors) + npredecessors;
+
+ KA_TRACE(20, ("__kmp_check_deps: T#%d found %d predecessors for task %p \n",
+ gtid, npredecessors, taskdata));
+
+ // beyond this point the task could be queued (and executed) by a releasing
+ // task...
+ return npredecessors > 0 ? true : false;
+}
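The predecessor-counting protocol above primes the counter with -1 (a "fake" predecessor) so that predecessors completing concurrently with dependence registration can never observe zero and enqueue the task prematurely; a single `fetch_add` at the end cancels the fake value. A minimal sketch of the same trick with `std::atomic` (names are illustrative, not the runtime's):

```cpp
#include <atomic>
#include <cassert>

// Sketch of the npredecessors protocol used by __kmp_check_deps: the counter
// starts at -1 while dependences are still being registered.
struct TaskNode {
  std::atomic<int> npredecessors{-1};
};

// Called when an already-registered predecessor completes. Returns true if
// this completion should enqueue the task (counter reached zero).
static bool release_one(TaskNode &n) {
  return n.npredecessors.fetch_sub(1) - 1 == 0;
}

// Called by the registering thread after counting 'found' real predecessors.
// The +1 cancels the initial fake value. Returns true if dependences are
// still outstanding (the last predecessor will enqueue the task instead).
static bool finish_registration(TaskNode &n, int found) {
  int total = found + 1;
  return n.npredecessors.fetch_add(total) + total > 0;
}
```

Because the counter is negative until registration finishes, a predecessor that completes early merely drives it further negative; the `fetch_add` then reconciles the count, and whichever side observes zero takes responsibility for enqueueing.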
+
+/*!
+@ingroup TASKING
+@param loc_ref location of the original task directive
+@param gtid Global Thread ID of encountering thread
+@param new_task task thunk allocated by __kmp_omp_task_alloc() for the ''new
+task''
+@param ndeps Number of depend items with possible aliasing
+@param dep_list List of depend items with possible aliasing
+@param ndeps_noalias Number of depend items with no aliasing
+@param noalias_dep_list List of depend items with no aliasing
+
+@return Returns either TASK_CURRENT_NOT_QUEUED if the current task was not
+suspended and queued, or TASK_CURRENT_QUEUED if it was suspended and queued
+
+Schedule a non-thread-switchable task with dependences for execution
+*/
+kmp_int32 __kmpc_omp_task_with_deps(ident_t *loc_ref, kmp_int32 gtid,
+ kmp_task_t *new_task, kmp_int32 ndeps,
+ kmp_depend_info_t *dep_list,
+ kmp_int32 ndeps_noalias,
+ kmp_depend_info_t *noalias_dep_list) {
+
+ kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
+ KA_TRACE(10, ("__kmpc_omp_task_with_deps(enter): T#%d loc=%p task=%p\n", gtid,
+ loc_ref, new_taskdata));
+ __kmp_assert_valid_gtid(gtid);
+ kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_taskdata_t *current_task = thread->th.th_current_task;
+
#if OMPT_SUPPORT
if (ompt_enabled.enabled) {
if (!current_task->ompt_task_info.frame.enter_frame.ptr)
diff --git a/openmp/runtime/src/kmp_taskdeps.h b/openmp/runtime/src/kmp_taskdeps.h
index 0792baf67f162..e7df68c3f6147 100644
--- a/openmp/runtime/src/kmp_taskdeps.h
+++ b/openmp/runtime/src/kmp_taskdeps.h
@@ -40,6 +40,7 @@ static inline void __kmp_node_deref(kmp_info_t *thread, kmp_depnode_t *node) {
}
}
+template <bool refcounting>
static inline void __kmp_depnode_list_free(kmp_info_t *thread,
kmp_depnode_list *list) {
kmp_depnode_list *next;
@@ -47,7 +48,8 @@ static inline void __kmp_depnode_list_free(kmp_info_t *thread,
for (; list; list = next) {
next = list->next;
- __kmp_node_deref(thread, list->node);
+ if (refcounting)
+ __kmp_node_deref(thread, list->node);
#if USE_FAST_MEMORY
__kmp_fast_free(thread, list);
#else
@@ -56,6 +58,7 @@ static inline void __kmp_depnode_list_free(kmp_info_t *thread,
}
}
+template <bool refcounting>
static inline void __kmp_dephash_free_entries(kmp_info_t *thread,
kmp_dephash_t *h) {
for (size_t i = 0; i < h->size; i++) {
@@ -63,12 +66,14 @@ static inline void __kmp_dephash_free_entries(kmp_info_t *thread,
kmp_dephash_entry_t *next;
for (kmp_dephash_entry_t *entry = h->buckets[i]; entry; entry = next) {
next = entry->next_in_bucket;
- __kmp_depnode_list_free(thread, entry->last_set);
- __kmp_depnode_list_free(thread, entry->prev_set);
- __kmp_node_deref(thread, entry->last_out);
- if (entry->mtx_lock) {
- __kmp_destroy_lock(entry->mtx_lock);
- __kmp_free(entry->mtx_lock);
+ __kmp_depnode_list_free<refcounting>(thread, entry->last_set);
+ __kmp_depnode_list_free<refcounting>(thread, entry->prev_set);
+ if (refcounting) {
+ __kmp_node_deref(thread, entry->last_out);
+ if (entry->mtx_lock) {
+ __kmp_destroy_lock(entry->mtx_lock);
+ __kmp_free(entry->mtx_lock);
+ }
}
#if USE_FAST_MEMORY
__kmp_fast_free(thread, entry);
@@ -79,12 +84,14 @@ static inline void __kmp_dephash_free_entries(kmp_info_t *thread,
h->buckets[i] = 0;
}
}
- __kmp_node_deref(thread, h->last_all);
+ if (refcounting)
+ __kmp_node_deref(thread, h->last_all);
h->last_all = NULL;
}
+template <bool refcounting>
static inline void __kmp_dephash_free(kmp_info_t *thread, kmp_dephash_t *h) {
- __kmp_dephash_free_entries(thread, h);
+ __kmp_dephash_free_entries<refcounting>(thread, h);
#if USE_FAST_MEMORY
__kmp_fast_free(thread, h);
#else
@@ -95,23 +102,6 @@ static inline void __kmp_dephash_free(kmp_info_t *thread, kmp_dephash_t *h) {
extern void __kmpc_give_task(kmp_task_t *ptask, kmp_int32 start);
static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {
-
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (task->is_taskgraph && !(__kmp_tdg_is_recording(task->tdg->tdg_status))) {
- kmp_node_info_t *TaskInfo = &(task->tdg->record_map[task->td_tdg_task_id]);
-
- for (int i = 0; i < TaskInfo->nsuccessors; i++) {
- kmp_int32 successorNumber = TaskInfo->successors[i];
- kmp_node_info_t *successor = &(task->tdg->record_map[successorNumber]);
- kmp_int32 npredecessors = KMP_ATOMIC_DEC(&successor->npredecessors_counter) - 1;
- if (successor->task != nullptr && npredecessors == 0) {
- __kmp_omp_task(gtid, successor->task, false);
- }
- }
- return;
- }
-#endif
-
kmp_info_t *thread = __kmp_threads[gtid];
kmp_depnode_t *node = task->td_depnode;
@@ -129,7 +119,7 @@ static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {
KA_TRACE(
40, ("__kmp_release_deps: T#%d freeing dependencies hash of task %p.\n",
gtid, task));
- __kmp_dephash_free(thread, task->td_dephash);
+ __kmp_dephash_free<true>(thread, task->td_dephash);
task->td_dephash = NULL;
}
@@ -140,10 +130,6 @@ static inline void __kmp_release_deps(kmp_int32 gtid, kmp_taskdata_t *task) {
gtid, task));
KMP_ACQUIRE_DEPNODE(gtid, node);
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (!task->is_taskgraph ||
- (task->is_taskgraph && !__kmp_tdg_is_recording(task->tdg->tdg_status)))
-#endif
node->dn.task =
NULL; // mark this task as finished, so no new dependencies are generated
KMP_RELEASE_DEPNODE(gtid, node);
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index 71d78413f356a..962609e53a319 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -17,6 +17,8 @@
#include "kmp_wait_release.h"
#include "kmp_taskdeps.h"
+#undef DEBUG_TASKGRAPH
+
#if OMPT_SUPPORT
#include "ompt-specific.h"
#endif
@@ -37,10 +39,6 @@ static void __kmp_alloc_task_deque(kmp_info_t *thread,
static int __kmp_realloc_task_threads_data(kmp_info_t *thread,
kmp_task_team_t *task_team);
static void __kmp_bottom_half_finish_proxy(kmp_int32 gtid, kmp_task_t *ptask);
-#if OMP_TASKGRAPH_EXPERIMENTAL
-static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id);
-int __kmp_taskloop_task(int gtid, void *ptask);
-#endif
// returns 1 if new task is allowed to execute, 0 otherwise
// checks Task Scheduling constraint (if requested) and
@@ -70,11 +68,7 @@ static bool __kmp_task_is_allowed(int gtid, const kmp_int32 is_constrained,
}
// Check mutexinoutset dependencies, acquire locks
kmp_depnode_t *node = tasknew->td_depnode;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (!tasknew->is_taskgraph && UNLIKELY(node && (node->dn.mtx_num_locks > 0))) {
-#else
if (UNLIKELY(node && (node->dn.mtx_num_locks > 0))) {
-#endif
for (int i = 0; i < node->dn.mtx_num_locks; ++i) {
KMP_DEBUG_ASSERT(node->dn.mtx_locks[i] != NULL);
if (__kmp_test_lock(node->dn.mtx_locks[i], gtid))
@@ -665,33 +659,12 @@ static void __kmp_free_task(kmp_int32 gtid, kmp_taskdata_t *taskdata,
task->data2.priority = 0;
taskdata->td_flags.freed = 1;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- // do not free tasks in taskgraph
- if (!taskdata->is_taskgraph) {
-#endif
// deallocate the taskdata and shared variable blocks associated with this task
#if USE_FAST_MEMORY
__kmp_fast_free(thread, taskdata);
#else /* ! USE_FAST_MEMORY */
__kmp_thread_free(thread, taskdata);
#endif
-#if OMP_TASKGRAPH_EXPERIMENTAL
- } else {
- taskdata->td_flags.complete = 0;
- taskdata->td_flags.started = 0;
- taskdata->td_flags.freed = 0;
- taskdata->td_flags.executing = 0;
- taskdata->td_flags.task_serial =
- (taskdata->td_parent->td_flags.final ||
- taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser);
-
- // taskdata->td_allow_completion_event.pending_events_count = 1;
- KMP_ATOMIC_ST_RLX(&taskdata->td_untied_count, 0);
- KMP_ATOMIC_ST_RLX(&taskdata->td_incomplete_child_tasks, 0);
- // start at one because counts current task and children
- KMP_ATOMIC_ST_RLX(&taskdata->td_allocated_child_tasks, 1);
- }
-#endif
KA_TRACE(20, ("__kmp_free_task: T#%d freed task %p\n", gtid, taskdata));
}
@@ -747,7 +720,7 @@ static void __kmp_free_task_and_ancestors(kmp_int32 gtid,
"dephash of implicit task %p\n",
gtid, taskdata));
// cleanup dephash of finished implicit task
- __kmp_dephash_free_entries(thread, taskdata->td_dephash);
+ __kmp_dephash_free_entries<true>(thread, taskdata->td_dephash);
}
}
}
@@ -779,13 +752,13 @@ static bool __kmp_track_children_task(kmp_taskdata_t *taskdata) {
flags.detachable == TASK_DETACHABLE || flags.hidden_helper;
ret = ret ||
KMP_ATOMIC_LD_ACQ(&taskdata->td_parent->td_incomplete_child_tasks) > 0;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (taskdata->td_taskgroup && taskdata->is_taskgraph)
- ret = ret || KMP_ATOMIC_LD_ACQ(&taskdata->td_taskgroup->count) > 0;
-#endif
return ret;
}
+static bool __kmp_taskgraph_exec_descr_finish(kmp_int32 gtid,
+ kmp_info_t *thread,
+ kmp_taskgraph_exec_descr_t *descr);
+
// __kmp_task_finish: bookkeeping to do when a task finishes execution
//
// gtid: global thread ID for calling thread
@@ -802,10 +775,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
kmp_info_t *thread = __kmp_threads[gtid];
kmp_task_team_t *task_team =
thread->th.th_task_team; // might be NULL for serial teams...
-#if OMP_TASKGRAPH_EXPERIMENTAL
- // to avoid seg fault when we need to access taskdata->td_flags after free when using vanilla taskloop
bool is_taskgraph;
-#endif
#if KMP_DEBUG
kmp_int32 children = 0;
#endif
@@ -815,9 +785,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
KMP_DEBUG_ASSERT(taskdata->td_flags.tasktype == TASK_EXPLICIT);
-#if OMP_TASKGRAPH_EXPERIMENTAL
- is_taskgraph = taskdata->is_taskgraph;
-#endif
+ is_taskgraph = taskdata->owning_taskgraph;
if (UNLIKELY(taskdata->td_flags.tiedness == TASK_UNTIED)) {
// untied task needs to check the counter so that the task structure is not
@@ -923,15 +891,23 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
if (completed) {
taskdata->td_flags.complete = 1; // mark the task as completed
-#if OMP_TASKGRAPH_EXPERIMENTAL
- taskdata->td_flags.onced = 1; // mark the task as ran once already
-#endif
#if OMPT_SUPPORT
// This is not a detached task, we are done here
if (ompt)
__ompt_task_finish(task, resumed_task, ompt_task_complete);
#endif
+
+ if (is_taskgraph) {
+ __kmp_taskgraph_exec_descr_finish(gtid, thread, taskdata->exec_descr);
+ KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks);
+ if (taskdata->td_taskgroup)
+ KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);
+ thread->th.th_current_task = resumed_task;
+ resumed_task->td_flags.executing = 1; // resume previous task
+ return;
+ }
+
// TODO: What would be the balance between the conditions in the function
// and an atomic operation?
if (__kmp_track_children_task(taskdata)) {
@@ -942,11 +918,7 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
#endif
KMP_ATOMIC_DEC(&taskdata->td_parent->td_incomplete_child_tasks);
KMP_DEBUG_ASSERT(children >= 0);
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (taskdata->td_taskgroup && !taskdata->is_taskgraph)
-#else
if (taskdata->td_taskgroup)
-#endif
KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);
} else if (task_team && (task_team->tt.tt_found_proxy_tasks ||
task_team->tt.tt_hidden_helper_task_encountered)) {
@@ -985,19 +957,6 @@ static void __kmp_task_finish(kmp_int32 gtid, kmp_task_t *task,
// KMP_DEBUG_ASSERT( resumed_task->td_flags.executing == 0 );
resumed_task->td_flags.executing = 1; // resume previous task
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (is_taskgraph && __kmp_track_children_task(taskdata) &&
- taskdata->td_taskgroup) {
- // TDG: we only release taskgroup barrier here because
- // free_task_and_ancestors will call
- // __kmp_free_task, which resets all task parameters such as
- // taskdata->started, etc. If we release the barrier earlier, these
- // parameters could be read before being reset. This is not an issue for
- // non-TDG implementation because we never reuse a task(data) structure
- KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);
- }
-#endif
-
KA_TRACE(
10, ("__kmp_task_finish(exit): T#%d finished task %p, resuming task %p\n",
gtid, taskdata, resumed_task));
@@ -1113,9 +1072,6 @@ void __kmp_init_implicit_task(ident_t *loc_ref, kmp_info_t *this_thr,
task->td_flags.executing = 1;
task->td_flags.complete = 0;
task->td_flags.freed = 0;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- task->td_flags.onced = 0;
-#endif
task->td_depnode = NULL;
task->td_last_tied = task;
@@ -1159,9 +1115,6 @@ void __kmp_finish_implicit_task(kmp_info_t *thread) {
if (task->td_dephash) {
int children;
task->td_flags.complete = 1;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- task->td_flags.onced = 1;
-#endif
children = KMP_ATOMIC_LD_ACQ(&task->td_incomplete_child_tasks);
kmp_tasking_flags_t flags_old = task->td_flags;
if (children == 0 && flags_old.complete == 1) {
@@ -1173,7 +1126,7 @@ void __kmp_finish_implicit_task(kmp_info_t *thread) {
KA_TRACE(100, ("__kmp_finish_implicit_task: T#%d cleans "
"dephash of implicit task %p\n",
thread->th.th_info.ds.ds_gtid, task));
- __kmp_dephash_free_entries(thread, task->td_dephash);
+ __kmp_dephash_free_entries<true>(thread, task->td_dephash);
}
}
}
@@ -1186,7 +1139,7 @@ void __kmp_finish_implicit_task(kmp_info_t *thread) {
void __kmp_free_implicit_task(kmp_info_t *thread) {
kmp_taskdata_t *task = thread->th.th_current_task;
if (task && task->td_dephash) {
- __kmp_dephash_free(thread, task->td_dephash);
+ __kmp_dephash_free<true>(thread, task->td_dephash);
task->td_dephash = NULL;
}
}
@@ -1201,7 +1154,7 @@ static size_t __kmp_round_up_to_val(size_t size, size_t val) {
}
}
return size;
-} // __kmp_round_up_to_va
+} // __kmp_round_up_to_val
// __kmp_task_alloc: Allocate the taskdata and task data structures for a task
//
@@ -1391,9 +1344,8 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
taskdata->td_flags.complete = 0;
taskdata->td_flags.freed = 0;
#if OMP_TASKGRAPH_EXPERIMENTAL
- taskdata->td_flags.onced = 0;
- taskdata->is_taskgraph = 0;
- taskdata->tdg = nullptr;
+ taskdata->owning_taskgraph = nullptr;
+ taskdata->exec_descr = nullptr;
#endif
KMP_ATOMIC_ST_RLX(&taskdata->td_incomplete_child_tasks, 0);
// start at one because counts current task and children
@@ -1430,16 +1382,6 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
}
}
-#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_info_t *tdg = __kmp_curr_tdg;
- if (tdg && __kmp_tdg_is_recording(tdg->tdg_status) &&
- (task_entry != (kmp_routine_entry_t)__kmp_taskloop_task)) {
- taskdata->is_taskgraph = 1;
- taskdata->tdg = tdg;
- taskdata->td_task_id = KMP_GEN_TASK_ID();
- taskdata->td_tdg_task_id = KMP_ATOMIC_INC(&tdg->tdg_task_id_next);
- }
-#endif
KA_TRACE(20, ("__kmp_task_alloc(exit): T#%d created task %p parent=%p\n",
gtid, taskdata, taskdata->td_parent));
@@ -1807,50 +1749,6 @@ kmp_int32 __kmp_omp_task(kmp_int32 gtid, kmp_task_t *new_task,
bool serialize_immediate) {
kmp_taskdata_t *new_taskdata = KMP_TASK_TO_TASKDATA(new_task);
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (new_taskdata->is_taskgraph &&
- __kmp_tdg_is_recording(new_taskdata->tdg->tdg_status)) {
- kmp_tdg_info_t *tdg = new_taskdata->tdg;
- // extend the record_map if needed
- if (new_taskdata->td_tdg_task_id >= tdg->map_size ||
- tdg->record_map[new_taskdata->td_tdg_task_id].task == nullptr) {
- __kmp_acquire_bootstrap_lock(&tdg->graph_lock);
- // map_size could have been updated by another thread if recursive
- // taskloop
- if (new_taskdata->td_tdg_task_id >= tdg->map_size) {
- kmp_uint old_size = tdg->map_size;
- kmp_uint new_size = old_size * 2;
- kmp_node_info_t *old_record = tdg->record_map;
- kmp_node_info_t *new_record = (kmp_node_info_t *)__kmp_allocate(
- new_size * sizeof(kmp_node_info_t));
-
- KMP_MEMCPY(new_record, old_record, old_size * sizeof(kmp_node_info_t));
- tdg->record_map = new_record;
-
- __kmp_free(old_record);
-
- for (kmp_int i = old_size; i < new_size; i++) {
- new_record[i].task = nullptr;
- new_record[i].parent_task = nullptr;
- new_record[i].successors = nullptr;
- new_record[i].nsuccessors = 0;
- new_record[i].npredecessors = 0;
- new_record[i].successors_size = 0;
- KMP_ATOMIC_ST_REL(&new_record[i].npredecessors_counter, 0);
- }
- // update the size at the end, so that we avoid other
- // threads use old_record while map_size is already updated
- tdg->map_size = new_size;
- }
- tdg->record_map[new_taskdata->td_tdg_task_id].task = new_task;
- tdg->record_map[new_taskdata->td_tdg_task_id].parent_task =
- new_taskdata->td_parent;
- KMP_ATOMIC_INC(&tdg->num_tasks);
- __kmp_release_bootstrap_lock(&tdg->graph_lock);
- }
- }
-#endif
-
/* Should we execute the new task or queue it? For now, let's just always try
to queue it. If the queue fills up, then we'll execute it. */
if (new_taskdata->td_flags.proxy == TASK_PROXY ||
@@ -2205,6 +2103,426 @@ kmp_int32 __kmpc_omp_taskyield(ident_t *loc_ref, kmp_int32 gtid, int end_part) {
return TASK_CURRENT_NOT_QUEUED;
}
+static kmp_taskgraph_exec_descr_t *
+__kmp_fill_exec_descr(kmp_int32, kmp_info_t *, kmp_taskgraph_record_t *,
+ kmp_taskgraph_region_t *, kmp_taskdata_t *,
+ kmp_taskgraph_exec_descr_t *, kmp_size_t &,
+ kmp_taskgraph_exec_descr_t **);
+
+static kmp_int32
+__kmp_pred_list_length(kmp_taskgraph_exec_descr_t *desc) {
+ kmp_int32 res = 0;
+ for (; desc; desc = desc->predecessor_chain)
+ ++res;
+ return res;
+}
+
+static kmp_taskgraph_exec_descr_t *
+__kmp_fill_sequential_descr(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region,
+ kmp_taskdata_t *parent_taskdata,
+ kmp_taskgraph_exec_descr_t *exec_descrs,
+ kmp_size_t &next_idx,
+ kmp_taskgraph_exec_descr_t **succs_to_fill_p) {
+ assert(region->type == TASKGRAPH_REGION_SEQUENTIAL);
+ kmp_taskgraph_exec_descr_t *first_node = nullptr;
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ kmp_taskgraph_exec_descr_t *descr =
+ __kmp_fill_exec_descr(gtid, thread, taskgraph, region->inner.children[c],
+ parent_taskdata, exec_descrs, next_idx,
+ succs_to_fill_p);
+ if (!first_node)
+ first_node = descr;
+ }
+ return first_node;
+}
+
+static kmp_taskgraph_exec_descr_t *
+__kmp_fill_par_or_excl_descr(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region,
+ kmp_taskdata_t *parent_taskdata,
+ kmp_taskgraph_exec_descr_t *exec_descrs,
+ kmp_size_t &next_idx,
+ kmp_taskgraph_exec_descr_t **succs_to_fill_p) {
+ assert(region->type == TASKGRAPH_REGION_PARALLEL ||
+ region->type == TASKGRAPH_REGION_EXCLUSIVE);
+
+ kmp_taskgraph_exec_descr_t *incoming_preds = *succs_to_fill_p;
+
+ kmp_taskgraph_exec_descr_t *exec_descr = &exec_descrs[next_idx++];
+ exec_descr->region = region;
+ exec_descr->region->exec_descr = exec_descr;
+ exec_descr->nblocks = 0;
+ exec_descr->npredecessors = __kmp_pred_list_length(incoming_preds);
+ exec_descr->predecessor_chain = nullptr;
+ exec_descr->successor = nullptr;
+ exec_descr->sibling = exec_descr;
+ exec_descr->next_instance = nullptr;
+
+ kmp_taskgraph_exec_descr_t *gathered_succs = nullptr;
+ kmp_taskgraph_exec_descr_t **gathered_succs_p = &gathered_succs;
+
+ kmp_taskgraph_exec_descr_t *sibling_list = nullptr;
+
+ for (kmp_int32 c = 0; c < region->inner.num_children; c++) {
+ kmp_taskgraph_exec_descr_t *succs_to_fill = nullptr;
+ kmp_taskgraph_exec_descr_t *head =
+ __kmp_fill_exec_descr(gtid, thread, taskgraph, region->inner.children[c],
+ parent_taskdata, exec_descrs, next_idx,
+ &succs_to_fill);
+ if (!sibling_list) {
+ sibling_list = head;
+ sibling_list->sibling = head;
+ } else {
+ kmp_taskgraph_exec_descr_t *next_sibling = sibling_list->sibling;
+ sibling_list->sibling = head;
+ head->sibling = next_sibling;
+ // Make the head of the sibling list the most recently added node (the
+ // choice of head is arbitrary).
+ sibling_list = head;
+ }
+ while (succs_to_fill) {
+ kmp_taskgraph_exec_descr_t *next = succs_to_fill->predecessor_chain;
+ *gathered_succs_p = succs_to_fill;
+ gathered_succs_p = &succs_to_fill->predecessor_chain;
+ succs_to_fill = next;
+ }
+ }
+
+ // The parallel exec descr points to (any of the members of) the following
+ // circular sibling list.
+ exec_descr->successor = sibling_list;
+
+ // All the incoming successors point to the 'parallel' exec descr.
+ for (; incoming_preds; incoming_preds = incoming_preds->predecessor_chain) {
+ incoming_preds->successor = exec_descr;
+ }
+
+ *succs_to_fill_p = gathered_succs;
+
+ return exec_descr;
+}
+
+static kmp_taskgraph_exec_descr_t *
+__kmp_fill_exec_descr(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph,
+ kmp_taskgraph_region_t *region,
+ kmp_taskdata_t *parent_taskdata,
+ kmp_taskgraph_exec_descr_t *exec_descrs,
+ kmp_size_t &next_idx,
+ kmp_taskgraph_exec_descr_t **succs_to_fill_p) {
+ switch (region->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ break;
+ case TASKGRAPH_REGION_NODE:
+ case TASKGRAPH_REGION_WAIT: {
+ kmp_taskgraph_exec_descr_t *incoming_succs_to_fill = *succs_to_fill_p;
+ kmp_taskgraph_exec_descr_t *exec_descr = &exec_descrs[next_idx++];
+ exec_descr->region = region;
+ exec_descr->region->exec_descr = exec_descr;
+ exec_descr->nblocks = region->task.node->u.resolved.count - 1;
+ exec_descr->npredecessors = __kmp_pred_list_length(incoming_succs_to_fill);
+ exec_descr->sibling = exec_descr;
+ exec_descr->predecessor_chain = nullptr;
+ exec_descr->successor = nullptr;
+ exec_descr->next_instance = nullptr;
+
+ // Edit the taskdata for this specific instantiation. At present the
+ // task/taskdata structures cannot be used simultaneously by different
+ // threads. We could duplicate the structures to allow simultaneous issue,
+ // but that's not done yet. The exec_descr can already be thread-local,
+ // in principle, but for now it points to the taskgraph's single copy
+ // of each task/taskdata structure.
+ if (region->type == TASKGRAPH_REGION_NODE) {
+ kmp_task_t *task = exec_descr->region->task.node->task;
+ kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
+ taskdata->exec_descr = exec_descr;
+ }
+
+ for (kmp_taskgraph_exec_descr_t *pred = incoming_succs_to_fill; pred;
+ pred = pred->predecessor_chain) {
+ pred->successor = exec_descr;
+ }
+
+ *succs_to_fill_p = exec_descr;
+
+ return exec_descr;
+ }
+ case TASKGRAPH_REGION_SEQUENTIAL:
+ return __kmp_fill_sequential_descr(gtid, thread, taskgraph, region,
+ parent_taskdata, exec_descrs,
+ next_idx, succs_to_fill_p);
+ case TASKGRAPH_REGION_PARALLEL:
+ case TASKGRAPH_REGION_EXCLUSIVE:
+ return __kmp_fill_par_or_excl_descr(gtid, thread, taskgraph, region,
+ parent_taskdata, exec_descrs,
+ next_idx, succs_to_fill_p);
+ }
+ return nullptr;
+}
+
+#ifdef DEBUG_TASKGRAPH
+static void
+__kmp_debug_taskgraph_exec_descr(kmp_taskgraph_exec_descr_t *descrs,
+ kmp_size_t count) {
+ fprintf(stderr, "digraph ExecDescr {\n");
+ fprintf(stderr, " end [shape=diamond]\n");
+ for (kmp_size_t i = 0; i < count; i++) {
+ kmp_taskgraph_exec_descr_t *descr = &descrs[i];
+ fprintf(stderr, " \"%p\" [label=< <B>", descr->region);
+ switch (descr->region->type) {
+ case TASKGRAPH_REGION_PARALLEL:
+ fprintf(stderr, "par</B> %p<BR/>preds=%d", descr->region,
+ descr->npredecessors.load());
+ break;
+ case TASKGRAPH_REGION_EXCLUSIVE:
+ fprintf(stderr, "excl</B> %p<BR/>preds=%d", descr->region,
+ descr->npredecessors.load());
+ break;
+ case TASKGRAPH_REGION_NODE:
+ if (descr->region->task.node->u.resolved.count > 1) {
+ fprintf(stderr, "task</B> %p<BR/>preds=%d instances=%d",
+ descr->region->task.node,
+ descr->npredecessors.load(),
+ descr->region->task.node->u.resolved.count);
+ } else {
+ fprintf(stderr, "task</B> %p<BR/>preds=%d", descr->region->task.node,
+ descr->npredecessors.load());
+ }
+ break;
+ case TASKGRAPH_REGION_WAIT:
+ if (descr->region->task.node->u.resolved.count > 1) {
+ fprintf(stderr, "wait</B> %p<BR/>preds=%d instances=%d",
+ descr->region, descr->npredecessors.load(),
+ descr->region->task.node->u.resolved.count);
+ } else {
+ fprintf(stderr, "wait</B> %p<BR/>preds=%d", descr->region,
+ descr->npredecessors.load());
+ }
+ break;
+ default:
+ fprintf(stderr, "???</B>");
+ }
+ fprintf(stderr, " >, shape=box]\n");
+
+ if ((descr->region->type == TASKGRAPH_REGION_NODE ||
+ descr->region->type == TASKGRAPH_REGION_WAIT) &&
+ descr->region->task.node->u.resolved.count > 1) {
+ kmp_taskgraph_region_t *region = descr->region;
+ fprintf(stderr,
+ " \"%p\" -> \"%p\" [style=dotted, color=blue, constraint=false]\n",
+ region, region->task.next_instance);
+ }
+
+ if (descr->successor) {
+ fprintf(stderr, " \"%p\" -> \"%p\"\n", descr->region,
+ descr->successor->region);
+ if (descr->region->type == TASKGRAPH_REGION_PARALLEL ||
+ descr->region->type == TASKGRAPH_REGION_EXCLUSIVE) {
+ kmp_taskgraph_exec_descr_t *succ = descr->successor;
+ if (succ->sibling != succ) {
+ kmp_taskgraph_exec_descr_t *walk = succ;
+ fprintf(stderr, " subgraph { rank=same;\n");
+ do {
+ fprintf(stderr, " \"%p\" -> \"%p\" [color=red]\n", walk->region,
+ walk->sibling->region);
+ walk = walk->sibling;
+ } while (walk != succ);
+ fprintf(stderr, " }\n");
+ } else {
+ fprintf(stderr, "*** Expected parallel/exclusive to have >1 tasks\n");
+ }
+ }
+ } else {
+ fprintf(stderr, " \"%p\" -> end\n", descr->region);
+ }
+ }
+ fprintf(stderr, "}\n");
+}
+#endif
+
+static void
+__kmp_exec_descr_link_instances(kmp_taskgraph_exec_descr_t *descrs,
+ kmp_size_t count) {
+ for (kmp_size_t i = 0; i < count; i++) {
+ kmp_taskgraph_exec_descr_t *descr = &descrs[i];
+ if (descr->region->type == TASKGRAPH_REGION_NODE ||
+ descr->region->type == TASKGRAPH_REGION_WAIT)
+ descr->next_instance = descr->region->task.next_instance->exec_descr;
+ }
+}
+
+/// Reset, reparent and regroup the recorded task TASK and re-invoke it.
+static void
+__kmp_omp_tg_task(kmp_int32 gtid, kmp_task_t *task, kmp_taskgroup_t *taskgroup,
+ kmp_taskdata_t *parent_taskdata, bool serialize_immediate) {
+ kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
+ taskdata->td_parent = parent_taskdata;
+
+ taskdata->td_flags.complete = 0;
+ taskdata->td_flags.started = 0;
+ taskdata->td_flags.freed = 0;
+ taskdata->td_flags.executing = 0;
+ taskdata->td_flags.task_serial =
+ (parent_taskdata->td_flags.final ||
+ taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser);
+
+ KMP_ATOMIC_ST_RLX(&taskdata->td_untied_count, 0);
+ KMP_ATOMIC_ST_RLX(&taskdata->td_incomplete_child_tasks, 0);
+ // start at one because counts current task and children
+ KMP_ATOMIC_ST_RLX(&taskdata->td_allocated_child_tasks, 1);
+
+ taskdata->td_taskgroup = taskgroup;
+ KMP_ATOMIC_INC(&taskgroup->count);
+ KMP_ATOMIC_INC(&parent_taskdata->td_incomplete_child_tasks);
+ if (parent_taskdata->td_flags.tasktype == TASK_EXPLICIT)
+ KMP_ATOMIC_INC(&parent_taskdata->td_allocated_child_tasks);
+
+ __kmp_omp_task(gtid, task, false);
+}
+
+struct kmp_taskred_input;
+template <typename T>
+void *__kmp_task_reduction_init(int gtid, int num, T *data);
+
+static void
+__kmp_taskgraph_exec_descr_start(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_taskgraph_exec_descr_t *descr,
+ kmp_taskgroup_t *taskgroup) {
+ kmp_int32 npredecessors = KMP_ATOMIC_DEC(&descr->npredecessors) - 1;
+ if (npredecessors > 0)
+ return;
+
+ switch (descr->region->type) {
+ case TASKGRAPH_REGION_NODE:
+ case TASKGRAPH_REGION_WAIT: {
+ kmp_taskgraph_exec_descr_t *lowest_descr = nullptr, *iter = descr;
+ do {
+ if (!lowest_descr || lowest_descr > iter)
+ lowest_descr = iter;
+ iter = iter->next_instance;
+ } while (iter != descr);
+ kmp_int32 nblocks = KMP_ATOMIC_DEC(&lowest_descr->nblocks);
+ if (nblocks <= 0) {
+ if (descr->region->type == TASKGRAPH_REGION_NODE) {
+ kmp_task_t *task = descr->region->task.node->task;
+ kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
+ __kmp_omp_tg_task(gtid, task, taskgroup, current_taskdata, false);
+ } else {
+ // There's no task for a 'taskwait', so start successors immediately.
+ kmp_taskgraph_exec_descr_t *walk = descr;
+ do {
+ if (walk->successor) {
+ __kmp_taskgraph_exec_descr_start(gtid, thread, walk->successor,
+ taskgroup);
+ }
+ walk = walk->next_instance;
+ } while (walk != descr);
+ }
+ }
+ break;
+ }
+ case TASKGRAPH_REGION_PARALLEL: {
+ if (descr->region->reduce_input) {
+ // If there are reductions associated with this parallel region, we
+ // start a new taskgroup here.
+ __kmpc_taskgroup(/*loc=*/nullptr, gtid);
+ // Update variable to the newly-created taskgroup.
+ taskgroup = thread->th.th_current_task->td_taskgroup;
+ __kmp_task_reduction_init(gtid,
+ descr->region->reduce_input->reduce_num_data,
+ (struct kmp_taskred_input *)
+ descr->region->reduce_input->reduce_data);
+ }
+ kmp_taskgraph_exec_descr_t *head = descr->successor;
+ kmp_taskgraph_exec_descr_t *item = head;
+ do {
+ __kmp_taskgraph_exec_descr_start(gtid, thread, item, taskgroup);
+ item = item->sibling;
+ } while (item != head);
+ if (descr->region->reduce_input)
+ __kmpc_end_taskgroup(/*loc=*/nullptr, gtid);
+ break;
+ }
+ case TASKGRAPH_REGION_EXCLUSIVE: {
+ kmp_taskgraph_exec_descr_t *head = descr->successor;
+ kmp_taskgraph_exec_descr_t *item = head;
+ do {
+ assert(item->region->type == TASKGRAPH_REGION_NODE);
+ kmp_task_t *task = item->region->task.node->task;
+ kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
+ __kmp_omp_tg_task(gtid, task, taskgroup, current_taskdata, true);
+ item = item->sibling;
+ } while (item != head);
+ break;
+ }
+ default: ;
+ }
+}
+
+static bool
+__kmp_taskgraph_exec_descr_finish(kmp_int32 gtid, kmp_info_t *thread,
+ kmp_taskgraph_exec_descr_t *descr) {
+ switch (descr->region->type) {
+ case TASKGRAPH_REGION_NODE: {
+ kmp_task_t *task = descr->region->task.node->task;
+ kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
+ taskdata->td_flags.started = 0;
+ taskdata->td_flags.executing = 0;
+ taskdata->td_flags.complete = 0;
+ taskdata->td_flags.freed = 0;
+ bool any_successors = false;
+ kmp_taskgraph_exec_descr_t *walk = descr;
+ do {
+ if (walk->successor) {
+ any_successors = true;
+ __kmp_taskgraph_exec_descr_start(gtid, thread, walk->successor,
+ taskdata->td_taskgroup);
+ }
+ walk = walk->next_instance;
+ } while (walk != descr);
+ return any_successors;
+ }
+ default:
+ fprintf(stderr, "unexpected exec descr type in finish (%p)\n", descr);
+ exit(1);
+ }
+
+ return false;
+}
+
+static kmp_size_t
+__kmp_exec_descr_count(kmp_taskgraph_region_t *region) {
+ kmp_size_t sum = 0;
+
+ switch (region->type) {
+ case TASKGRAPH_REGION_ENTRY:
+ case TASKGRAPH_REGION_EXIT:
+ return 0;
+ case TASKGRAPH_REGION_NODE:
+ case TASKGRAPH_REGION_WAIT:
+ return 1;
+ case TASKGRAPH_REGION_PARALLEL:
+ case TASKGRAPH_REGION_EXCLUSIVE:
+ sum++;
+ KMP_FALLTHROUGH();
+ case TASKGRAPH_REGION_SEQUENTIAL:
+ for (kmp_int32 i = 0; i < region->inner.num_children; i++)
+ sum += __kmp_exec_descr_count(region->inner.children[i]);
+ break;
+ default:
+ fprintf(stderr, "unexpected region type\n");
+ exit(1);
+ }
+ return sum;
+}
+
// Task Reduction implementation
//
// Note: initial implementation didn't take into account the possibility
@@ -2373,14 +2691,6 @@ the reduction either does not use omp_orig object, or the omp_orig is accessible
without help of the runtime library.
*/
void *__kmpc_task_reduction_init(int gtid, int num, void *data) {
-#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_info_t *tdg = __kmp_curr_tdg;
- if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
- tdg->rec_taskred_data = __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
- tdg->rec_num_taskred = num;
- KMP_MEMCPY(tdg->rec_taskred_data, data, sizeof(kmp_task_red_input_t) * num);
- }
-#endif
return __kmp_task_reduction_init(gtid, num, (kmp_task_red_input_t *)data);
}
@@ -2397,14 +2707,44 @@ Note: this entry supposes the optional compiler-generated initializer routine
has two parameters, pointer to object to be initialized and pointer to omp_orig
*/
void *__kmpc_taskred_init(int gtid, int num, void *data) {
-#if OMP_TASKGRAPH_EXPERIMENTAL
- kmp_tdg_info_t *tdg = __kmp_curr_tdg;
- if (tdg && __kmp_tdg_is_recording(tdg->tdg_status)) {
- tdg->rec_taskred_data = __kmp_allocate(sizeof(kmp_task_red_input_t) * num);
- tdg->rec_num_taskred = num;
- KMP_MEMCPY(tdg->rec_taskred_data, data, sizeof(kmp_task_red_input_t) * num);
+ return __kmp_task_reduction_init(gtid, num, (kmp_taskred_input_t *)data);
+}
+
+static kmp_taskgraph_record_t *__kmp_taskgraph_or_parent_recording(
+ kmp_taskgroup_t *taskgroup) {
+ kmp_taskgraph_record_t *rec = nullptr;
+
+ for (; taskgroup; taskgroup = taskgroup->parent) {
+ rec = KMP_ATOMIC_LD_ACQ(&taskgroup->taskgraph.recording);
+ if (rec)
+ return rec;
+ }
+
+ return nullptr;
+}
+
+void *__kmpc_taskgraph_taskred_init(kmp_int32 gtid, kmp_int32 num, void *data) {
+ kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_taskgroup_t *taskgroup = thread->th.th_current_task->td_taskgroup;
+ kmp_taskgraph_record_t *rec = __kmp_taskgraph_or_parent_recording(taskgroup);
+
+ if (rec) {
+ kmp_taskgraph_status_t status = KMP_ATOMIC_LD_ACQ(&rec->status);
+ if (status == KMP_TDG_RECORDING) {
+ kmp_taskgraph_reduce_input_data_t *input_data =
+ (kmp_taskgraph_reduce_input_data_t *)
+ __kmp_fast_allocate(thread,
+ sizeof(kmp_taskgraph_reduce_input_data_t));
+ // The compiler might build the reduction input data on the stack, so
+ // we must make a copy.
+ input_data->reduce_data = __kmp_fast_allocate(thread, sizeof(kmp_taskred_input_t) * num);
+ KMP_MEMCPY(input_data->reduce_data, data, sizeof(kmp_taskred_input_t) * num);
+ input_data->reduce_num_data = num;
+ taskgroup->taskgraph.reduce_input = input_data;
+ } else if (status == KMP_TDG_READY)
+ assert(false &&
+ "unexpected __kmpc_taskgraph_taskred_init with ready taskgraph");
}
-#endif
return __kmp_task_reduction_init(gtid, num, (kmp_taskred_input_t *)data);
}
@@ -2444,24 +2784,14 @@ void *__kmpc_task_reduction_get_th_data(int gtid, void *tskgrp, void *data) {
return data; // nothing to do
kmp_taskgroup_t *tg = (kmp_taskgroup_t *)tskgrp;
- if (tg == NULL)
+ if (tg == NULL || thread->th.th_current_task->owning_taskgraph) {
tg = thread->th.th_current_task->td_taskgroup;
+ }
KMP_ASSERT(tg != NULL);
kmp_taskred_data_t *arr;
kmp_int32 num;
kmp_int32 tid = thread->th.th_info.ds.ds_tid;
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if ((thread->th.th_current_task->is_taskgraph) &&
- (!__kmp_tdg_is_recording(__kmp_curr_tdg->tdg_status))) {
- tg = thread->th.th_current_task->td_taskgroup;
- KMP_ASSERT(tg != NULL);
- KMP_ASSERT(tg->reduce_data != NULL);
- arr = (kmp_taskred_data_t *)(tg->reduce_data);
- num = tg->reduce_num_data;
- }
-#endif
-
KMP_ASSERT(data != NULL);
while (tg != NULL) {
arr = (kmp_taskred_data_t *)(tg->reduce_data);
@@ -2666,6 +2996,8 @@ void __kmpc_taskgroup(ident_t *loc, int gtid) {
tg_new->reduce_data = NULL;
tg_new->reduce_num_data = 0;
tg_new->gomp_data = NULL;
+ tg_new->taskgraph.recording = nullptr;
+ tg_new->taskgraph.reduce_input = nullptr;
taskdata->td_taskgroup = tg_new;
#if OMPT_SUPPORT && OMPT_OPTIONAL
@@ -2685,6 +3017,7 @@ void __kmpc_taskgroup(ident_t *loc, int gtid) {
#endif
}
+#undef __kmpc_end_taskgroup
// __kmpc_end_taskgroup: Wait until all tasks generated by the current task
// and its descendants are complete
void __kmpc_end_taskgroup(ident_t *loc, int gtid) {
@@ -2821,6 +3154,11 @@ void __kmpc_end_taskgroup(ident_t *loc, int gtid) {
__kmp_task_reduction_fini(thread, taskgroup);
}
}
+
+ // This should have been moved to a task node within the group, else it will
+ // leak here.
+ assert(!taskgroup->taskgraph.reduce_input);
+
// Restore parent taskgroup for the current task
taskdata->td_taskgroup = taskgroup->parent;
__kmp_thread_free(thread, taskgroup);
@@ -2837,6 +3175,38 @@ void __kmpc_end_taskgroup(ident_t *loc, int gtid) {
#endif
}
+void
+__kmp_replay_taskgraph(kmp_int32 gtid, kmp_taskdata_t *current_taskdata,
+ kmp_taskgraph_record_t *taskgraph, kmp_uint32 graph_id,
+ kmp_taskgroup_t *taskgroup) {
+ kmp_info_t *thread = __kmp_threads[gtid];
+
+ kmp_taskgraph_exec_descr_t *exec_descrs = taskgraph->exec_descrs;
+
+ if (!exec_descrs) {
+ kmp_size_t exec_descr_count = __kmp_exec_descr_count(taskgraph->root);
+ exec_descrs =
+ (kmp_taskgraph_exec_descr_t*)__kmp_thread_malloc(thread,
+ exec_descr_count * sizeof(kmp_taskgraph_exec_descr_t));
+ taskgraph->exec_descrs = exec_descrs;
+ taskgraph->exec_descr_size = exec_descr_count;
+ }
+
+ kmp_taskgraph_exec_descr_t *succs_to_fill = nullptr;
+ kmp_size_t next_idx = 0;
+ kmp_taskgraph_exec_descr_t *head =
+ __kmp_fill_exec_descr(gtid, thread, taskgraph, taskgraph->root,
+ current_taskdata, exec_descrs, next_idx,
+ &succs_to_fill);
+ assert(next_idx == taskgraph->exec_descr_size);
+
+ __kmp_exec_descr_link_instances(exec_descrs, taskgraph->exec_descr_size);
+#ifdef DEBUG_TASKGRAPH
+ __kmp_debug_taskgraph_exec_descr(exec_descrs, taskgraph->exec_descr_size);
+#endif
+ __kmp_taskgraph_exec_descr_start(gtid, thread, head, taskgroup);
+}
+
static kmp_task_t *__kmp_get_priority_task(kmp_int32 gtid,
kmp_task_team_t *task_team,
kmp_int32 is_constrained) {
@@ -4228,9 +4598,6 @@ static void __kmp_first_top_half_finish_proxy(kmp_taskdata_t *taskdata) {
KMP_DEBUG_ASSERT(taskdata->td_flags.freed == 0);
taskdata->td_flags.complete = 1; // mark the task as completed
-#if OMP_TASKGRAPH_EXPERIMENTAL
- taskdata->td_flags.onced = 1;
-#endif
if (taskdata->td_taskgroup)
KMP_ATOMIC_DEC(&taskdata->td_taskgroup->count);
@@ -4435,7 +4802,7 @@ void __kmp_fulfill_event(kmp_event_t *event) {
kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src
#if OMP_TASKGRAPH_EXPERIMENTAL
,
- int taskloop_recur
+ bool taskgraph
#endif
) {
kmp_task_t *task;
@@ -4465,12 +4832,6 @@ kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src
task = KMP_TASKDATA_TO_TASK(taskdata);
// Initialize new task (only specific fields not affected by memcpy)
-#if OMP_TASKGRAPH_EXPERIMENTAL
- if (taskdata->is_taskgraph && !taskloop_recur &&
- __kmp_tdg_is_recording(taskdata_src->tdg->tdg_status))
- taskdata->td_tdg_task_id =
- KMP_ATOMIC_INC(&taskdata_src->tdg->tdg_task_id_next);
-#endif
taskdata->td_task_id = KMP_GEN_TASK_ID();
if (task->shareds != NULL) { // need setup shareds pointer
shareds_offset = (char *)task_src->shareds - (char *)taskdata_src;
@@ -4489,7 +4850,8 @@ kmp_task_t *__kmp_task_dup_alloc(kmp_info_t *thread, kmp_task_t *task_src
// Only need to keep track of child task counts if team parallel and tasking
// not serialized
- if (!(taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser)) {
+ if (!(taskdata->td_flags.team_serial || taskdata->td_flags.tasking_ser ||
+ taskgraph)) {
KMP_ATOMIC_INC(&parent_task->td_incomplete_child_tasks);
if (parent_task->td_taskgroup)
KMP_ATOMIC_INC(&parent_task->td_taskgroup->count);
@@ -4622,6 +4984,45 @@ class kmp_taskloop_bounds_t {
}
};
+kmp_taskgraph_node_t* __kmp_taskgraph_node_alloc(kmp_taskgraph_record_t *rec,
+ kmp_task_t *task,
+ kmp_size_t *index_p = nullptr) {
+ kmp_int32 gtid = rec->gtid;
+ kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_taskgraph_node_t *new_task = nullptr;
+
+ __kmp_acquire_lock(&rec->map_lock, gtid);
+
+ if (!rec->record_map) {
+ rec->nodes_allocated = 4;
+ rec->record_map = (kmp_taskgraph_node_t *)__kmp_thread_malloc(thread,
+ rec->nodes_allocated * sizeof(kmp_taskgraph_node_t));
+ }
+
+ if (rec->num_tasks >= rec->nodes_allocated) {
+ rec->record_map =
+ (kmp_taskgraph_node_t *)__kmp_thread_realloc(thread, rec->record_map,
+ 2 * rec->nodes_allocated * sizeof(kmp_taskgraph_node_t));
+ rec->nodes_allocated *= 2;
+ }
+
+ new_task = &rec->record_map[rec->num_tasks];
+ if (index_p)
+ *index_p = rec->num_tasks;
+ ++rec->num_tasks;
+
+ __kmp_release_lock(&rec->map_lock, gtid);
+
+ new_task->task = task;
+ new_task->taskloop_task = false;
+ new_task->reduce_input = nullptr;
+ new_task->u.unresolved.ndeps = 0;
+ new_task->u.unresolved.dep_list = nullptr;
+ new_task->u.unresolved.cfg_successor = -1;
+
+ return new_task;
+}
+
// __kmp_taskloop_linear: Start tasks of the taskloop linearly
//
// loc Source location information
@@ -4638,15 +5039,18 @@ class kmp_taskloop_bounds_t {
// tc Iterations count
// task_dup Tasks duplication routine
// codeptr_ra Return address for OMPT events
-void __kmp_taskloop_linear(ident_t *loc, int gtid, kmp_task_t *task,
- kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st,
- kmp_uint64 ub_glob, kmp_uint64 num_tasks,
- kmp_uint64 grainsize, kmp_uint64 extras,
- kmp_int64 last_chunk, kmp_uint64 tc,
+static void __kmp_taskloop_linear(ident_t *loc, int gtid, kmp_task_t *task,
+ kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st,
+ kmp_int32 nogroup, kmp_uint64 ub_glob,
+ kmp_uint64 num_tasks, kmp_uint64 grainsize,
+ kmp_uint64 extras, kmp_int64 last_chunk,
+ kmp_uint64 tc,
#if OMPT_SUPPORT
- void *codeptr_ra,
+ void *codeptr_ra,
#endif
- void *task_dup) {
+ void *task_dup,
+ kmp_taskgraph_record_t *taskgraph_rec =
+ nullptr) {
KMP_COUNT_BLOCK(OMP_TASKLOOP);
KMP_TIME_PARTITIONED_BLOCK(OMP_taskloop_scheduling);
p_task_dup_t ptask_dup = (p_task_dup_t)task_dup;
@@ -4659,6 +5063,7 @@ void __kmp_taskloop_linear(ident_t *loc, int gtid, kmp_task_t *task,
kmp_taskdata_t *current_task = thread->th.th_current_task;
kmp_task_t *next_task;
kmp_int32 lastpriv = 0;
+ kmp_int32 taskloop_prev_idx = -1, taskloop_first_idx = -1;
KMP_DEBUG_ASSERT(tc == num_tasks * grainsize +
(last_chunk < 0 ? last_chunk : extras));
@@ -4700,7 +5105,7 @@ void __kmp_taskloop_linear(ident_t *loc, int gtid, kmp_task_t *task,
}
#if OMP_TASKGRAPH_EXPERIMENTAL
- next_task = __kmp_task_dup_alloc(thread, task, /* taskloop_recur */ 0);
+ next_task = __kmp_task_dup_alloc(thread, task,
+ /*taskgraph=*/taskgraph_rec != nullptr);
#else
next_task = __kmp_task_dup_alloc(thread, task); // allocate new task
#endif
@@ -4725,24 +5130,70 @@ void __kmp_taskloop_linear(ident_t *loc, int gtid, kmp_task_t *task,
gtid, i, next_task, lower, upper, st,
next_task_bounds.get_lower_offset(),
next_task_bounds.get_upper_offset()));
+ if (taskgraph_rec) {
+ kmp_size_t rec_index = -1;
+ // Record the task in the taskgraph.
+ kmp_taskgraph_node_t *node =
+ __kmp_taskgraph_node_alloc(taskgraph_rec, next_task, &rec_index);
+ kmp_taskgroup_t *taskgroup = current_task->td_taskgroup;
+ if (taskgroup->taskgraph.reduce_input) {
+ node->reduce_input = taskgroup->taskgraph.reduce_input;
+ taskgroup->taskgraph.reduce_input = nullptr;
+ }
+ node->taskloop_task = true;
+ next_taskdata->owning_taskgraph = taskgraph_rec;
+ // FIXME: These dependency fields might be back-filled by the as-yet
+ // unimplemented task_iteration subsidiary directive. We'll need a way
+ // to locate the correct task given the value of the iteration variable,
+ // or similar.
+ node->u.unresolved.ndeps = 0;
+ node->u.unresolved.dep_list = nullptr;
+ if (nogroup)
+ taskgraph_rec->record_map[rec_index].u.unresolved.cfg_successor = -1;
+ else if (taskloop_prev_idx != -1)
+ taskgraph_rec->record_map[taskloop_prev_idx].u.unresolved.cfg_successor =
+ rec_index;
+ if (taskloop_first_idx == -1)
+ taskloop_first_idx = rec_index;
+ taskloop_prev_idx = rec_index;
+ } else {
#if OMPT_SUPPORT
- __kmp_omp_taskloop_task(NULL, gtid, next_task,
- codeptr_ra); // schedule new task
+ __kmp_omp_taskloop_task(NULL, gtid, next_task,
+ codeptr_ra); // schedule new task
#if OMPT_OPTIONAL
- if (ompt_enabled.ompt_callback_dispatch) {
- OMPT_GET_DISPATCH_CHUNK(next_taskdata->ompt_task_info.dispatch_chunk,
- lower, upper, st);
- }
+ if (ompt_enabled.ompt_callback_dispatch) {
+ OMPT_GET_DISPATCH_CHUNK(next_taskdata->ompt_task_info.dispatch_chunk,
+ lower, upper, st);
+ }
#endif // OMPT_OPTIONAL
#else
- __kmp_omp_task(gtid, next_task, true); // schedule new task
+ __kmp_omp_task(gtid, next_task, true); // schedule new task
#endif
+ }
lower = upper + st; // adjust lower bound for the next iteration
}
- // free the pattern task and exit
- __kmp_task_start(gtid, task, current_task); // make internal bookkeeping
- // do not execute the pattern task, just do internal bookkeeping
- __kmp_task_finish<false>(gtid, task, current_task);
+ if (taskgraph_rec) {
+ if (taskloop_prev_idx != -1 && !nogroup) {
+ // Create a node to act as an "end group" marker.
+ kmp_size_t endgroup_idx = -1;
+ kmp_taskgraph_node_t *endgrpnode =
+ __kmp_taskgraph_node_alloc(taskgraph_rec, nullptr, &endgroup_idx);
+ endgrpnode->taskloop_task = true;
+ // Point all the cfg_successor indices to this node now.
+ for (kmp_int32 looptask = taskloop_first_idx; looptask != -1;) {
+ kmp_int32 succ_idx =
+ taskgraph_rec->record_map[looptask].u.unresolved.cfg_successor;
+ taskgraph_rec->record_map[looptask].u.unresolved.cfg_successor =
+ endgroup_idx;
+ looptask = succ_idx;
+ }
+ }
+ } else {
+ // free the pattern task and exit
+ __kmp_task_start(gtid, task, current_task); // make internal bookkeeping
+ // do not execute the pattern task, just do internal bookkeeping
+ __kmp_task_finish<false>(gtid, task, current_task);
+ }
}
// Structure to keep taskloop parameters for auxiliary task
@@ -4812,8 +5263,8 @@ int __kmp_taskloop_task(int gtid, void *ptask) {
#endif
task_dup);
else
- __kmp_taskloop_linear(NULL, gtid, task, lb, ub, st, ub_glob, num_tasks,
- grainsize, extras, last_chunk, tc,
+ __kmp_taskloop_linear(NULL, gtid, task, lb, ub, st, /*nogroup=*/true,
+ ub_glob, num_tasks, grainsize, extras, last_chunk, tc,
#if OMPT_SUPPORT
codeptr_ra,
#endif
@@ -4902,8 +5353,7 @@ void __kmp_taskloop_recur(ident_t *loc, int gtid, kmp_task_t *task,
// create pattern task for 2nd half of the loop
#if OMP_TASKGRAPH_EXPERIMENTAL
- next_task = __kmp_task_dup_alloc(thread, task,
- /* taskloop_recur */ 1);
+ next_task = __kmp_task_dup_alloc(thread, task, /*taskgraph=*/false);
#else
next_task = __kmp_task_dup_alloc(thread, task); // duplicate the task
#endif
@@ -4941,8 +5391,8 @@ void __kmp_taskloop_recur(ident_t *loc, int gtid, kmp_task_t *task,
#if OMP_TASKGRAPH_EXPERIMENTAL
kmp_taskdata_t *new_task_data = KMP_TASK_TO_TASKDATA(new_task);
- new_task_data->tdg = taskdata->tdg;
- new_task_data->is_taskgraph = 0;
+ //new_task_data->tdg = taskdata->tdg;
+ new_task_data->owning_taskgraph = nullptr;
#endif
#if OMPT_SUPPORT
@@ -4961,8 +5411,8 @@ void __kmp_taskloop_recur(ident_t *loc, int gtid, kmp_task_t *task,
#endif
task_dup);
else
- __kmp_taskloop_linear(loc, gtid, task, lb, ub, st, ub_glob, n_tsk0,
- gr_size0, ext0, last_chunk0, tc0,
+ __kmp_taskloop_linear(loc, gtid, task, lb, ub, st, /*nogroup=*/true,
+ ub_glob, n_tsk0, gr_size0, ext0, last_chunk0, tc0,
#if OMPT_SUPPORT
codeptr_ra,
#endif
@@ -4974,14 +5424,17 @@ void __kmp_taskloop_recur(ident_t *loc, int gtid, kmp_task_t *task,
static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st,
int nogroup, int sched, kmp_uint64 grainsize,
- int modifier, void *task_dup) {
+ int modifier, void *task_dup,
+ kmp_taskgraph_record_t *taskgraph_rec = nullptr) {
kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
KMP_DEBUG_ASSERT(task != NULL);
if (nogroup == 0) {
#if OMPT_SUPPORT && OMPT_OPTIONAL
OMPT_STORE_RETURN_ADDRESS(gtid);
#endif
- __kmpc_taskgroup(loc, gtid);
+ // This branch is believed to be unreachable when taskgraph_rec is set.
+ if (!taskgraph_rec)
+ __kmpc_taskgroup(loc, gtid);
}
// =========================================================================
// calculate loop parameters
@@ -5013,17 +5466,19 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
}
if (tc == 0) {
KA_TRACE(20, ("__kmp_taskloop(exit): T#%d zero-trip loop\n", gtid));
- // free the pattern task and exit
- __kmp_task_start(gtid, task, current_task);
- // do not execute anything for zero-trip loop
- __kmp_task_finish<false>(gtid, task, current_task);
+ if (!taskgraph_rec) {
+ // free the pattern task and exit
+ __kmp_task_start(gtid, task, current_task);
+ // do not execute anything for zero-trip loop
+ __kmp_task_finish<false>(gtid, task, current_task);
+ }
return;
}
#if OMPT_SUPPORT && OMPT_OPTIONAL
ompt_team_info_t *team_info = __ompt_get_teaminfo(0, NULL);
ompt_task_info_t *task_info = __ompt_get_task_info_object(0);
- if (ompt_enabled.ompt_callback_work) {
+ if (ompt_enabled.ompt_callback_work && !taskgraph_rec) {
ompt_callbacks.ompt_callback(ompt_callback_work)(
ompt_work_taskloop, ompt_scope_begin, &(team_info->parallel_data),
&(task_info->task_data), tc, OMPT_GET_RETURN_ADDRESS(0));
@@ -5080,14 +5535,27 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
KMP_DEBUG_ASSERT(num_tasks > 0);
// =========================================================================
- // check if clause value first
+ // Handle taskgraph case first. We just generate tasks and record them in
+ // the graph, but we do not execute them here.
+ if (taskgraph_rec) {
+ if (if_val == 0) {
+ taskdata->td_flags.task_serial = 1;
+ taskdata->td_flags.tiedness = TASK_TIED;
+ }
+ __kmp_taskloop_linear(loc, gtid, task, lb, ub, st, nogroup, ub_glob,
+ num_tasks, grainsize, extras, last_chunk, tc,
+#if OMPT_SUPPORT
+ OMPT_GET_RETURN_ADDRESS(0),
+#endif
+ task_dup, taskgraph_rec);
+ // Check the 'if' clause value next.
// Also require GOMP_taskloop to reduce to linear (taskdata->td_flags.native)
- if (if_val == 0) { // if(0) specified, mark task as serial
+ } else if (if_val == 0) { // if(0) specified, mark task as serial
taskdata->td_flags.task_serial = 1;
taskdata->td_flags.tiedness = TASK_TIED; // AC: serial task cannot be untied
// always start serial tasks linearly
- __kmp_taskloop_linear(loc, gtid, task, lb, ub, st, ub_glob, num_tasks,
- grainsize, extras, last_chunk, tc,
+ __kmp_taskloop_linear(loc, gtid, task, lb, ub, st, nogroup, ub_glob,
+ num_tasks, grainsize, extras, last_chunk, tc,
#if OMPT_SUPPORT
OMPT_GET_RETURN_ADDRESS(0),
#endif
@@ -5110,8 +5578,8 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
"(%lld), grain %llu, extras %llu, last_chunk %lld\n",
gtid, tc, num_tasks, num_tasks_min, grainsize, extras,
last_chunk));
- __kmp_taskloop_linear(loc, gtid, task, lb, ub, st, ub_glob, num_tasks,
- grainsize, extras, last_chunk, tc,
+ __kmp_taskloop_linear(loc, gtid, task, lb, ub, st, nogroup, ub_glob,
+ num_tasks, grainsize, extras, last_chunk, tc,
#if OMPT_SUPPORT
OMPT_GET_RETURN_ADDRESS(0),
#endif
@@ -5130,7 +5598,8 @@ static void __kmp_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int if_val,
#if OMPT_SUPPORT && OMPT_OPTIONAL
OMPT_STORE_RETURN_ADDRESS(gtid);
#endif
- __kmpc_end_taskgroup(loc, gtid);
+ if (!taskgraph_rec)
+ __kmpc_end_taskgroup(loc, gtid);
}
KA_TRACE(20, ("__kmp_taskloop(exit): T#%d\n", gtid));
}
@@ -5232,347 +5701,259 @@ bool __kmpc_omp_has_task_team(kmp_int32 gtid) {
}
#if OMP_TASKGRAPH_EXPERIMENTAL
+
+static kmp_taskgraph_record_t*
+__kmp_taskgraph_alloc(kmp_int32 gtid, kmp_int32 graph_id) {
+ kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_taskgraph_record_t *new_rec =
+ (kmp_taskgraph_record_t *)__kmp_fast_allocate(thread, sizeof(kmp_taskgraph_record_t));
+ new_rec->status = KMP_TDG_RECORDING;
+ new_rec->gtid = gtid;
+ new_rec->graph_id = graph_id;
+ __kmp_init_lock(&new_rec->map_lock);
+ new_rec->record_map = nullptr;
+ new_rec->alloc_root = nullptr;
+ new_rec->recycled_deps = nullptr;
+ new_rec->num_tasks = 0;
+ new_rec->nodes_allocated = 0;
+ new_rec->num_mutexes = 0;
+ new_rec->exec_descrs = nullptr;
+ new_rec->exec_descr_size = 0;
+ new_rec->next = nullptr;
+ return new_rec;
+}
+
+// Clone a (new) task that has had its private variables and shared variables
+// initialised already.
+static kmp_task_t *__kmp_taskgraph_clone_task(kmp_info_t *thread,
+ kmp_taskgraph_record_t *taskgraph, kmp_task_t *orig,
+ size_t sizeof_kmp_task_t, size_t sizeof_shareds) {
+ // FIXME: This should use a "taskdup" function like taskloops in cases where
+ // private variables are not trivially copyable. For now, do it by plain
+ // bitwise copy.
+ // FIXME 2: This copy is intended to be persistent so that it can be
+ // re-executed on taskgraph replay. Make sure that works (for shared
+ // variables) if stack addresses change (i.e. a task-generating function is
+ // called from different call stack depths).
+ kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(orig);
+ size_t shareds_offset = sizeof(kmp_taskdata_t) + sizeof_kmp_task_t;
+ shareds_offset = __kmp_round_up_to_val(shareds_offset, sizeof(kmp_uint64));
+ kmp_taskdata_t *copy_td = (kmp_taskdata_t *)__kmp_fast_allocate(thread, shareds_offset + sizeof_shareds);
+ KMP_MEMCPY(copy_td, taskdata, shareds_offset + sizeof_shareds);
+ // Tasks cloned for a taskgraph always have this field set.
+ copy_td->owning_taskgraph = taskgraph;
+ return KMP_TASKDATA_TO_TASK(copy_td);
+}
+
// __kmpc_taskgraph: record or replay taskgraph
// loc_ref: Location of TDG, not used yet
// gtid: Global Thread ID of the encountering thread
-// input_flags: Flags associated with the TDG
-// tdg_id: ID of the TDG to record, for now, incremental integer
+// tdg_handle: Handle of taskgraph -- the address of a slot in the host
+// program that we write the taskgraph (list) pointer back to.
+// graph_id: Graph ID for the taskgraph.
+// graph_reset: 1 to reset taskgraph for this taskgraph/graph_id, 0 to replay
+// (or record, initially).
+// nogroup: 1 to omit implicit taskgroup, 0 to include it.
// entry: Pointer to the entry function
// args: Pointer to the function arguments
-void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid, kmp_int32 input_flags,
- kmp_uint32 tdg_id, kmp_uint32 graph_id,
+void __kmpc_taskgraph(ident_t *loc_ref, kmp_int32 gtid,
+ std::atomic<void*> *tdg_handle, kmp_uint32 graph_id,
+ kmp_int32 graph_reset, kmp_int32 nogroup,
void (*entry)(void *), void *args) {
- kmp_int32 res = __kmpc_start_record_task(loc_ref, gtid, input_flags, tdg_id);
- // When res = 1, we either start recording or only execute tasks
- // without recording. Need to execute entry function in both cases.
- if (res)
- entry(args);
+ kmp_taskgraph_record_t *record = (kmp_taskgraph_record_t*)KMP_ATOMIC_LD_ACQ(tdg_handle);
+ kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_taskgroup_t *taskgroup;
- __kmpc_end_record_task(loc_ref, gtid, input_flags, tdg_id);
-}
+ __kmpc_taskgroup(loc_ref, gtid);
-// __kmp_find_tdg: identify a TDG through its ID
-// tdg_id: ID of the TDG
-// returns: If a TDG corresponding to this ID is found and not
-// its initial state, return the pointer to it, otherwise nullptr
-static kmp_tdg_info_t *__kmp_find_tdg(kmp_int32 tdg_id) {
- kmp_tdg_info_t *res = nullptr;
- if (__kmp_max_tdgs == 0)
- return res;
-
- if (__kmp_global_tdgs == NULL)
- __kmp_global_tdgs = (kmp_tdg_info_t **)__kmp_allocate(
- sizeof(kmp_tdg_info_t *) * __kmp_max_tdgs);
-
- for (kmp_int32 tdg_idx = 0; tdg_idx < __kmp_max_tdgs; tdg_idx++) {
- if (__kmp_global_tdgs[tdg_idx] &&
- __kmp_global_tdgs[tdg_idx]->tdg_id == tdg_id) {
- if (__kmp_global_tdgs[tdg_idx]->tdg_status != KMP_TDG_NONE)
- res = __kmp_global_tdgs[tdg_idx];
- break;
+ taskgroup = thread->th.th_current_task->td_taskgroup;
+
+ // FIXME: Implement graph_id and graph_reset functionality. For graph_id, we
+ // will form a singly-linked list of task records chained through their
+ // "next" pointers (per taskgraph construct handle). Thread safety and
+ // locking need careful consideration. We could perhaps use a "list header"
+ // node consisting of a lock and a pointer to the list proper. Ideally we'd
+ // want to avoid locking/unlocking in the common case (replay).
+
+ if (!record) {
+ record = __kmp_taskgraph_alloc(gtid, graph_id);
+ // Another thread may have allocated the taskgraph already. Check that here.
+ kmp_taskgraph_record_t *other =
+ (kmp_taskgraph_record_t *)KMP_COMPARE_AND_STORE_RET64(tdg_handle,
+ nullptr,
+ record);
+ if (other != nullptr) {
+ __kmp_fast_free(thread, record);
+ record = other;
+ // Should we stall here until the other thread has finished recording the
+ // taskgraph? That might be safer. Otherwise multiple threads will add
+ // tasks to the taskgraph simultaneously, which is unlikely to be what
+ // the user wants. Unclear what to do here. FIXME.
+ } else {
+ // We record 'nogroup' here. We always create a group for recording the
+ // taskgraph, but we could avoid doing so for replay. That's not done
+ // yet though.
+ record->nogroup_taskgroup = nogroup;
+ // Store our taskgraph record into the taskgraph directive's implicit
+ // taskgroup.
+ KMP_ATOMIC_ST_REL(&taskgroup->taskgraph.recording, record);
}
}
- return res;
-}
-// __kmp_alloc_tdg: Allocates a TDG if it doesn't already exist.
-// tdg_id: ID of the TDG.
-// returns: A pointer to the TDG if it already exists. Otherwise,
-// allocates a new TDG if the maximum limit has not been reached.
-// Returns nullptr if no TDG can be allocated.
-static kmp_tdg_info_t *__kmp_alloc_tdg(kmp_int32 tdg_id) {
- kmp_tdg_info_t *res = nullptr;
- if ((res = __kmp_find_tdg(tdg_id)))
- return res;
-
- if (__kmp_num_tdg > __kmp_max_tdgs)
- return res;
-
- for (kmp_int32 tdg_idx = 0; tdg_idx < __kmp_max_tdgs; tdg_idx++) {
- if (!__kmp_global_tdgs[tdg_idx]) {
- kmp_tdg_info_t *tdg =
- (kmp_tdg_info_t *)__kmp_allocate(sizeof(kmp_tdg_info_t));
- __kmp_global_tdgs[tdg_idx] = tdg;
- __kmp_curr_tdg = tdg;
- res = __kmp_global_tdgs[tdg_idx];
- break;
- }
+ kmp_taskgraph_status_t status = KMP_ATOMIC_LD_ACQ(&record->status);
+ if (status == KMP_TDG_RECORDING)
+ entry(args);
+ else if (status == KMP_TDG_READY) {
+ kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
+ KG_TRACE(10, ("Replay taskgraph %p from task %p\n", record,
+ KMP_TASKDATA_TO_TASK(current_taskdata)));
+ __kmp_acquire_lock(&record->map_lock, gtid);
+ __kmp_replay_taskgraph(gtid, current_taskdata, record, graph_id, taskgroup);
+ __kmpc_end_taskgroup(loc_ref, gtid);
+ __kmp_release_lock(&record->map_lock, gtid);
+ return;
}
- return res;
-}
-
-// __kmp_free_tdg: Frees a TDG if it exists.
-// tdg_id: ID of the TDG to be freed.
-// returns: true if a TDG with the given ID was found and successfully freed,
-// false if no such TDG exists.
-static bool __kmp_free_tdg(kmp_int32 tdg_id) {
- kmp_tdg_info_t *tdg = nullptr;
- if (__kmp_global_tdgs == NULL)
- return false;
-
- for (kmp_int32 tdg_idx = 0; tdg_idx < __kmp_max_tdgs; tdg_idx++) {
- if (__kmp_global_tdgs[tdg_idx] &&
- __kmp_global_tdgs[tdg_idx]->tdg_id == tdg_id) {
- tdg = __kmp_global_tdgs[tdg_idx];
- for (kmp_int map_idx = 0; map_idx < tdg->map_size; map_idx++) {
- __kmp_free(tdg->record_map[map_idx].successors);
- }
- __kmp_free(tdg->record_map);
- if (tdg->root_tasks)
- __kmp_free(tdg->root_tasks);
- __kmp_free(tdg);
- __kmp_global_tdgs[tdg_idx] = NULL;
- return true;
- }
- }
- return false;
-}
+ __kmpc_end_taskgroup(loc_ref, gtid);
-// __kmp_print_tdg_dot: prints the TDG to a dot file
-// tdg: ID of the TDG
-// gtid: Global Thread ID
-void __kmp_print_tdg_dot(kmp_tdg_info_t *tdg, kmp_int32 gtid) {
- kmp_int32 tdg_id = tdg->tdg_id;
- KA_TRACE(10, ("__kmp_print_tdg_dot(enter): T#%d tdg_id=%d \n",
- __kmp_get_gtid(), tdg_id));
-
- char file_name[20];
- sprintf(file_name, "tdg_%d.dot", tdg_id);
- kmp_safe_raii_file_t tdg_file(file_name, "w");
-
- kmp_int32 num_tasks = KMP_ATOMIC_LD_RLX(&tdg->num_tasks);
- kmp_int32 map_size = tdg->map_size;
- fprintf(tdg_file,
- "digraph TDG {\n"
- " compound=true\n"
- " subgraph cluster {\n"
- " label=TDG_%d\n",
- tdg_id);
- for (kmp_int32 i = 0; i < num_tasks; i++) {
- fprintf(tdg_file, " %d[style=bold]\n", i);
- }
- fprintf(tdg_file, " }\n");
- kmp_int32 tasks = 0;
- for (kmp_int32 i = 0; tasks < num_tasks && i < map_size; i++) {
- if (tdg->record_map[i].task == nullptr)
- continue;
- tasks++;
- kmp_int32 nsuccessors = tdg->record_map[i].nsuccessors;
- kmp_int32 *successors = tdg->record_map[i].successors;
- if (nsuccessors > 0) {
- for (kmp_int32 j = 0; j < nsuccessors; j++)
- fprintf(tdg_file, " %d -> %d \n", i, successors[j]);
- }
+ // This could perhaps be spawned as a separate task in order to avoid
+ // blocking this thread.
+ if (record->gtid == gtid) {
+ kmp_taskdata_t *current_taskdata = thread->th.th_current_task;
+ __kmp_build_taskgraph(gtid, current_taskdata, record);
}
- fprintf(tdg_file, "}");
- KA_TRACE(10, ("__kmp_print_tdg_dot(exit): T#%d tdg_id=%d \n",
- __kmp_get_gtid(), tdg_id));
}
-// __kmp_exec_tdg: launch the execution of a previous
-// recorded TDG
-// gtid: Global Thread ID
-// tdg: ID of the TDG
-void __kmp_exec_tdg(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
- KMP_DEBUG_ASSERT(tdg->tdg_status == KMP_TDG_READY);
- KA_TRACE(10, ("__kmp_exec_tdg(enter): T#%d tdg_id=%d num_roots=%d\n", gtid,
- tdg->tdg_id, tdg->num_roots));
- kmp_node_info_t *this_record_map = tdg->record_map;
- kmp_int32 *this_root_tasks = tdg->root_tasks;
- kmp_int32 this_num_roots = tdg->num_roots;
- kmp_int32 this_num_tasks = KMP_ATOMIC_LD_RLX(&tdg->num_tasks);
- kmp_int32 tasks = 0;
-
+kmp_uint32 __kmpc_taskgraph_task(ident_t *loc_ref, kmp_int32 gtid,
+ kmp_task_t *new_task, kmp_int32 flags,
+ size_t sizeof_kmp_task_t, void *shareds,
+ size_t sizeof_shareds, kmp_int32 ndeps,
+ kmp_depend_info_t *dep_list) {
kmp_info_t *thread = __kmp_threads[gtid];
- kmp_taskdata_t *parent_task = thread->th.th_current_task;
-
- if (tdg->rec_taskred_data) {
- __kmpc_taskred_init(gtid, tdg->rec_num_taskred, tdg->rec_taskred_data);
- }
-
- for (kmp_int32 j = 0; j < tdg->map_size && tasks < this_num_tasks; j++) {
- if (this_record_map[j].task == nullptr)
- continue;
- tasks++;
- kmp_taskdata_t *td = KMP_TASK_TO_TASKDATA(this_record_map[j].task);
-
- td->td_parent = parent_task;
- this_record_map[j].parent_task = parent_task;
-
- kmp_taskgroup_t *parent_taskgroup =
- this_record_map[j].parent_task->td_taskgroup;
-
- KMP_ATOMIC_ST_RLX(&this_record_map[j].npredecessors_counter,
- this_record_map[j].npredecessors);
- KMP_ATOMIC_INC(&this_record_map[j].parent_task->td_incomplete_child_tasks);
-
- if (parent_taskgroup) {
- KMP_ATOMIC_INC(&parent_taskgroup->count);
- // The taskgroup is different so we must update it
- td->td_taskgroup = parent_taskgroup;
- } else if (td->td_taskgroup != nullptr) {
- // If the parent doesnt have a taskgroup, remove it from the task
- td->td_taskgroup = nullptr;
+ kmp_taskgroup_t *taskgroup = thread->th.th_current_task->td_taskgroup;
+ kmp_taskgraph_record_t *rec = __kmp_taskgraph_or_parent_recording(taskgroup);
+
+ if (rec) {
+ kmp_taskgraph_status_t status = KMP_ATOMIC_LD_ACQ(&rec->status);
+ if (status == KMP_TDG_RECORDING) {
+ kmp_task_t *cloned_task =
+ __kmp_taskgraph_clone_task(thread, rec, new_task, sizeof_kmp_task_t,
+ sizeof_shareds);
+ kmp_taskgraph_node_t *node = __kmp_taskgraph_node_alloc(rec, cloned_task);
+ if (taskgroup->taskgraph.reduce_input) {
+ node->reduce_input = taskgroup->taskgraph.reduce_input;
+ taskgroup->taskgraph.reduce_input = nullptr;
+ }
+#if defined(DEBUG_TASKGRAPH)
+ fprintf(stderr, "__kmpc_taskgraph_task: record task here!\n");
+ fprintf(stderr, "private size: %d, shared size: %d\n",
+ (int)(sizeof_kmp_task_t - sizeof(kmp_task_t)), (int)sizeof_shareds);
+ fprintf(stderr, "ndeps: %d\n", (int) ndeps);
+ fprintf(stderr, "gtid: %d rec->gtid: %d\n", gtid, rec->gtid);
+ fprintf(stderr, "taskgroup: %p\n", thread->th.th_current_task->td_taskgroup);
+ kmp_taskdata_t *parent = thread->th.th_current_task->td_parent;
+ while (parent) {
+ fprintf(stderr, " parent: %p (taskgroup %p)\n", parent, parent->td_taskgroup);
+ parent = parent->td_parent;
+ }
+#endif
+ node->u.unresolved.ndeps = ndeps;
+ node->u.unresolved.dep_list =
+ (kmp_depend_info_t *)__kmp_thread_malloc(thread,
+ ndeps * sizeof(kmp_depend_info_t));
+ KMP_MEMCPY(node->u.unresolved.dep_list, dep_list,
+ ndeps * sizeof(kmp_depend_info_t));
+ } else if (status == KMP_TDG_READY) {
+#ifdef DEBUG_TASKGRAPH
+ fprintf(stderr, "non-taskgraph task entry point for task in finalized taskgraph\n");
+#endif
+ return 0;
+ }
+ } else {
+ kmp_taskdata_t *parent = thread->th.th_current_task->td_parent;
+ while (parent) {
+ parent = parent->td_parent;
}
- if (this_record_map[j].parent_task->td_flags.tasktype == TASK_EXPLICIT)
- KMP_ATOMIC_INC(&this_record_map[j].parent_task->td_allocated_child_tasks);
- }
-
- for (kmp_int32 j = 0; j < this_num_roots; ++j) {
- __kmp_omp_task(gtid, this_record_map[this_root_tasks[j]].task, true);
}
- KA_TRACE(10, ("__kmp_exec_tdg(exit): T#%d tdg_id=%d num_roots=%d\n", gtid,
- tdg->tdg_id, tdg->num_roots));
-}
-
-// __kmp_start_record: set up a TDG structure and turn the
-// recording flag to true
-// gtid: Global Thread ID of the encountering thread
-// input_flags: Flags associated with the TDG
-// tdg_id: ID of the TDG to record
-static inline void __kmp_start_record(kmp_int32 gtid,
- kmp_taskgraph_flags_t *flags,
- kmp_int32 tdg_id) {
- kmp_tdg_info_t *tdg = __kmp_alloc_tdg(tdg_id);
- // Initializing the TDG structure
- tdg->tdg_id = tdg_id;
- tdg->map_size = INIT_MAPSIZE;
- tdg->num_roots = -1;
- tdg->root_tasks = nullptr;
- tdg->tdg_status = KMP_TDG_RECORDING;
- tdg->rec_num_taskred = 0;
- tdg->rec_taskred_data = nullptr;
- KMP_ATOMIC_ST_RLX(&tdg->num_tasks, 0);
-
- // Initializing the list of nodes in this TDG
- kmp_node_info_t *this_record_map =
- (kmp_node_info_t *)__kmp_allocate(INIT_MAPSIZE * sizeof(kmp_node_info_t));
- for (kmp_int32 i = 0; i < INIT_MAPSIZE; i++) {
- this_record_map[i].task = nullptr;
- this_record_map[i].parent_task = nullptr;
- this_record_map[i].successors = nullptr;
- this_record_map[i].nsuccessors = 0;
- this_record_map[i].npredecessors = 0;
- this_record_map[i].successors_size = 0;
- KMP_ATOMIC_ST_RLX(&this_record_map[i].npredecessors_counter, 0);
- }
-
- tdg->record_map = this_record_map;
-}
-// __kmpc_start_record_task: Wrapper around __kmp_start_record to mark
-// the beginning of the record process of a task region
-// loc_ref: Location of TDG, not used yet
-// gtid: Global Thread ID of the encountering thread
-// input_flags: Flags associated with the TDG
-// tdg_id: ID of the TDG to record
-// returns: 1 if we record, otherwise, 0
-kmp_int32 __kmpc_start_record_task(ident_t *loc_ref, kmp_int32 gtid,
- kmp_int32 input_flags, kmp_int32 tdg_id) {
kmp_int32 res;
- kmp_taskgraph_flags_t *flags = (kmp_taskgraph_flags_t *)&input_flags;
- KA_TRACE(10, ("__kmpc_start_record_task(enter): T#%d loc=%p flags=%d "
- "tdg_id=%d\n",
- gtid, loc_ref, input_flags, tdg_id));
-
- if (__kmp_max_tdgs == 0) {
- KA_TRACE(10, ("__kmpc_start_record_task(abandon): T#%d loc=%p flags=%d "
- "tdg_id = %d, __kmp_max_tdgs = 0\n",
- gtid, loc_ref, input_flags, tdg_id));
- return 1;
- }
+ if (ndeps == 0)
+ res = __kmpc_omp_task(loc_ref, gtid, new_task);
+ else
+ res = __kmpc_omp_task_with_deps(loc_ref, gtid, new_task, ndeps, dep_list,
+ 0, nullptr);
- __kmpc_taskgroup(loc_ref, gtid);
- if (flags->graph_reset) {
- __kmp_free_tdg(tdg_id);
- __kmp_num_tdg--;
- }
- if (kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id)) {
- // TODO: use re_record flag
- __kmp_exec_tdg(gtid, tdg);
- res = 0;
- } else {
- KMP_DEBUG_ASSERT(__kmp_num_tdg < __kmp_max_tdgs);
- __kmp_start_record(gtid, flags, tdg_id);
- __kmp_num_tdg++;
- res = 1;
- }
- KA_TRACE(10, ("__kmpc_start_record_task(exit): T#%d TDG %d starts to %s\n",
- gtid, tdg_id, res ? "record" : "execute"));
return res;
}
-// __kmp_end_record: set up a TDG after recording it
-// gtid: Global thread ID
-// tdg: Pointer to the TDG
-void __kmp_end_record(kmp_int32 gtid, kmp_tdg_info_t *tdg) {
- // Store roots
- kmp_node_info_t *this_record_map = tdg->record_map;
- kmp_int32 this_num_tasks = KMP_ATOMIC_LD_RLX(&tdg->num_tasks);
- kmp_int32 *this_root_tasks =
- (kmp_int32 *)__kmp_allocate(this_num_tasks * sizeof(kmp_int32));
- kmp_int32 this_map_size = tdg->map_size;
- kmp_int32 this_num_roots = 0;
+void
+__kmpc_taskgraph_taskwait(ident_t *loc_ref, kmp_int32 gtid, kmp_int32 ndeps,
+ kmp_depend_info_t *dep_list, kmp_int32 has_no_wait) {
kmp_info_t *thread = __kmp_threads[gtid];
- kmp_int32 tasks = 0;
-
- for (kmp_int32 i = 0; tasks < this_num_tasks && i < this_map_size; i++) {
- if (this_record_map[i].task == nullptr) {
- continue;
- }
- tasks++;
- if (this_record_map[i].npredecessors == 0) {
- this_root_tasks[this_num_roots++] = i;
+ kmp_taskgroup_t *taskgroup = thread->th.th_current_task->td_taskgroup;
+ kmp_taskgraph_record_t *rec = __kmp_taskgraph_or_parent_recording(taskgroup);
+
+ if (rec) {
+ kmp_taskgraph_status_t status = KMP_ATOMIC_LD_ACQ(&rec->status);
+ if (status == KMP_TDG_RECORDING) {
+ kmp_taskgraph_node_t *node = __kmp_taskgraph_node_alloc(rec, nullptr);
+#ifdef DEBUG_TASKGRAPH
+ fprintf(stderr, "__kmpc_taskgraph_taskwait: record taskwait here!\n");
+ fprintf(stderr, "ndeps: %d\n", (int) ndeps);
+#endif
+ node->u.unresolved.ndeps = ndeps;
+ node->u.unresolved.dep_list = (kmp_depend_info_t *)__kmp_thread_malloc(thread, ndeps * sizeof(kmp_depend_info_t));
+ KMP_MEMCPY(node->u.unresolved.dep_list, dep_list, ndeps * sizeof(kmp_depend_info_t));
+ // TODO: Record has_no_wait somewhere?
+ //if (has_no_wait)
+ // return;
+ } else if (status == KMP_TDG_READY) {
+#ifdef DEBUG_TASKGRAPH
+ fprintf(stderr, "non-taskgraph taskwait entry point for taskwait in finalized taskgraph\n");
+#endif
+ return;
}
}
- // Update with roots info and mapsize
- tdg->map_size = this_map_size;
- tdg->num_roots = this_num_roots;
- tdg->root_tasks = this_root_tasks;
- KMP_DEBUG_ASSERT(tdg->tdg_status == KMP_TDG_RECORDING);
- tdg->tdg_status = KMP_TDG_READY;
+ __kmpc_omp_taskwait_deps_51(loc_ref, gtid, ndeps, dep_list, 0, nullptr,
+ has_no_wait);
+}
- if (thread->th.th_current_task->td_dephash) {
- __kmp_dephash_free(thread, thread->th.th_current_task->td_dephash);
- thread->th.th_current_task->td_dephash = NULL;
+kmp_uint32
+__kmpc_taskgraph_taskloop(ident_t *loc_ref, kmp_int32 gtid, kmp_task_t *new_task,
+ kmp_int32 flags, size_t sizeof_kmp_task_t,
+ void *shareds, size_t sizeof_shareds,
+ kmp_int32 if_val, kmp_uint64 *lb, kmp_uint64 *ub,
+ kmp_int64 st, kmp_int32 nogroup, kmp_int32 sched,
+ kmp_uint64 grainsize, kmp_int32 modifier,
+ void *task_dup) {
+ kmp_info_t *thread = __kmp_threads[gtid];
+ kmp_taskgroup_t *taskgroup = thread->th.th_current_task->td_taskgroup;
+ kmp_taskgraph_record_t *rec = __kmp_taskgraph_or_parent_recording(taskgroup);
+
+ if (rec) {
+ kmp_taskgraph_status_t status = KMP_ATOMIC_LD_ACQ(&rec->status);
+ if (status == KMP_TDG_RECORDING)
+ __kmp_taskloop(loc_ref, gtid, new_task, if_val, lb, ub, st, nogroup,
+ sched, grainsize, modifier, task_dup, rec);
+ else if (status == KMP_TDG_READY) {
+#ifdef DEBUG_TASKGRAPH
+ fprintf(stderr, "non-taskgraph taskloop entry point for taskloop in finalized taskgraph\n");
+#endif
+ return 0;
+ }
}
- // Reset predecessor counter
- for (kmp_int32 i = 0; i < this_num_tasks; i++) {
- KMP_ATOMIC_ST_RLX(&this_record_map[i].npredecessors_counter,
- this_record_map[i].npredecessors);
- }
+ // We pass /*nogroup=*/true to __kmp_taskloop here because the taskgroup
+ // (when one is required) is created by separate API calls wrapping this
+ // entry point (or __kmpc_taskloop); we never want __kmp_taskloop to create
+ // another taskgroup itself. The original presence or absence of the
+ // 'nogroup' clause is still conveyed to this entry point via its own
+ // 'nogroup' argument.
+ __kmp_taskloop(loc_ref, gtid, new_task, if_val, lb, ub, st, /*nogroup=*/true,
+ sched, grainsize, modifier, task_dup);
- if (__kmp_tdg_dot)
- __kmp_print_tdg_dot(tdg, gtid);
+ return 0;
}
-// __kmpc_end_record_task: wrapper around __kmp_end_record to mark
-// the end of recording phase
-//
-// loc_ref: Source location information
-// gtid: Global thread ID
-// input_flags: Flags attached to the graph
-// tdg_id: ID of the TDG just finished recording
-void __kmpc_end_record_task(ident_t *loc_ref, kmp_int32 gtid,
- kmp_int32 input_flags, kmp_int32 tdg_id) {
- kmp_tdg_info_t *tdg = __kmp_find_tdg(tdg_id);
- kmp_taskgraph_flags_t *flags = (kmp_taskgraph_flags_t *)&input_flags;
-
- KMP_DEBUG_ASSERT(tdg != NULL);
- KA_TRACE(10, ("__kmpc_end_record_task(enter): T#%d loc=%p finishes recording"
- " tdg=%d with flags=%d\n",
- gtid, loc_ref, tdg_id, input_flags));
- if (__kmp_max_tdgs && tdg) {
- if (!flags->nowait)
- __kmpc_end_taskgroup(loc_ref, gtid);
- if (__kmp_tdg_is_recording(tdg->tdg_status))
- __kmp_end_record(gtid, tdg);
- }
- KA_TRACE(10, ("__kmpc_end_record_task(exit): T#%d loc=%p finished recording"
- " tdg=%d, its status is now READY\n",
- gtid, loc_ref, tdg_id));
-}
#endif
>From a409a9bcd565bcce5c2bf64c75ebda615e54e80a Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Wed, 25 Mar 2026 16:56:33 -0500
Subject: [PATCH 26/28] [OpenMP] OpenMP 6.0 "taskgraph" support, frontend parts
---
clang/include/clang/AST/OpenMPClause.h | 73 +-
clang/include/clang/AST/RecursiveASTVisitor.h | 6 +
clang/include/clang/Sema/SemaOpenMP.h | 6 +
clang/lib/AST/OpenMPClause.cpp | 19 +-
clang/lib/AST/StmtProfile.cpp | 5 +
clang/lib/CodeGen/CGOpenMPRuntime.cpp | 668 ++++++++++++------
clang/lib/CodeGen/CGOpenMPRuntime.h | 17 +-
clang/lib/CodeGen/CGStmtOpenMP.cpp | 63 +-
clang/lib/CodeGen/CodeGenFunction.h | 16 +
clang/lib/Parse/ParseOpenMP.cpp | 15 +-
clang/lib/Sema/SemaOpenMP.cpp | 28 +
clang/lib/Sema/TreeTransform.h | 26 +
clang/lib/Serialization/ASTReader.cpp | 11 +
clang/lib/Serialization/ASTWriter.cpp | 5 +
clang/tools/libclang/CIndex.cpp | 4 +
llvm/include/llvm/Frontend/OpenMP/OMP.td | 1 +
.../include/llvm/Frontend/OpenMP/OMPKinds.def | 12 +-
17 files changed, 734 insertions(+), 241 deletions(-)
diff --git a/clang/include/clang/AST/OpenMPClause.h b/clang/include/clang/AST/OpenMPClause.h
index 27a737bd43633..0860aca973516 100644
--- a/clang/include/clang/AST/OpenMPClause.h
+++ b/clang/include/clang/AST/OpenMPClause.h
@@ -1939,6 +1939,75 @@ class OMPSelfMapsClause final : public OMPClause {
}
};
+/// This represents a 'replayable' clause in the '#pragma omp target',
+/// '#pragma omp target enter data', '#pragma omp target exit data',
+/// '#pragma omp target update', '#pragma omp task', '#pragma omp taskloop' or
+/// '#pragma omp taskwait' directives.
+///
+/// \code
+/// #pragma omp task replayable(1)
+/// \endcode
+/// In this example directive '#pragma omp task' has the 'replayable' clause.
+class OMPReplayableClause final : public OMPClause {
+public:
+ friend class OMPClauseReader;
+
+ /// Location of '('.
+ SourceLocation LParenLoc;
+
+ /// Condition of the 'replayable' clause.
+ Stmt *Condition = nullptr;
+
+ /// Set condition.
+ void setCondition(Expr *Cond) { Condition = Cond; }
+
+ /// Build 'replayable' clause.
+ ///
+ /// \param Cond Condition of the clause.
+ /// \param StartLoc Starting location of the clause.
+ /// \param LParenLoc Location of '('.
+ /// \param EndLoc Ending location of the clause.
+ OMPReplayableClause(Expr *Cond, SourceLocation StartLoc,
+ SourceLocation LParenLoc, SourceLocation EndLoc)
+ : OMPClause(llvm::omp::OMPC_replayable, StartLoc, EndLoc),
+ LParenLoc(LParenLoc), Condition(Cond) {}
+
+ /// Build an empty clause.
+ OMPReplayableClause()
+ : OMPClause(llvm::omp::OMPC_replayable, SourceLocation(),
+ SourceLocation()) {}
+
+ /// Sets the location of '('.
+ void setLParenLoc(SourceLocation Loc) { LParenLoc = Loc; }
+
+ /// Returns the location of '('.
+ SourceLocation getLParenLoc() const { return LParenLoc; }
+
+ /// Returns condition.
+ Expr *getCondition() const { return cast_or_null<Expr>(Condition); }
+
+ child_range children() {
+ if (Condition)
+ return child_range(&Condition, &Condition + 1);
+ return child_range(child_iterator(), child_iterator());
+ }
+
+ const_child_range children() const {
+ if (Condition)
+ return const_child_range(&Condition, &Condition + 1);
+ return const_child_range(const_child_iterator(), const_child_iterator());
+ }
+
+ child_range used_children();
+ const_child_range used_children() const {
+ return const_cast<OMPReplayableClause *>(this)->used_children();
+ }
+
+ static bool classof(const OMPClause *T) {
+ return T->getClauseKind() == llvm::omp::OMPC_replayable;
+ }
+};
+
/// This represents 'at' clause in the '#pragma omp error' directive
///
/// \code
@@ -8454,7 +8523,7 @@ class OMPGraphIdClause final
void setCondition(Expr *Cond) { setStmt(Cond); }
public:
- /// Build 'grpah_id' clause with condition \a Cond.
+ /// Build 'graph_id' clause with condition \a Cond.
///
/// \param Cond Condition of the clause.
/// \param HelperCond Helper condition for the construct.
@@ -8498,7 +8567,7 @@ class OMPGraphResetClause final
void setCondition(Expr *Cond) { setStmt(Cond); }
public:
- /// Build 'grpah_id' clause with condition \a Cond.
+ /// Build 'graph_reset' clause with condition \a Cond.
///
/// \param Cond Condition of the clause.
/// \param HelperCond Helper condition for the construct.
diff --git a/clang/include/clang/AST/RecursiveASTVisitor.h b/clang/include/clang/AST/RecursiveASTVisitor.h
index 32b928ca62fd5..c327617c21b74 100644
--- a/clang/include/clang/AST/RecursiveASTVisitor.h
+++ b/clang/include/clang/AST/RecursiveASTVisitor.h
@@ -3634,6 +3634,12 @@ bool RecursiveASTVisitor<Derived>::VisitOMPNowaitClause(OMPNowaitClause *C) {
return true;
}
+template <typename Derived>
+bool RecursiveASTVisitor<Derived>::VisitOMPReplayableClause(OMPReplayableClause *C) {
+ TRY_TO(TraverseStmt(C->getCondition()));
+ return true;
+}
+
template <typename Derived>
bool RecursiveASTVisitor<Derived>::VisitOMPUntiedClause(OMPUntiedClause *) {
return true;
diff --git a/clang/include/clang/Sema/SemaOpenMP.h b/clang/include/clang/Sema/SemaOpenMP.h
index d88a85cc1b9f5..6901740a03df7 100644
--- a/clang/include/clang/Sema/SemaOpenMP.h
+++ b/clang/include/clang/Sema/SemaOpenMP.h
@@ -1161,6 +1161,12 @@ class SemaOpenMP : public SemaBase {
OMPClause *ActOnOpenMPSelfMapsClause(SourceLocation StartLoc,
SourceLocation EndLoc);
+ /// Called on well-formed 'replayable' clause.
+ OMPClause *ActOnOpenMPReplayableClause(SourceLocation StartLoc,
+ SourceLocation EndLoc,
+ SourceLocation LParenLoc,
+ Expr *Condition);
+
/// Called on well-formed 'at' clause.
OMPClause *ActOnOpenMPAtClause(OpenMPAtClauseKind Kind,
SourceLocation KindLoc,
diff --git a/clang/lib/AST/OpenMPClause.cpp b/clang/lib/AST/OpenMPClause.cpp
index 3765b97447e61..a2a04f494fc32 100644
--- a/clang/lib/AST/OpenMPClause.cpp
+++ b/clang/lib/AST/OpenMPClause.cpp
@@ -326,6 +326,12 @@ OMPClause::child_range OMPNowaitClause::used_children() {
return children();
}
+OMPClause::child_range OMPReplayableClause::used_children() {
+ if (Condition)
+ return child_range(&Condition, &Condition + 1);
+ return children();
+}
+
OMPClause::child_range OMPGrainsizeClause::used_children() {
if (Stmt **C = getAddrOfExprAsWritten(getPreInitStmt()))
return child_range(C, C + 1);
@@ -2176,6 +2182,15 @@ void OMPClausePrinter::VisitOMPNowaitClause(OMPNowaitClause *Node) {
}
}
+void OMPClausePrinter::VisitOMPReplayableClause(OMPReplayableClause *Node) {
+ OS << "replayable";
+ if (auto *Cond = Node->getCondition()) {
+ OS << "(";
+ Cond->printPretty(OS, nullptr, Policy, 0);
+ OS << ")";
+ }
+}
+
void OMPClausePrinter::VisitOMPUntiedClause(OMPUntiedClause *) {
OS << "untied";
}
@@ -2353,7 +2368,7 @@ void OMPClausePrinter::VisitOMPGrainsizeClause(OMPGrainsizeClause *Node) {
}
void OMPClausePrinter::VisitOMPGraphIdClause(OMPGraphIdClause *Node) {
- OS << "graphId";
+ OS << "graph_id";
if (Expr *E = Node->getCondition()) {
OS << "(";
E->printPretty(OS, nullptr, Policy, 0);
@@ -2362,7 +2377,7 @@ void OMPClausePrinter::VisitOMPGraphIdClause(OMPGraphIdClause *Node) {
}
void OMPClausePrinter::VisitOMPGraphResetClause(OMPGraphResetClause *Node) {
- OS << "graphReset";
+ OS << "graph_reset";
if (Expr *E = Node->getCondition()) {
OS << "(";
E->printPretty(OS, nullptr, Policy, 0);
diff --git a/clang/lib/AST/StmtProfile.cpp b/clang/lib/AST/StmtProfile.cpp
index 0e43a48e40a9b..11f8f96bfa16b 100644
--- a/clang/lib/AST/StmtProfile.cpp
+++ b/clang/lib/AST/StmtProfile.cpp
@@ -600,6 +600,11 @@ void OMPClauseProfiler::VisitOMPNowaitClause(const OMPNowaitClause *C) {
Profiler->VisitStmt(C->getCondition());
}
+void OMPClauseProfiler::VisitOMPReplayableClause(const OMPReplayableClause *C) {
+ if (C->getCondition())
+ Profiler->VisitStmt(C->getCondition());
+}
+
void OMPClauseProfiler::VisitOMPUntiedClause(const OMPUntiedClause *) {}
void OMPClauseProfiler::VisitOMPMergeableClause(const OMPMergeableClause *) {}
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.cpp b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
index e8a79df3dd5f4..3bc90e40cbd17 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.cpp
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.cpp
@@ -33,6 +33,7 @@
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Bitcode/BitcodeReader.h"
+#include "llvm/Frontend/OpenMP/OMP.h.inc"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/GlobalValue.h"
@@ -2249,32 +2250,26 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
if (!CGF.HaveInsertPoint())
return;
- // Building kmp_taskgraph_flags_t flags for kmpc_taskgraph. C.f., kmp.h
- enum {
- NowaitFlag = 0x1, // Not used yet.
- ReRecordFlag = 0x2,
- };
-
- unsigned Flags = 0;
-
- if (D.getSingleClause<OMPNogroupClause>()) {
- Flags |= NowaitFlag;
+ // FIXME: The nogroup clause doesn't support an argument yet.
+ llvm::Value *NoGroup =
+ CGF.Builder.getInt32(D.getSingleClause<OMPNogroupClause>() ? 1 : 0);
const OMPGraphResetClause *GraphResetClause =
D.getSingleClause<OMPGraphResetClause>();
+ llvm::Value *GraphReset;
if (GraphResetClause) {
const Expr *Cond = GraphResetClause->getCondition();
llvm::Value *CondVal = CGF.EvaluateExprAsBool(Cond);
- if (CondVal) {
- llvm::Value *CondBool = CGF.Builder.CreateICmpNE(
- CondVal, llvm::ConstantInt::get(CondVal->getType(), 0));
- if (llvm::ConstantInt *CI = llvm::dyn_cast<llvm::ConstantInt>(CondBool)) {
- if (CI->isOne()) {
- Flags |= ReRecordFlag;
- }
- }
- }
+ GraphReset =
+ CGF.Builder.CreateIntCast(CondVal, CGF.IntTy, /*isSigned=*/true);
+ } else {
+ GraphReset = CGF.Builder.getInt32(0);
}
llvm::Value *GraphId = CGF.Builder.getInt32(0);
@@ -2282,7 +2277,8 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
if (GraphIdClause) {
const auto *E = GraphIdClause->getCondition();
auto *GraphIdVal = CGF.EmitScalarExpr(E);
- GraphId = CGF.Builder.CreateIntCast(GraphIdVal, CGM.Int32Ty, true);
+ GraphId =
+ CGF.Builder.CreateIntCast(GraphIdVal, CGM.Int32Ty, /*isSigned=*/false);
}
CodeGenFunction OutlinedCGF(CGM, /*suppressNewContext=*/true);
@@ -2290,6 +2286,7 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
const auto *CS = cast<CapturedStmt>(D.getAssociatedStmt());
auto BodyGen = [CS](CodeGenFunction &CGF, PrePostActionTy &) {
+ CodeGenFunction::OMPWithinTaskgraphRAII WithinTaskgraph(CGF);
CGF.EmitStmt(CS->getCapturedStmt());
};
@@ -2297,14 +2294,25 @@ void CGOpenMPRuntime::emitTaskgraphCall(CodeGenFunction &CGF,
CGOpenMPTaskgraphRegionInfo TaskgraphRegion(*CS, BodyGen);
CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(OutlinedCGF,
&TaskgraphRegion);
+
llvm::Function *FnT = OutlinedCGF.GenerateCapturedStmtFunction(*CS);
- std::array<llvm::Value *, 7> Args{
+ // Create an internal-linkage global variable to hold the taskgraph handle.
+ std::string GraphHandleName = getName({"omp", "taskgraph", "handle"});
+ auto *GraphHandle =
+ new llvm::GlobalVariable(CGM.getModule(), CGM.VoidPtrTy,
+ /*IsConstant=*/false,
+ llvm::GlobalValue::InternalLinkage,
+ llvm::Constant::getNullValue(CGM.VoidPtrTy),
+ GraphHandleName);
+
+ std::array<llvm::Value *, 8> Args{
emitUpdateLocation(CGF, Loc),
getThreadID(CGF, Loc),
- CGF.Builder.getInt32(Flags),
- CGF.Builder.getInt32(D.getBeginLoc().getHashValue()),
+ GraphHandle,
GraphId,
+ GraphReset,
+ NoGroup,
CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(FnT, CGM.VoidPtrTy),
CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
CapStruct.getPointer(OutlinedCGF), CGM.VoidPtrTy)};
@@ -3874,7 +3882,9 @@ CGOpenMPRuntime::TaskResultTy
CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc,
const OMPExecutableDirective &D,
llvm::Function *TaskFunction, QualType SharedsTy,
- Address Shareds, const OMPTaskDataTy &Data) {
+ Address Shareds, const OMPTaskDataTy &Data,
+ bool ForTaskgraph,
+ std::array<llvm::Value *, 3> &TaskAllocArgs) {
ASTContext &C = CGM.getContext();
llvm::SmallVector<PrivateDataTy, 4> Privates;
// Aggregate privates and sort them by the alignment.
@@ -4021,6 +4031,11 @@ CGOpenMPRuntime::emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc,
SharedsSize, CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
TaskEntry, KmpRoutineEntryPtrTy)};
llvm::Value *NewTask;
+ if (ForTaskgraph) {
+ TaskAllocArgs[0] = TaskFlags;
+ TaskAllocArgs[1] = KmpTaskTWithPrivatesTySize;
+ TaskAllocArgs[2] = SharedsSize;
+ }
if (D.hasClausesOfKind<OMPNowaitClause>()) {
// Check if we have any device clause associated with the directive.
const Expr *Device = nullptr;
@@ -4774,118 +4789,183 @@ void CGOpenMPRuntime::emitTaskCall(CodeGenFunction &CGF, SourceLocation Loc,
llvm::Function *TaskFunction,
QualType SharedsTy, Address Shareds,
const Expr *IfCond,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data) {
if (!CGF.HaveInsertPoint())
return;
- TaskResultTy Result =
- emitTaskInit(CGF, Loc, D, TaskFunction, SharedsTy, Shareds, Data);
- llvm::Value *NewTask = Result.NewTask;
- llvm::Function *TaskEntry = Result.TaskEntry;
- llvm::Value *NewTaskNewTaskTTy = Result.NewTaskNewTaskTTy;
- LValue TDBase = Result.TDBase;
- const RecordDecl *KmpTaskTQTyRD = Result.KmpTaskTQTyRD;
- // Process list of dependences.
- Address DependenciesArray = Address::invalid();
- llvm::Value *NumOfElements;
- std::tie(NumOfElements, DependenciesArray) =
- emitDependClause(CGF, Data.Dependences, Loc);
-
- // NOTE: routine and part_id fields are initialized by __kmpc_omp_task_alloc()
- // libcall.
- // Build kmp_int32 __kmpc_omp_task_with_deps(ident_t *, kmp_int32 gtid,
- // kmp_task_t *new_task, kmp_int32 ndeps, kmp_depend_info_t *dep_list,
- // kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list) if dependence
- // list is not empty
- llvm::Value *ThreadID = getThreadID(CGF, Loc);
- llvm::Value *UpLoc = emitUpdateLocation(CGF, Loc);
- llvm::Value *TaskArgs[] = { UpLoc, ThreadID, NewTask };
- llvm::Value *DepTaskArgs[7];
- if (!Data.Dependences.empty()) {
- DepTaskArgs[0] = UpLoc;
- DepTaskArgs[1] = ThreadID;
- DepTaskArgs[2] = NewTask;
- DepTaskArgs[3] = NumOfElements;
- DepTaskArgs[4] = DependenciesArray.emitRawPointer(CGF);
- DepTaskArgs[5] = CGF.Builder.getInt32(0);
- DepTaskArgs[6] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
- }
- auto &&ThenCodeGen = [this, &Data, TDBase, KmpTaskTQTyRD, &TaskArgs,
- &DepTaskArgs](CodeGenFunction &CGF, PrePostActionTy &) {
- if (!Data.Tied) {
- auto PartIdFI = std::next(KmpTaskTQTyRD->field_begin(), KmpTaskTPartId);
- LValue PartIdLVal = CGF.EmitLValueForField(TDBase, *PartIdFI);
- CGF.EmitStoreOfScalar(CGF.Builder.getInt32(0), PartIdLVal);
+ auto &&TaskgraphTaskCodeGen =
+ [this, &Loc, &D, TaskFunction, &SharedsTy, &Shareds, &Data]
+ (CodeGenFunction &CGF, PrePostActionTy &) {
+ llvm::Value *ThreadId = getThreadID(CGF, Loc);
+ llvm::Value *UpLoc = emitUpdateLocation(CGF, Loc);
+ std::array<llvm::Value *, 9> TGTaskArgs;
+ std::array<llvm::Value *, 3> TaskAllocArgs;
+ TaskResultTy Result =
+ emitTaskInit(CGF, Loc, D, TaskFunction, SharedsTy, Shareds, Data,
+ /*ForTaskgraph=*/true, TaskAllocArgs);
+ Address DependenciesArray = Address::invalid();
+ llvm::Value *NumOfElements;
+ std::tie(NumOfElements, DependenciesArray) =
+ emitDependClause(CGF, Data.Dependences, Loc);
+ TGTaskArgs[0] = UpLoc;
+ TGTaskArgs[1] = ThreadId;
+ TGTaskArgs[2] = Result.NewTask;
+ TGTaskArgs[3] = TaskAllocArgs[0]; // TaskFlags
+ TGTaskArgs[4] = TaskAllocArgs[1]; // KmpTaskTWithPrivatesTySize
+ TGTaskArgs[5] = Shareds.emitRawPointer(CGF);
+ TGTaskArgs[6] = TaskAllocArgs[2]; // SharedsSize
+ if (auto RecType = dyn_cast<RecordType>(SharedsTy)) {
+ auto *RD = RecType->getAsRecordDecl();
+ if (RD->fields().empty()) {
+ // FIXME: The condition might not be precisely correct here.
+ TGTaskArgs[6] = CGF.Builder.getSize(0);
+ }
}
- if (!Data.Dependences.empty()) {
- CGF.EmitRuntimeCall(
- OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_omp_task_with_deps),
- DepTaskArgs);
+ if (Data.Dependences.empty()) {
+ TGTaskArgs[7] = CGF.Builder.getInt32(0);
+ TGTaskArgs[8] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
} else {
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_omp_task),
- TaskArgs);
+ TGTaskArgs[7] = NumOfElements;
+ TGTaskArgs[8] = DependenciesArray.emitRawPointer(CGF);
}
- // Check if parent region is untied and build return for untied task;
- if (auto *Region =
- dyn_cast_or_null<CGOpenMPRegionInfo>(CGF.CapturedStmtInfo))
- Region->emitUntiedSwitch(CGF);
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph_task),
+ TGTaskArgs);
};
- llvm::Value *DepWaitTaskArgs[7];
- if (!Data.Dependences.empty()) {
- DepWaitTaskArgs[0] = UpLoc;
- DepWaitTaskArgs[1] = ThreadID;
- DepWaitTaskArgs[2] = NumOfElements;
- DepWaitTaskArgs[3] = DependenciesArray.emitRawPointer(CGF);
- DepWaitTaskArgs[4] = CGF.Builder.getInt32(0);
- DepWaitTaskArgs[5] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
- DepWaitTaskArgs[6] =
- llvm::ConstantInt::get(CGF.Int32Ty, Data.HasNowaitClause);
- }
- auto &M = CGM.getModule();
- auto &&ElseCodeGen = [this, &M, &TaskArgs, ThreadID, NewTaskNewTaskTTy,
- TaskEntry, &Data, &DepWaitTaskArgs,
- Loc](CodeGenFunction &CGF, PrePostActionTy &) {
- CodeGenFunction::RunCleanupsScope LocalScope(CGF);
- // Build void __kmpc_omp_wait_deps(ident_t *, kmp_int32 gtid,
- // kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32
- // ndeps_noalias, kmp_depend_info_t *noalias_dep_list); if dependence info
- // is specified.
- if (!Data.Dependences.empty())
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- M, OMPRTL___kmpc_omp_taskwait_deps_51),
- DepWaitTaskArgs);
- // Call proxy_task_entry(gtid, new_task);
- auto &&CodeGen = [TaskEntry, ThreadID, NewTaskNewTaskTTy,
- Loc](CodeGenFunction &CGF, PrePostActionTy &Action) {
- Action.Enter(CGF);
- llvm::Value *OutlinedFnArgs[] = {ThreadID, NewTaskNewTaskTTy};
- CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc, TaskEntry,
- OutlinedFnArgs);
+ auto &&NonTaskgraphTaskCodeGen =
+ [this, &Loc, &D, TaskFunction, &SharedsTy, &Shareds, IfCond, &Data]
+ (CodeGenFunction &CGF, PrePostActionTy &) {
+ std::array<llvm::Value *, 3> DummyArray;
+ TaskResultTy Result =
+ emitTaskInit(CGF, Loc, D, TaskFunction, SharedsTy, Shareds, Data,
+ /*ForTaskgraph=*/false, DummyArray);
+ llvm::Value *NewTask = Result.NewTask;
+ llvm::Function *TaskEntry = Result.TaskEntry;
+ llvm::Value *NewTaskNewTaskTTy = Result.NewTaskNewTaskTTy;
+ LValue TDBase = Result.TDBase;
+ const RecordDecl *KmpTaskTQTyRD = Result.KmpTaskTQTyRD;
+ // Process list of dependences.
+ Address DependenciesArray = Address::invalid();
+ llvm::Value *NumOfElements;
+ std::tie(NumOfElements, DependenciesArray) =
+ emitDependClause(CGF, Data.Dependences, Loc);
+
+ // NOTE: routine and part_id fields are initialized by __kmpc_omp_task_alloc()
+ // libcall.
+ // Build kmp_int32 __kmpc_omp_task_with_deps(ident_t *, kmp_int32 gtid,
+ // kmp_task_t *new_task, kmp_int32 ndeps, kmp_depend_info_t *dep_list,
+ // kmp_int32 ndeps_noalias, kmp_depend_info_t *noalias_dep_list) if dependence
+ // list is not empty
+ llvm::Value *ThreadID = getThreadID(CGF, Loc);
+ llvm::Value *UpLoc = emitUpdateLocation(CGF, Loc);
+ llvm::Value *TaskArgs[] = { UpLoc, ThreadID, NewTask };
+ llvm::Value *DepTaskArgs[7];
+ if (!Data.Dependences.empty()) {
+ DepTaskArgs[0] = UpLoc;
+ DepTaskArgs[1] = ThreadID;
+ DepTaskArgs[2] = NewTask;
+ DepTaskArgs[3] = NumOfElements;
+ DepTaskArgs[4] = DependenciesArray.emitRawPointer(CGF);
+ DepTaskArgs[5] = CGF.Builder.getInt32(0);
+ DepTaskArgs[6] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
+ }
+ auto &&ThenCodeGen = [this, &Data, TDBase, KmpTaskTQTyRD, &TaskArgs,
+ &DepTaskArgs](CodeGenFunction &CGF, PrePostActionTy &) {
+ if (!Data.Tied) {
+ auto PartIdFI = std::next(KmpTaskTQTyRD->field_begin(), KmpTaskTPartId);
+ LValue PartIdLVal = CGF.EmitLValueForField(TDBase, *PartIdFI);
+ CGF.EmitStoreOfScalar(CGF.Builder.getInt32(0), PartIdLVal);
+ }
+ if (!Data.Dependences.empty()) {
+ CGF.EmitRuntimeCall(
+ OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_omp_task_with_deps),
+ DepTaskArgs);
+ } else {
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_omp_task),
+ TaskArgs);
+ }
+ // Check if parent region is untied and build return for untied task;
+ if (auto *Region =
+ dyn_cast_or_null<CGOpenMPRegionInfo>(CGF.CapturedStmtInfo))
+ Region->emitUntiedSwitch(CGF);
};
- // Build void __kmpc_omp_task_begin_if0(ident_t *, kmp_int32 gtid,
- // kmp_task_t *new_task);
- // Build void __kmpc_omp_task_complete_if0(ident_t *, kmp_int32 gtid,
- // kmp_task_t *new_task);
- RegionCodeGenTy RCG(CodeGen);
- CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction(
- M, OMPRTL___kmpc_omp_task_begin_if0),
- TaskArgs,
- OMPBuilder.getOrCreateRuntimeFunction(
- M, OMPRTL___kmpc_omp_task_complete_if0),
- TaskArgs);
- RCG.setAction(Action);
- RCG(CGF);
+ llvm::Value *DepWaitTaskArgs[7];
+ if (!Data.Dependences.empty()) {
+ DepWaitTaskArgs[0] = UpLoc;
+ DepWaitTaskArgs[1] = ThreadID;
+ DepWaitTaskArgs[2] = NumOfElements;
+ DepWaitTaskArgs[3] = DependenciesArray.emitRawPointer(CGF);
+ DepWaitTaskArgs[4] = CGF.Builder.getInt32(0);
+ DepWaitTaskArgs[5] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
+ DepWaitTaskArgs[6] =
+ llvm::ConstantInt::get(CGF.Int32Ty, Data.HasNowaitClause);
+ }
+ auto &M = CGM.getModule();
+ auto &&ElseCodeGen = [this, &M, &TaskArgs, ThreadID, NewTaskNewTaskTTy,
+ TaskEntry, &Data, &DepWaitTaskArgs,
+ Loc](CodeGenFunction &CGF, PrePostActionTy &) {
+ CodeGenFunction::RunCleanupsScope LocalScope(CGF);
+ // Build void __kmpc_omp_wait_deps(ident_t *, kmp_int32 gtid,
+ // kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32
+ // ndeps_noalias, kmp_depend_info_t *noalias_dep_list); if dependence info
+ // is specified.
+ if (!Data.Dependences.empty())
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ M, OMPRTL___kmpc_omp_taskwait_deps_51),
+ DepWaitTaskArgs);
+ // Call proxy_task_entry(gtid, new_task);
+ auto &&CodeGen = [TaskEntry, ThreadID, NewTaskNewTaskTTy,
+ Loc](CodeGenFunction &CGF, PrePostActionTy &Action) {
+ Action.Enter(CGF);
+ llvm::Value *OutlinedFnArgs[] = {ThreadID, NewTaskNewTaskTTy};
+ CGF.CGM.getOpenMPRuntime().emitOutlinedFunctionCall(CGF, Loc, TaskEntry,
+ OutlinedFnArgs);
+ };
+
+ // Build void __kmpc_omp_task_begin_if0(ident_t *, kmp_int32 gtid,
+ // kmp_task_t *new_task);
+ // Build void __kmpc_omp_task_complete_if0(ident_t *, kmp_int32 gtid,
+ // kmp_task_t *new_task);
+ RegionCodeGenTy RCG(CodeGen);
+ CommonActionTy Action(OMPBuilder.getOrCreateRuntimeFunction(
+ M, OMPRTL___kmpc_omp_task_begin_if0),
+ TaskArgs,
+ OMPBuilder.getOrCreateRuntimeFunction(
+ M, OMPRTL___kmpc_omp_task_complete_if0),
+ TaskArgs);
+ RCG.setAction(Action);
+ RCG(CGF);
+ };
+
+ if (IfCond) {
+ emitIfClause(CGF, IfCond, ThenCodeGen, ElseCodeGen);
+ } else {
+ RegionCodeGenTy ThenRCG(ThenCodeGen);
+ ThenRCG(CGF);
+ }
};
- if (IfCond) {
- emitIfClause(CGF, IfCond, ThenCodeGen, ElseCodeGen);
+ if (CGF.getOMPWithinTaskgraph()) {
+ // Lexically within taskgraph, always replayable.
+ RegionCodeGenTy TaskgraphRCG(TaskgraphTaskCodeGen);
+ TaskgraphRCG(CGF);
} else {
- RegionCodeGenTy ThenRCG(ThenCodeGen);
- ThenRCG(CGF);
+ if (ReplayableCond) {
+ // We have a 'replayable' clause with a condition: the task is
+ // replayable if the condition evaluates to true.
+ emitIfClause(CGF, ReplayableCond, TaskgraphTaskCodeGen,
+ NonTaskgraphTaskCodeGen);
+ } else {
+ // Not taskgraph, not replayable.
+ RegionCodeGenTy NonTaskgraphRCG(NonTaskgraphTaskCodeGen);
+ NonTaskgraphRCG(CGF);
+ }
}
}
@@ -4894,18 +4974,11 @@ void CGOpenMPRuntime::emitTaskLoopCall(CodeGenFunction &CGF, SourceLocation Loc,
llvm::Function *TaskFunction,
QualType SharedsTy, Address Shareds,
const Expr *IfCond,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data) {
if (!CGF.HaveInsertPoint())
return;
- TaskResultTy Result =
- emitTaskInit(CGF, Loc, D, TaskFunction, SharedsTy, Shareds, Data);
- // NOTE: routine and part_id fields are initialized by __kmpc_omp_task_alloc()
- // libcall.
- // Call to void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int
- // if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int
- // sched, kmp_uint64 grainsize, void *task_dup);
- llvm::Value *ThreadID = getThreadID(CGF, Loc);
- llvm::Value *UpLoc = emitUpdateLocation(CGF, Loc);
+
llvm::Value *IfVal;
if (IfCond) {
IfVal = CGF.Builder.CreateIntCast(CGF.EvaluateExprAsBool(IfCond), CGF.IntTy,
@@ -4914,68 +4987,173 @@ void CGOpenMPRuntime::emitTaskLoopCall(CodeGenFunction &CGF, SourceLocation Loc,
IfVal = llvm::ConstantInt::getSigned(CGF.IntTy, /*V=*/1);
}
- LValue LBLVal = CGF.EmitLValueForField(
- Result.TDBase,
- *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTLowerBound));
- const auto *LBVar =
- cast<VarDecl>(cast<DeclRefExpr>(D.getLowerBoundVariable())->getDecl());
- CGF.EmitAnyExprToMem(LBVar->getInit(), LBLVal.getAddress(), LBLVal.getQuals(),
- /*IsInitializer=*/true);
- LValue UBLVal = CGF.EmitLValueForField(
- Result.TDBase,
- *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTUpperBound));
- const auto *UBVar =
- cast<VarDecl>(cast<DeclRefExpr>(D.getUpperBoundVariable())->getDecl());
- CGF.EmitAnyExprToMem(UBVar->getInit(), UBLVal.getAddress(), UBLVal.getQuals(),
- /*IsInitializer=*/true);
- LValue StLVal = CGF.EmitLValueForField(
- Result.TDBase,
- *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTStride));
- const auto *StVar =
- cast<VarDecl>(cast<DeclRefExpr>(D.getStrideVariable())->getDecl());
- CGF.EmitAnyExprToMem(StVar->getInit(), StLVal.getAddress(), StLVal.getQuals(),
- /*IsInitializer=*/true);
- // Store reductions address.
- LValue RedLVal = CGF.EmitLValueForField(
- Result.TDBase,
- *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTReductions));
- if (Data.Reductions) {
- CGF.EmitStoreOfScalar(Data.Reductions, RedLVal);
+ enum { NoSchedule = 0, Grainsize = 1, NumTasks = 2 };
+
+ auto &&TaskgraphTaskloopCodeGen =
+ [this, &Loc, &D, TaskFunction, &SharedsTy, &Shareds, IfVal, &Data]
+ (CodeGenFunction &CGF, PrePostActionTy &) {
+ llvm::Value *ThreadId = getThreadID(CGF, Loc);
+ llvm::Value *UpLoc = emitUpdateLocation(CGF, Loc);
+ std::array<llvm::Value *, 16> TGTaskLoopArgs;
+ std::array<llvm::Value *, 3> TaskAllocArgs;
+ TaskResultTy Result =
+ emitTaskInit(CGF, Loc, D, TaskFunction, SharedsTy, Shareds, Data, true,
+ TaskAllocArgs);
+
+ // FIXME: This duplicates the bound/stride/reduction setup in
+ // NonTaskgraphTaskloopCodeGen below; refactor into a shared helper.
+ LValue LBLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTLowerBound));
+ const auto *LBVar =
+ cast<VarDecl>(cast<DeclRefExpr>(D.getLowerBoundVariable())->getDecl());
+ CGF.EmitAnyExprToMem(LBVar->getInit(), LBLVal.getAddress(), LBLVal.getQuals(),
+ /*IsInitializer=*/true);
+ LValue UBLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTUpperBound));
+ const auto *UBVar =
+ cast<VarDecl>(cast<DeclRefExpr>(D.getUpperBoundVariable())->getDecl());
+ CGF.EmitAnyExprToMem(UBVar->getInit(), UBLVal.getAddress(), UBLVal.getQuals(),
+ /*IsInitializer=*/true);
+ LValue StLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTStride));
+ const auto *StVar =
+ cast<VarDecl>(cast<DeclRefExpr>(D.getStrideVariable())->getDecl());
+ CGF.EmitAnyExprToMem(StVar->getInit(), StLVal.getAddress(), StLVal.getQuals(),
+ /*IsInitializer=*/true);
+ // Store reductions address.
+ LValue RedLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTReductions));
+ if (Data.Reductions) {
+ CGF.EmitStoreOfScalar(Data.Reductions, RedLVal);
+ } else {
+ CGF.EmitNullInitialization(RedLVal.getAddress(),
+ CGF.getContext().VoidPtrTy);
+ }
+
+ TGTaskLoopArgs[0] = UpLoc;
+ TGTaskLoopArgs[1] = ThreadId;
+ TGTaskLoopArgs[2] = Result.NewTask;
+ TGTaskLoopArgs[3] = TaskAllocArgs[0]; // TaskFlags
+ TGTaskLoopArgs[4] = TaskAllocArgs[1]; // KmpTaskTWithPrivatesTySize
+ TGTaskLoopArgs[5] = Shareds.emitRawPointer(CGF);
+ TGTaskLoopArgs[6] = TaskAllocArgs[2]; // SharedsSize
+ TGTaskLoopArgs[7] = IfVal;
+ TGTaskLoopArgs[8] = LBLVal.getPointer(CGF);
+ TGTaskLoopArgs[9] = UBLVal.getPointer(CGF);
+ TGTaskLoopArgs[10] = CGF.EmitLoadOfScalar(StLVal, Loc);
+ TGTaskLoopArgs[11] = llvm::ConstantInt::getSigned(CGF.IntTy, Data.Nogroup ? 1 : 0);
+ TGTaskLoopArgs[12] = llvm::ConstantInt::getSigned(CGF.IntTy, Data.Schedule.getPointer()
+ ? Data.Schedule.getInt() ? NumTasks : Grainsize
+ : NoSchedule);
+ TGTaskLoopArgs[13] = Data.Schedule.getPointer()
+ ? CGF.Builder.CreateIntCast(Data.Schedule.getPointer(), CGF.Int64Ty, /*isSigned=*/false)
+ : llvm::ConstantInt::get(CGF.Int64Ty, /*V=*/0);
+ TGTaskLoopArgs[14] = llvm::ConstantInt::getSigned(CGF.IntTy, Data.HasModifier ? 1 : 0);
+ TGTaskLoopArgs[15] = Result.TaskDupFn
+ ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+ Result.TaskDupFn, CGF.VoidPtrTy)
+ : llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph_taskloop),
+ TGTaskLoopArgs);
+ };
+
+ auto &&NonTaskgraphTaskloopCodeGen =
+ [this, &Loc, &D, TaskFunction, &SharedsTy, &Shareds, IfVal, &Data]
+ (CodeGenFunction &CGF, PrePostActionTy &) {
+ std::array<llvm::Value *, 3> DummyArray;
+ TaskResultTy Result =
+ emitTaskInit(CGF, Loc, D, TaskFunction, SharedsTy, Shareds, Data,
+ /*ForTaskgraph=*/false, DummyArray);
+ // NOTE: routine and part_id fields are initialized by __kmpc_omp_task_alloc()
+ // libcall.
+ // Call to void __kmpc_taskloop(ident_t *loc, int gtid, kmp_task_t *task, int
+ // if_val, kmp_uint64 *lb, kmp_uint64 *ub, kmp_int64 st, int nogroup, int
+ // sched, kmp_uint64 grainsize, void *task_dup);
+ llvm::Value *ThreadID = getThreadID(CGF, Loc);
+ llvm::Value *UpLoc = emitUpdateLocation(CGF, Loc);
+
+ LValue LBLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTLowerBound));
+ const auto *LBVar =
+ cast<VarDecl>(cast<DeclRefExpr>(D.getLowerBoundVariable())->getDecl());
+ CGF.EmitAnyExprToMem(LBVar->getInit(), LBLVal.getAddress(), LBLVal.getQuals(),
+ /*IsInitializer=*/true);
+ LValue UBLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTUpperBound));
+ const auto *UBVar =
+ cast<VarDecl>(cast<DeclRefExpr>(D.getUpperBoundVariable())->getDecl());
+ CGF.EmitAnyExprToMem(UBVar->getInit(), UBLVal.getAddress(), UBLVal.getQuals(),
+ /*IsInitializer=*/true);
+ LValue StLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTStride));
+ const auto *StVar =
+ cast<VarDecl>(cast<DeclRefExpr>(D.getStrideVariable())->getDecl());
+ CGF.EmitAnyExprToMem(StVar->getInit(), StLVal.getAddress(), StLVal.getQuals(),
+ /*IsInitializer=*/true);
+ // Store reductions address.
+ LValue RedLVal = CGF.EmitLValueForField(
+ Result.TDBase,
+ *std::next(Result.KmpTaskTQTyRD->field_begin(), KmpTaskTReductions));
+ if (Data.Reductions) {
+ CGF.EmitStoreOfScalar(Data.Reductions, RedLVal);
+ } else {
+ CGF.EmitNullInitialization(RedLVal.getAddress(),
+ CGF.getContext().VoidPtrTy);
+ }
+ llvm::SmallVector<llvm::Value *, 12> TaskArgs{
+ UpLoc,
+ ThreadID,
+ Result.NewTask,
+ IfVal,
+ LBLVal.getPointer(CGF),
+ UBLVal.getPointer(CGF),
+ CGF.EmitLoadOfScalar(StLVal, Loc),
+ llvm::ConstantInt::getSigned(
+ CGF.IntTy, 1), // Always 1 because taskgroup emitted by the compiler
+ llvm::ConstantInt::getSigned(
+ CGF.IntTy, Data.Schedule.getPointer()
+ ? Data.Schedule.getInt() ? NumTasks : Grainsize
+ : NoSchedule),
+ Data.Schedule.getPointer()
+ ? CGF.Builder.CreateIntCast(Data.Schedule.getPointer(), CGF.Int64Ty,
+ /*isSigned=*/false)
+ : llvm::ConstantInt::get(CGF.Int64Ty, /*V=*/0)};
+ if (Data.HasModifier)
+ TaskArgs.push_back(llvm::ConstantInt::get(CGF.Int32Ty, 1));
+
+ TaskArgs.push_back(Result.TaskDupFn
+ ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
+ Result.TaskDupFn, CGF.VoidPtrTy)
+ : llvm::ConstantPointerNull::get(CGF.VoidPtrTy));
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), Data.HasModifier
+ ? OMPRTL___kmpc_taskloop_5
+ : OMPRTL___kmpc_taskloop),
+ TaskArgs);
+ };
+
+ if (CGF.getOMPWithinTaskgraph()) {
+ // Lexically within taskgraph, always replayable.
+ RegionCodeGenTy TaskgraphRCG(TaskgraphTaskloopCodeGen);
+ TaskgraphRCG(CGF);
} else {
- CGF.EmitNullInitialization(RedLVal.getAddress(),
- CGF.getContext().VoidPtrTy);
+ if (ReplayableCond) {
+ // We have a 'replayable' clause with a condition: the taskloop is
+ // replayable if the condition evaluates to true.
+ emitIfClause(CGF, ReplayableCond, TaskgraphTaskloopCodeGen,
+ NonTaskgraphTaskloopCodeGen);
+ } else {
+ // Not taskgraph, not replayable.
+ RegionCodeGenTy NonTaskgraphRCG(NonTaskgraphTaskloopCodeGen);
+ NonTaskgraphRCG(CGF);
+ }
}
- enum { NoSchedule = 0, Grainsize = 1, NumTasks = 2 };
- llvm::SmallVector<llvm::Value *, 12> TaskArgs{
- UpLoc,
- ThreadID,
- Result.NewTask,
- IfVal,
- LBLVal.getPointer(CGF),
- UBLVal.getPointer(CGF),
- CGF.EmitLoadOfScalar(StLVal, Loc),
- llvm::ConstantInt::getSigned(
- CGF.IntTy, 1), // Always 1 because taskgroup emitted by the compiler
- llvm::ConstantInt::getSigned(
- CGF.IntTy, Data.Schedule.getPointer()
- ? Data.Schedule.getInt() ? NumTasks : Grainsize
- : NoSchedule),
- Data.Schedule.getPointer()
- ? CGF.Builder.CreateIntCast(Data.Schedule.getPointer(), CGF.Int64Ty,
- /*isSigned=*/false)
- : llvm::ConstantInt::get(CGF.Int64Ty, /*V=*/0)};
- if (Data.HasModifier)
- TaskArgs.push_back(llvm::ConstantInt::get(CGF.Int32Ty, 1));
-
- TaskArgs.push_back(Result.TaskDupFn
- ? CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(
- Result.TaskDupFn, CGF.VoidPtrTy)
- : llvm::ConstantPointerNull::get(CGF.VoidPtrTy));
- CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), Data.HasModifier
- ? OMPRTL___kmpc_taskloop_5
- : OMPRTL___kmpc_taskloop),
- TaskArgs);
}
/// Emit reduction operation for each element of array (required for
@@ -6105,9 +6283,15 @@ llvm::Value *CGOpenMPRuntime::emitTaskReductionInit(
llvm::ConstantInt::get(CGM.IntTy, Size, /*isSigned=*/true),
CGF.Builder.CreatePointerBitCastOrAddrSpaceCast(TaskRedInput.getPointer(),
CGM.VoidPtrTy)};
- return CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- CGM.getModule(), OMPRTL___kmpc_taskred_init),
- Args);
+ if (CGF.getOMPWithinTaskgraph())
+ return CGF.EmitRuntimeCall(
+ OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskgraph_taskred_init),
+ Args);
+ else
+ return CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ CGM.getModule(), OMPRTL___kmpc_taskred_init),
+ Args);
}
void CGOpenMPRuntime::emitTaskReductionFini(CodeGenFunction &CGF,
@@ -6166,6 +6350,7 @@ Address CGOpenMPRuntime::getTaskReductionItem(CodeGenFunction &CGF,
}
void CGOpenMPRuntime::emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data) {
if (!CGF.HaveInsertPoint())
return;
@@ -6181,36 +6366,75 @@ void CGOpenMPRuntime::emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
llvm::Value *NumOfElements;
std::tie(NumOfElements, DependenciesArray) =
emitDependClause(CGF, Data.Dependences, Loc);
- if (!Data.Dependences.empty()) {
- llvm::Value *DepWaitTaskArgs[7];
- DepWaitTaskArgs[0] = UpLoc;
- DepWaitTaskArgs[1] = ThreadID;
- DepWaitTaskArgs[2] = NumOfElements;
- DepWaitTaskArgs[3] = DependenciesArray.emitRawPointer(CGF);
- DepWaitTaskArgs[4] = CGF.Builder.getInt32(0);
- DepWaitTaskArgs[5] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
- DepWaitTaskArgs[6] =
- llvm::ConstantInt::get(CGF.Int32Ty, Data.HasNowaitClause);
- CodeGenFunction::RunCleanupsScope LocalScope(CGF);
-
- // Build void __kmpc_omp_taskwait_deps_51(ident_t *, kmp_int32 gtid,
- // kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32
- // ndeps_noalias, kmp_depend_info_t *noalias_dep_list,
- // kmp_int32 has_no_wait); if dependence info is specified.
+ auto &&TaskgraphTaskwaitCodeGen =
+ [this, UpLoc, ThreadID, NumOfElements, &DependenciesArray, &Data]
+ (CodeGenFunction &CGF, PrePostActionTy &) {
+ llvm::Value *TGTaskWaitArgs[5];
+ TGTaskWaitArgs[0] = UpLoc;
+ TGTaskWaitArgs[1] = ThreadID;
+ TGTaskWaitArgs[2] = NumOfElements;
+ if (Data.Dependences.empty()) {
+ // FIXME: Diagnose this in Sema instead of aborting at runtime.
+ fprintf(stderr, "*** Taskwait inside taskgraph with no depend clause is not task-generating\n");
+ exit(1);
+ }
+ TGTaskWaitArgs[3] = DependenciesArray.emitRawPointer(CGF);
+ TGTaskWaitArgs[4] =
+ llvm::ConstantInt::get(CGF.Int32Ty, Data.HasNowaitClause);
CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
- M, OMPRTL___kmpc_omp_taskwait_deps_51),
- DepWaitTaskArgs);
+ CGM.getModule(), OMPRTL___kmpc_taskgraph_taskwait),
+ TGTaskWaitArgs);
+ };
+ auto &&NonTaskgraphTaskwaitCodeGen =
+ [this, UpLoc, ThreadID, NumOfElements, &DependenciesArray, &M, &Data]
+ (CodeGenFunction &CGF, PrePostActionTy &) {
+ if (!Data.Dependences.empty()) {
+ llvm::Value *DepWaitTaskArgs[7];
+ DepWaitTaskArgs[0] = UpLoc;
+ DepWaitTaskArgs[1] = ThreadID;
+ DepWaitTaskArgs[2] = NumOfElements;
+ DepWaitTaskArgs[3] = DependenciesArray.emitRawPointer(CGF);
+ DepWaitTaskArgs[4] = CGF.Builder.getInt32(0);
+ DepWaitTaskArgs[5] = llvm::ConstantPointerNull::get(CGF.VoidPtrTy);
+ DepWaitTaskArgs[6] =
+ llvm::ConstantInt::get(CGF.Int32Ty, Data.HasNowaitClause);
+
+ CodeGenFunction::RunCleanupsScope LocalScope(CGF);
+
+ // Build void __kmpc_omp_taskwait_deps_51(ident_t *, kmp_int32 gtid,
+ // kmp_int32 ndeps, kmp_depend_info_t *dep_list, kmp_int32
+ // ndeps_noalias, kmp_depend_info_t *noalias_dep_list,
+ // kmp_int32 has_no_wait); if dependence info is specified.
+ CGF.EmitRuntimeCall(OMPBuilder.getOrCreateRuntimeFunction(
+ M, OMPRTL___kmpc_omp_taskwait_deps_51),
+ DepWaitTaskArgs);
+ } else {
+ // Build call kmp_int32 __kmpc_omp_taskwait(ident_t *loc, kmp_int32
+ // global_tid);
+ llvm::Value *Args[] = {UpLoc, ThreadID};
+ // Ignore return result until untied tasks are supported.
+ CGF.EmitRuntimeCall(
+ OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_omp_taskwait),
+ Args);
+ }
+ };
+ if (CGF.getOMPWithinTaskgraph()) {
+ // Lexically within taskgraph, always replayable.
+ RegionCodeGenTy TaskgraphRCG(TaskgraphTaskwaitCodeGen);
+ TaskgraphRCG(CGF);
} else {
-
- // Build call kmp_int32 __kmpc_omp_taskwait(ident_t *loc, kmp_int32
- // global_tid);
- llvm::Value *Args[] = {UpLoc, ThreadID};
- // Ignore return result until untied tasks are supported.
- CGF.EmitRuntimeCall(
- OMPBuilder.getOrCreateRuntimeFunction(M, OMPRTL___kmpc_omp_taskwait),
- Args);
+ if (ReplayableCond) {
+ // We have a replayable clause. Taskwait is replayable if its argument
+ // is omitted or evaluates to TRUE.
+ emitIfClause(CGF, ReplayableCond, TaskgraphTaskwaitCodeGen,
+ NonTaskgraphTaskwaitCodeGen);
+ } else {
+ // Not taskgraph, not replayable.
+ RegionCodeGenTy NonTaskgraphRCG(NonTaskgraphTaskwaitCodeGen);
+ NonTaskgraphRCG(CGF);
+ }
}
}
@@ -13362,6 +13586,7 @@ void CGOpenMPSIMDRuntime::emitTaskCall(CodeGenFunction &CGF, SourceLocation Loc,
llvm::Function *TaskFunction,
QualType SharedsTy, Address Shareds,
const Expr *IfCond,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data) {
llvm_unreachable("Not supported in SIMD-only mode");
}
@@ -13369,7 +13594,7 @@ void CGOpenMPSIMDRuntime::emitTaskCall(CodeGenFunction &CGF, SourceLocation Loc,
void CGOpenMPSIMDRuntime::emitTaskLoopCall(
CodeGenFunction &CGF, SourceLocation Loc, const OMPLoopDirective &D,
llvm::Function *TaskFunction, QualType SharedsTy, Address Shareds,
- const Expr *IfCond, const OMPTaskDataTy &Data) {
+ const Expr *IfCond, const Expr *ReplayableCond, const OMPTaskDataTy &Data) {
llvm_unreachable("Not supported in SIMD-only mode");
}
@@ -13410,6 +13635,7 @@ Address CGOpenMPSIMDRuntime::getTaskReductionItem(CodeGenFunction &CGF,
void CGOpenMPSIMDRuntime::emitTaskwaitCall(CodeGenFunction &CGF,
SourceLocation Loc,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data) {
llvm_unreachable("Not supported in SIMD-only mode");
}
diff --git a/clang/lib/CodeGen/CGOpenMPRuntime.h b/clang/lib/CodeGen/CGOpenMPRuntime.h
index b74823dd6b7c1..7ac06547a5409 100644
--- a/clang/lib/CodeGen/CGOpenMPRuntime.h
+++ b/clang/lib/CodeGen/CGOpenMPRuntime.h
@@ -123,6 +123,7 @@ struct OMPTaskDataTy final {
bool IsWorksharingReduction = false;
bool HasNowaitClause = false;
bool HasModifier = false;
+ const Expr *ReplayableCond = nullptr;
};
/// Class intended to support codegen of all kind of the reduction clauses.
@@ -582,7 +583,9 @@ class CGOpenMPRuntime {
TaskResultTy emitTaskInit(CodeGenFunction &CGF, SourceLocation Loc,
const OMPExecutableDirective &D,
llvm::Function *TaskFunction, QualType SharedsTy,
- Address Shareds, const OMPTaskDataTy &Data);
+ Address Shareds, const OMPTaskDataTy &Data,
+ bool ForTaskgraph,
+ std::array<llvm::Value*, 3> &TaskAllocArgs);
/// Emit update for lastprivate conditional data.
void emitLastprivateConditionalUpdate(CodeGenFunction &CGF, LValue IVLVal,
@@ -1175,6 +1178,7 @@ class CGOpenMPRuntime {
const OMPExecutableDirective &D,
llvm::Function *TaskFunction, QualType SharedsTy,
Address Shareds, const Expr *IfCond,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data);
/// Emit task region for the taskloop directive. The taskloop region is
@@ -1210,7 +1214,8 @@ class CGOpenMPRuntime {
const OMPLoopDirective &D,
llvm::Function *TaskFunction,
QualType SharedsTy, Address Shareds,
- const Expr *IfCond, const OMPTaskDataTy &Data);
+ const Expr *IfCond, const Expr *ReplayableCond,
+ const OMPTaskDataTy &Data);
/// Emit code for the directive that does not require outlining.
///
@@ -1378,6 +1383,7 @@ class CGOpenMPRuntime {
/// Emit code for 'taskwait' directive.
virtual void emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data);
/// Emit code for 'taskgraph' directive.
@@ -2056,6 +2062,7 @@ class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {
const OMPExecutableDirective &D,
llvm::Function *TaskFunction, QualType SharedsTy,
Address Shareds, const Expr *IfCond,
+ const Expr *ReplayableCond,
const OMPTaskDataTy &Data) override;
/// Emit task region for the taskloop directive. The taskloop region is
@@ -2090,7 +2097,8 @@ class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {
void emitTaskLoopCall(CodeGenFunction &CGF, SourceLocation Loc,
const OMPLoopDirective &D, llvm::Function *TaskFunction,
QualType SharedsTy, Address Shareds, const Expr *IfCond,
- const OMPTaskDataTy &Data) override;
+ const Expr *ReplayableCond, const OMPTaskDataTy &Data)
+ override;
/// Emit a code for reduction clause. Next code should be emitted for
/// reduction:
@@ -2210,7 +2218,8 @@ class CGOpenMPSIMDRuntime final : public CGOpenMPRuntime {
/// Emit code for 'taskwait' directive.
void emitTaskwaitCall(CodeGenFunction &CGF, SourceLocation Loc,
- const OMPTaskDataTy &Data) override;
+ const Expr *ReplayableCond, const OMPTaskDataTy &Data)
+ override;
/// Emit code for 'taskgraph' directive.
/// \param IfCond Expression evaluated in if clause associated with the target
diff --git a/clang/lib/CodeGen/CGStmtOpenMP.cpp b/clang/lib/CodeGen/CGStmtOpenMP.cpp
index 724f093279ac6..a88016edeb967 100644
--- a/clang/lib/CodeGen/CGStmtOpenMP.cpp
+++ b/clang/lib/CodeGen/CGStmtOpenMP.cpp
@@ -5483,8 +5483,12 @@ void CodeGenFunction::EmitOMPTargetTaskBasedDirective(
IntegerLiteral IfCond(getContext(), TrueOrFalse,
getContext().getIntTypeForBitwidth(32, /*Signed=*/0),
SourceLocation());
+ IntegerLiteral ReplayableCond(getContext(), llvm::APInt(32, 1),
+ getContext().getIntTypeForBitwidth(32, /*Signed=*/0),
+ SourceLocation());
CGM.getOpenMPRuntime().emitTaskCall(*this, S.getBeginLoc(), S, OutlinedFn,
- SharedsTy, CapturedStruct, &IfCond, Data);
+ SharedsTy, CapturedStruct, &IfCond,
+ &ReplayableCond, Data);
}
void CodeGenFunction::processInReduction(const OMPExecutableDirective &S,
@@ -5593,15 +5597,27 @@ void CodeGenFunction::EmitOMPTaskDirective(const OMPTaskDirective &S) {
OMPTaskDataTy Data;
// Check if we should emit tied or untied task.
Data.Tied = !S.getSingleClause<OMPUntiedClause>();
+ const Expr *ReplayableCond = nullptr;
+ if (auto *RC = S.getSingleClause<OMPReplayableClause>()) {
+ ReplayableCond = RC->getCondition();
+ if (!ReplayableCond) {
+ ReplayableCond =
+ IntegerLiteral::Create(
+ getContext(), llvm::APInt(32, 1),
+ getContext().getIntTypeForBitwidth(32, /*Signed=*/0),
+ SourceLocation());
+ }
+ }
auto &&BodyGen = [CS](CodeGenFunction &CGF, PrePostActionTy &) {
CGF.EmitStmt(CS->getCapturedStmt());
};
auto &&TaskGen = [&S, SharedsTy, CapturedStruct,
- IfCond](CodeGenFunction &CGF, llvm::Function *OutlinedFn,
+ IfCond, ReplayableCond](CodeGenFunction &CGF,
+ llvm::Function *OutlinedFn,
const OMPTaskDataTy &Data) {
CGF.CGM.getOpenMPRuntime().emitTaskCall(CGF, S.getBeginLoc(), S, OutlinedFn,
SharedsTy, CapturedStruct, IfCond,
- Data);
+ ReplayableCond, Data);
};
auto LPCRegion =
CGOpenMPRuntime::LastprivateConditionalRAII::disable(*this, S);
@@ -5632,7 +5648,19 @@ void CodeGenFunction::EmitOMPTaskwaitDirective(const OMPTaskwaitDirective &S) {
// Build list of dependences
buildDependences(S, Data);
Data.HasNowaitClause = S.hasClausesOfKind<OMPNowaitClause>();
- CGM.getOpenMPRuntime().emitTaskwaitCall(*this, S.getBeginLoc(), Data);
+ const Expr *ReplayableCond = nullptr;
+ if (auto *RC = S.getSingleClause<OMPReplayableClause>()) {
+ ReplayableCond = RC->getCondition();
+ if (!ReplayableCond) {
+ ReplayableCond =
+ IntegerLiteral::Create(
+ getContext(), llvm::APInt(32, 1),
+ getContext().getIntTypeForBitwidth(32, /*Signed=*/0),
+ SourceLocation());
+ }
+ }
+ CGM.getOpenMPRuntime().emitTaskwaitCall(*this, S.getBeginLoc(),
+ ReplayableCond, Data);
}
void CodeGenFunction::EmitOMPTaskgraphDirective(
@@ -7987,6 +8015,18 @@ void CodeGenFunction::EmitOMPTaskLoopBasedDirective(const OMPLoopDirective &S) {
}
}
+ const Expr *ReplayableCond = nullptr;
+ if (auto *RC = S.getSingleClause<OMPReplayableClause>()) {
+ ReplayableCond = RC->getCondition();
+ if (!ReplayableCond) {
+ ReplayableCond =
+ IntegerLiteral::Create(
+ getContext(), llvm::APInt(32, 1),
+ getContext().getIntTypeForBitwidth(32, /*Signed=*/0),
+ SourceLocation());
+ }
+ }
+
OMPTaskDataTy Data;
// Check if taskloop must be emitted without taskgroup.
Data.Nogroup = S.getSingleClause<OMPNogroupClause>();
@@ -8106,15 +8146,18 @@ void CodeGenFunction::EmitOMPTaskLoopBasedDirective(const OMPLoopDirective &S) {
(*LIP)->getType(), S.getBeginLoc()));
});
};
- auto &&TaskGen = [&S, SharedsTy, CapturedStruct,
- IfCond](CodeGenFunction &CGF, llvm::Function *OutlinedFn,
- const OMPTaskDataTy &Data) {
- auto &&CodeGen = [&S, OutlinedFn, SharedsTy, CapturedStruct, IfCond,
- &Data](CodeGenFunction &CGF, PrePostActionTy &) {
+ auto &&TaskGen =
+ [&S, SharedsTy, CapturedStruct, IfCond, ReplayableCond]
+ (CodeGenFunction &CGF, llvm::Function *OutlinedFn,
+ const OMPTaskDataTy &Data) {
+ auto &&CodeGen =
+ [&S, OutlinedFn, SharedsTy, CapturedStruct, IfCond, ReplayableCond,
+ &Data](CodeGenFunction &CGF, PrePostActionTy &) {
OMPLoopScope PreInitScope(CGF, S);
CGF.CGM.getOpenMPRuntime().emitTaskLoopCall(CGF, S.getBeginLoc(), S,
OutlinedFn, SharedsTy,
- CapturedStruct, IfCond, Data);
+ CapturedStruct, IfCond,
+ ReplayableCond, Data);
};
CGF.CGM.getOpenMPRuntime().emitInlinedDirective(CGF, OMPD_taskloop,
CodeGen);
diff --git a/clang/lib/CodeGen/CodeGenFunction.h b/clang/lib/CodeGen/CodeGenFunction.h
index 2b2d08570ee38..9ddada466dba8 100644
--- a/clang/lib/CodeGen/CodeGenFunction.h
+++ b/clang/lib/CodeGen/CodeGenFunction.h
@@ -812,6 +812,22 @@ class CodeGenFunction : public CodeGenTypeCache {
}
};
+ bool OMPWithinTaskgraph = false;
+
+ bool getOMPWithinTaskgraph() { return OMPWithinTaskgraph; }
+ void setOMPWithinTaskgraph(bool In) { OMPWithinTaskgraph = In; }
+
+ class OMPWithinTaskgraphRAII {
+ CodeGenFunction &CGF;
+ public:
+ OMPWithinTaskgraphRAII(CodeGenFunction &CGF_) : CGF(CGF_) {
+ CGF.setOMPWithinTaskgraph(true);
+ }
+ ~OMPWithinTaskgraphRAII() {
+ CGF.setOMPWithinTaskgraph(false);
+ }
+ };
+
template <class T>
typename DominatingValue<T>::saved_type saveValueInCond(T value) {
return DominatingValue<T>::save(*this, value);
diff --git a/clang/lib/Parse/ParseOpenMP.cpp b/clang/lib/Parse/ParseOpenMP.cpp
index 979d376d438fc..da1555f02eb1c 100644
--- a/clang/lib/Parse/ParseOpenMP.cpp
+++ b/clang/lib/Parse/ParseOpenMP.cpp
@@ -3328,6 +3328,7 @@ OMPClause *Parser::ParseOpenMPClause(OpenMPDirectiveKind DKind,
case OMPC_reverse_offload:
case OMPC_dynamic_allocators:
case OMPC_full:
+ case OMPC_replayable:
// OpenMP [2.7.1, Restrictions, p. 9]
// Only one ordered clause can appear on a loop directive.
// OpenMP [2.7.1, Restrictions, C/C++, p. 4]
@@ -3341,7 +3342,8 @@ OMPClause *Parser::ParseOpenMPClause(OpenMPDirectiveKind DKind,
ErrorFound = true;
}
- if (CKind == OMPC_nowait && PP.LookAhead(/*N=*/0).is(tok::l_paren) &&
+ if ((CKind == OMPC_nowait || CKind == OMPC_replayable) &&
+ PP.LookAhead(/*N=*/0).is(tok::l_paren) &&
getLangOpts().OpenMP >= 60)
Clause = ParseOpenMPSingleExprClause(CKind, WrongDirective);
else
@@ -3362,6 +3364,17 @@ OMPClause *Parser::ParseOpenMPClause(OpenMPDirectiveKind DKind,
}
Clause = ParseOpenMPClause(CKind, WrongDirective);
break;
+ if (getLangOpts().OpenMP < 60) {
+ // FIXME: This isn't an appropriate error message.
+ Diag(Tok, diag::err_omp_expected_clause)
+ << getOpenMPDirectiveName(OMPD_requires, OMPVersion);
+ ErrorFound = true;
+ }
+ if (PP.LookAhead(/*N=*/0).is(tok::l_paren))
+ Clause = ParseOpenMPSingleExprClause(CKind, WrongDirective);
+ else
+ Clause = ParseOpenMPClause(CKind, WrongDirective);
+ break;
case OMPC_update:
if (!FirstClause) {
Diag(Tok, diag::err_omp_more_one_clause)
diff --git a/clang/lib/Sema/SemaOpenMP.cpp b/clang/lib/Sema/SemaOpenMP.cpp
index 899fc19fbd4bb..fff9445c168d0 100644
--- a/clang/lib/Sema/SemaOpenMP.cpp
+++ b/clang/lib/Sema/SemaOpenMP.cpp
@@ -16623,6 +16623,9 @@ OMPClause *SemaOpenMP::ActOnOpenMPSingleExprClause(OpenMPClauseKind Kind,
case OMPC_graph_reset:
Res = ActOnOpenMPGraphResetClause(Expr, StartLoc, LParenLoc, EndLoc);
break;
+ case OMPC_replayable:
+ Res = ActOnOpenMPReplayableClause(StartLoc, EndLoc, LParenLoc, Expr);
+ break;
case OMPC_novariants:
Res = ActOnOpenMPNovariantsClause(Expr, StartLoc, LParenLoc, EndLoc);
break;
@@ -18326,6 +18329,11 @@ OMPClause *SemaOpenMP::ActOnOpenMPClause(OpenMPClauseKind Kind,
case OMPC_self_maps:
Res = ActOnOpenMPSelfMapsClause(StartLoc, EndLoc);
break;
+ case OMPC_replayable:
+ Res = ActOnOpenMPReplayableClause(StartLoc, EndLoc,
+ /*LParenLoc=*/SourceLocation(),
+ /*Condition=*/nullptr);
+ break;
case OMPC_destroy:
Res = ActOnOpenMPDestroyClause(/*InteropVar=*/nullptr, StartLoc,
/*LParenLoc=*/SourceLocation(),
@@ -18560,6 +18568,26 @@ OMPClause *SemaOpenMP::ActOnOpenMPSelfMapsClause(SourceLocation StartLoc,
return new (getASTContext()) OMPSelfMapsClause(StartLoc, EndLoc);
}
+OMPClause *SemaOpenMP::ActOnOpenMPReplayableClause(SourceLocation StartLoc,
+ SourceLocation EndLoc,
+ SourceLocation LParenLoc,
+ Expr *Condition) {
+ Expr *ValExpr = Condition;
+ if (Condition && LParenLoc.isValid()) {
+ if (!Condition->isValueDependent() && !Condition->isTypeDependent() &&
+ !Condition->isInstantiationDependent() &&
+ !Condition->containsUnexpandedParameterPack()) {
+ ExprResult Val = SemaRef.CheckBooleanCondition(StartLoc, Condition);
+ if (Val.isInvalid())
+ return nullptr;
+
+ ValExpr = Val.get();
+ }
+ }
+ return new (getASTContext())
+ OMPReplayableClause(ValExpr, StartLoc, LParenLoc, EndLoc);
+}
+
StmtResult
SemaOpenMP::ActOnOpenMPInteropDirective(ArrayRef<OMPClause *> Clauses,
SourceLocation StartLoc,
diff --git a/clang/lib/Sema/TreeTransform.h b/clang/lib/Sema/TreeTransform.h
index 766b08929e7fa..19dc278e14cc2 100644
--- a/clang/lib/Sema/TreeTransform.h
+++ b/clang/lib/Sema/TreeTransform.h
@@ -1892,6 +1892,18 @@ class TreeTransform {
LParenLoc, Condition);
}
+ /// Build a new OpenMP 'replayable' clause.
+ ///
+ /// By default, performs semantic analysis to build the new OpenMP clause.
+ /// Subclasses may override this routine to provide different behavior.
+ OMPClause *RebuildOMPReplayableClause(Expr *Condition,
+ SourceLocation StartLoc,
+ SourceLocation LParenLoc,
+ SourceLocation EndLoc) {
+ return getSema().OpenMP().ActOnOpenMPReplayableClause(StartLoc, EndLoc,
+ LParenLoc, Condition);
+ }
+
/// Build a new OpenMP 'private' clause.
///
/// By default, performs semantic analysis to build the new OpenMP clause.
@@ -10822,6 +10834,20 @@ TreeTransform<Derived>::TransformOMPNowaitClause(OMPNowaitClause *C) {
C->getLParenLoc(), C->getEndLoc());
}
+template <typename Derived>
+OMPClause *
+TreeTransform<Derived>::TransformOMPReplayableClause(OMPReplayableClause *C) {
+ ExprResult Cond;
+ if (auto *Condition = C->getCondition()) {
+ Cond = getDerived().TransformExpr(Condition);
+ if (Cond.isInvalid())
+ return nullptr;
+ }
+ return getDerived().RebuildOMPReplayableClause(Cond.get(), C->getBeginLoc(),
+ C->getLParenLoc(),
+ C->getEndLoc());
+}
+
template <typename Derived>
OMPClause *
TreeTransform<Derived>::TransformOMPUntiedClause(OMPUntiedClause *C) {
diff --git a/clang/lib/Serialization/ASTReader.cpp b/clang/lib/Serialization/ASTReader.cpp
index 79ab15a09cde7..a0fcc2189bf40 100644
--- a/clang/lib/Serialization/ASTReader.cpp
+++ b/clang/lib/Serialization/ASTReader.cpp
@@ -11609,6 +11609,12 @@ OMPClause *OMPClauseReader::readClause() {
case llvm::omp::OMPC_graph_id:
C = new (Context) OMPGraphIdClause();
break;
+ case llvm::omp::OMPC_graph_reset:
+ C = new (Context) OMPGraphResetClause();
+ break;
+ case llvm::omp::OMPC_replayable:
+ C = new (Context) OMPReplayableClause();
+ break;
case llvm::omp::OMPC_num_tasks:
C = new (Context) OMPNumTasksClause();
break;
@@ -11903,6 +11909,11 @@ void OMPClauseReader::VisitOMPNowaitClause(OMPNowaitClause *C) {
C->setLParenLoc(Record.readSourceLocation());
}
+void OMPClauseReader::VisitOMPReplayableClause(OMPReplayableClause *C) {
+ C->setCondition(Record.readSubExpr());
+ C->setLParenLoc(Record.readSourceLocation());
+}
+
void OMPClauseReader::VisitOMPUntiedClause(OMPUntiedClause *) {}
void OMPClauseReader::VisitOMPMergeableClause(OMPMergeableClause *) {}
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index 9c2aa6632c123..d66add25f8b62 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -8093,6 +8093,11 @@ void OMPClauseWriter::VisitOMPNowaitClause(OMPNowaitClause *C) {
Record.AddSourceLocation(C->getLParenLoc());
}
+void OMPClauseWriter::VisitOMPReplayableClause(OMPReplayableClause *C) {
+ Record.AddStmt(C->getCondition());
+ Record.AddSourceLocation(C->getLParenLoc());
+}
+
void OMPClauseWriter::VisitOMPUntiedClause(OMPUntiedClause *) {}
void OMPClauseWriter::VisitOMPMergeableClause(OMPMergeableClause *) {}
diff --git a/clang/tools/libclang/CIndex.cpp b/clang/tools/libclang/CIndex.cpp
index 3af9d481f4b91..cee15395404d1 100644
--- a/clang/tools/libclang/CIndex.cpp
+++ b/clang/tools/libclang/CIndex.cpp
@@ -2408,6 +2408,10 @@ void OMPClauseEnqueue::VisitOMPNowaitClause(const OMPNowaitClause *C) {
Visitor->AddStmt(C->getCondition());
}
+void OMPClauseEnqueue::VisitOMPReplayableClause(const OMPReplayableClause *C) {
+ Visitor->AddStmt(C->getCondition());
+}
+
void OMPClauseEnqueue::VisitOMPUntiedClause(const OMPUntiedClause *) {}
void OMPClauseEnqueue::VisitOMPMergeableClause(const OMPMergeableClause *) {}
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMP.td b/llvm/include/llvm/Frontend/OpenMP/OMP.td
index 09a899cbf2562..00f5753159f69 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMP.td
+++ b/llvm/include/llvm/Frontend/OpenMP/OMP.td
@@ -499,6 +499,7 @@ def OMPC_Release : Clause<[Spelling<"release">]> {
let clangClass = "OMPReleaseClause";
}
def OMPC_Replayable : Clause<[Spelling<"replayable">]> {
+ let clangClass = "OMPReplayableClause";
let flangClass = "OmpReplayableClause";
let isValueOptional = true;
}
diff --git a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
index 288585c8b42a6..dfc00289a4098 100644
--- a/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
+++ b/llvm/include/llvm/Frontend/OpenMP/OMPKinds.def
@@ -357,7 +357,17 @@ __OMP_RTL(__kmpc_omp_task, false, Int32, IdentPtr, Int32,
/* kmp_task_t */ VoidPtr)
__OMP_RTL(__kmpc_end_taskgroup, false, Void, IdentPtr, Int32)
__OMP_RTL(__kmpc_taskgroup, false, Void, IdentPtr, Int32)
-__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, Int32, Int32, Int32, VoidPtr, VoidPtr)
+__OMP_RTL(__kmpc_taskgraph, false, Void, IdentPtr, Int32, VoidPtrPtr, Int32,
+ Int32, Int32, VoidPtr, VoidPtr)
+__OMP_RTL(__kmpc_taskgraph_task, false, Int32, IdentPtr, Int32, VoidPtr, Int32,
+ SizeTy, VoidPtr, SizeTy, Int32, VoidPtr)
+__OMP_RTL(__kmpc_taskgraph_taskloop, false, Int32, IdentPtr, Int32, VoidPtr,
+ Int32, SizeTy, VoidPtr, SizeTy, Int32, Int64Ptr, Int64Ptr, Int64,
+ Int32, Int32, Int64, Int32, VoidPtr)
+__OMP_RTL(__kmpc_taskgraph_taskwait, false, Void, IdentPtr, Int32, Int32,
+ VoidPtr, Int32)
+__OMP_RTL(__kmpc_taskgraph_taskred_init, false, /* kmp_taskgroup */ VoidPtr,
+ Int32, Int32, VoidPtr)
__OMP_RTL(__kmpc_omp_task_begin_if0, false, Void, IdentPtr, Int32,
/* kmp_task_t */ VoidPtr)
__OMP_RTL(__kmpc_omp_task_complete_if0, false, Void, IdentPtr, Int32,
From eec9d38f222263cdd201f4be01a8fd619459f934 Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Wed, 25 Mar 2026 16:57:13 -0500
Subject: [PATCH 27/28] [OpenMP] OpenMP 6.0 "taskgraph" support, remove
obsolete tests
---
.../test/tasking/omp_record_replay.cpp | 48 ------------
.../test/tasking/omp_record_replay_deps.cpp | 63 ---------------
.../omp_record_replay_deps_multi_succ.cpp | 56 --------------
.../tasking/omp_record_replay_multiTDGs.cpp | 76 -------------------
.../tasking/omp_record_replay_print_dot.cpp | 69 -----------------
.../tasking/omp_record_replay_taskloop.cpp | 50 ------------
.../test/tasking/omp_taskgraph_print_dot.cpp | 58 --------------
7 files changed, 420 deletions(-)
delete mode 100644 openmp/runtime/test/tasking/omp_record_replay.cpp
delete mode 100644 openmp/runtime/test/tasking/omp_record_replay_deps.cpp
delete mode 100644 openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
delete mode 100644 openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
delete mode 100644 openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
delete mode 100644 openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
delete mode 100644 openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
diff --git a/openmp/runtime/test/tasking/omp_record_replay.cpp b/openmp/runtime/test/tasking/omp_record_replay.cpp
deleted file mode 100644
index 4fea22e081da9..0000000000000
--- a/openmp/runtime/test/tasking/omp_record_replay.cpp
+++ /dev/null
@@ -1,48 +0,0 @@
-// REQUIRES: omp_taskgraph_experimental
-// RUN: %libomp-cxx-compile-and-run
-#include <iostream>
-#include <cassert>
-#define NT 100
-
-// Compiler-generated code (emulation)
-typedef struct ident {
- void* dummy;
-} ident_t;
-
-
-#ifdef __cplusplus
-extern "C" {
- int __kmpc_global_thread_num(ident_t *);
- int __kmpc_start_record_task(ident_t *, int, int, int);
- void __kmpc_end_record_task(ident_t *, int, int , int);
-}
-#endif
-
-void func(int *num_exec) {
- (*num_exec)++;
-}
-
-int main() {
- int num_exec = 0;
- int num_tasks = 0;
- int x=0;
- #pragma omp parallel
- #pragma omp single
- for (int iter = 0; iter < NT; ++iter) {
- int gtid = __kmpc_global_thread_num(nullptr);
- int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */0);
- if (res) {
- num_tasks++;
- #pragma omp task
- func(&num_exec);
- }
- __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */0, /* tdg_id */0);
- }
-
- assert(num_tasks==1);
- assert(num_exec==NT);
-
- std::cout << "Passed" << std::endl;
- return 0;
-}
-// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_record_replay_deps.cpp b/openmp/runtime/test/tasking/omp_record_replay_deps.cpp
deleted file mode 100644
index 4c06ae3f7b273..0000000000000
--- a/openmp/runtime/test/tasking/omp_record_replay_deps.cpp
+++ /dev/null
@@ -1,63 +0,0 @@
-// REQUIRES: omp_taskgraph_experimental
-// RUN: %libomp-cxx-compile-and-run
-#include <iostream>
-#include <cassert>
-#define NT 100
-#define MULTIPLIER 100
-#define DECREMENT 5
-
-int val;
-// Compiler-generated code (emulation)
-typedef struct ident {
- void* dummy;
-} ident_t;
-
-
-#ifdef __cplusplus
-extern "C" {
- int __kmpc_global_thread_num(ident_t *);
- int __kmpc_start_record_task(ident_t *, int, int, int);
- void __kmpc_end_record_task(ident_t *, int, int, int);
-}
-#endif
-
-void sub() {
- #pragma omp atomic
- val -= DECREMENT;
-}
-
-void add() {
- #pragma omp atomic
- val += DECREMENT;
-}
-
-void mult() {
- // no atomicity needed, can only be executed by 1 thread
- // and no concurrency with other tasks possible
- val *= MULTIPLIER;
-}
-
-int main() {
- val = 0;
- int *x, *y;
- #pragma omp parallel
- #pragma omp single
- for (int iter = 0; iter < NT; ++iter) {
- int gtid = __kmpc_global_thread_num(nullptr);
- int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */0, /* tdg_id */0);
- if (res) {
- #pragma omp task depend(out:y)
- add();
- #pragma omp task depend(out:x)
- sub();
- #pragma omp task depend(in:x,y)
- mult();
- }
- __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */0, /* tdg_id */0);
- }
- assert(val==0);
-
- std::cout << "Passed" << std::endl;
- return 0;
-}
-// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp b/openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
deleted file mode 100644
index 6bcd3dee56030..0000000000000
--- a/openmp/runtime/test/tasking/omp_record_replay_deps_multi_succ.cpp
+++ /dev/null
@@ -1,56 +0,0 @@
-// REQUIRES: omp_taskgraph_experimental
-// RUN: %libomp-cxx-compile-and-run
-#include <omp.h>
-#include <cassert>
-#include <vector>
-
-constexpr const int TASKS_SIZE = 12;
-
-typedef struct ident ident_t;
-
-extern "C" {
-int __kmpc_global_thread_num(ident_t *);
-int __kmpc_start_record_task(ident_t *, int, int, int);
-void __kmpc_end_record_task(ident_t *, int, int, int);
-}
-
-void init(int &A, int val) { A = val; }
-
-void update(int &A, int &B, int val) { A = B + val; }
-
-void test(int nb, std::vector<std::vector<int>> &Ah) {
-#pragma omp parallel
-#pragma omp single
- {
- int gtid = __kmpc_global_thread_num(nullptr);
- int res = __kmpc_start_record_task(nullptr, gtid, 0, 0);
- if (res) {
- for (int k = 0; k < nb; ++k) {
-#pragma omp task depend(inout : Ah[k][0])
- init(Ah[k][0], k);
-
- for (int i = 1; i < nb; ++i) {
-#pragma omp task depend(in : Ah[k][0]) depend(out : Ah[k][i])
- update(Ah[k][i], Ah[k][0], 1);
- }
- }
- }
- __kmpc_end_record_task(nullptr, gtid, 0, 0);
- }
-}
-
-int main() {
- std::vector<std::vector<int>> matrix(TASKS_SIZE,
- std::vector<int>(TASKS_SIZE, 0));
-
- test(TASKS_SIZE, matrix);
- test(TASKS_SIZE, matrix);
-
- for (int k = 0; k < TASKS_SIZE; ++k) {
- assert(matrix[k][0] == k);
- for (int i = 1; i < TASKS_SIZE; ++i) {
- assert(matrix[k][i] == k + 1);
- }
- }
- return 0;
-}
diff --git a/openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp b/openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
deleted file mode 100644
index 1864d5d89cc70..0000000000000
--- a/openmp/runtime/test/tasking/omp_record_replay_multiTDGs.cpp
+++ /dev/null
@@ -1,76 +0,0 @@
-// REQUIRES: omp_taskgraph_experimental
-// RUN: %libomp-cxx-compile-and-run
-#include <iostream>
-#include <cassert>
-#define NT 20
-#define MULTIPLIER 100
-#define DECREMENT 5
-
-// Compiler-generated code (emulation)
-typedef struct ident {
- void* dummy;
-} ident_t;
-
-int val;
-#ifdef __cplusplus
-extern "C" {
- int __kmpc_global_thread_num(ident_t *);
- int __kmpc_start_record_task(ident_t *, int, int, int);
- void __kmpc_end_record_task(ident_t *, int, int , int);
-}
-#endif
-
-void sub() {
- #pragma omp atomic
- val -= DECREMENT;
-}
-
-void add() {
- #pragma omp atomic
- val += DECREMENT;
-}
-
-void mult() {
- // no atomicity needed, can only be executed by 1 thread
- // and no concurrency with other tasks possible
- val *= MULTIPLIER;
-}
-
-int main() {
- int num_tasks = 0;
- int *x, *y;
- #pragma omp parallel
- #pragma omp single
- for (int iter = 0; iter < NT; ++iter) {
- int gtid = __kmpc_global_thread_num(nullptr);
- int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */0);
- if (res) {
- num_tasks++;
- #pragma omp task depend(out:y)
- add();
- #pragma omp task depend(out:x)
- sub();
- #pragma omp task depend(in:x,y)
- mult();
- }
- __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */0, /* tdg_id */0);
- res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */1);
- if (res) {
- num_tasks++;
- #pragma omp task depend(out:y)
- add();
- #pragma omp task depend(out:x)
- sub();
- #pragma omp task depend(in:x,y)
- mult();
- }
- __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */0, /* tdg_id */1);
- }
-
- assert(num_tasks==2);
- assert(val==0);
-
- std::cout << "Passed" << std::endl;
- return 0;
-}
-// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp b/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
deleted file mode 100644
index e3d2c017c21c7..0000000000000
--- a/openmp/runtime/test/tasking/omp_record_replay_print_dot.cpp
+++ /dev/null
@@ -1,69 +0,0 @@
-// REQUIRES: omp_taskgraph_experimental
-// RUN: %libomp-cxx-compile-and-run
-// RUN: cat tdg_0.dot | FileCheck %s
-// RUN: rm -f tdg_0.dot
-
-#include <cstdlib>
-#include <cassert>
-
-// Compiler-generated code (emulation)
-typedef struct ident {
- void* dummy;
-} ident_t;
-
-#ifdef __cplusplus
-extern "C" {
- int __kmpc_global_thread_num(ident_t *);
- int __kmpc_start_record_task(ident_t *, int, int, int);
- void __kmpc_end_record_task(ident_t *, int, int , int);
-}
-#endif
-
-void func(int *num_exec) {
- #pragma omp atomic
- (*num_exec)++;
-}
-
-int main() {
- int num_exec = 0;
- int x, y;
-
- setenv("KMP_TDG_DOT", "TRUE", 1);
-
-#pragma omp parallel
-#pragma omp single
- {
- int gtid = __kmpc_global_thread_num(nullptr);
- int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
- if (res) {
- #pragma omp task depend(out : x)
- func(&num_exec);
- #pragma omp task depend(in : x) depend(out : y)
- func(&num_exec);
- #pragma omp task depend(in : y)
- func(&num_exec);
- #pragma omp task depend(in : y)
- func(&num_exec);
- }
-
- __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */ 0, /* tdg_id */ 0);
- }
-
- assert(num_exec == 4);
-
- return 0;
-}
-
-// CHECK: digraph TDG {
-// CHECK-NEXT: compound=true
-// CHECK-NEXT: subgraph cluster {
-// CHECK-NEXT: label=TDG_0
-// CHECK-NEXT: 0[style=bold]
-// CHECK-NEXT: 1[style=bold]
-// CHECK-NEXT: 2[style=bold]
-// CHECK-NEXT: 3[style=bold]
-// CHECK-NEXT: }
-// CHECK-NEXT: 0 -> 1
-// CHECK-NEXT: 1 -> 2
-// CHECK-NEXT: 1 -> 3
-// CHECK-NEXT: }
diff --git a/openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp b/openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
deleted file mode 100644
index 163a1b4192d85..0000000000000
--- a/openmp/runtime/test/tasking/omp_record_replay_taskloop.cpp
+++ /dev/null
@@ -1,50 +0,0 @@
-// REQUIRES: omp_taskgraph_experimental
-// RUN: %libomp-cxx-compile-and-run
-#include <iostream>
-#include <cassert>
-
-#define NT 20
-#define N 128*128
-
-typedef struct ident {
- void* dummy;
-} ident_t;
-
-
-#ifdef __cplusplus
-extern "C" {
- int __kmpc_global_thread_num(ident_t *);
- int __kmpc_start_record_task(ident_t *, int, int, int);
- void __kmpc_end_record_task(ident_t *, int, int , int);
-}
-#endif
-
-int main() {
- int num_tasks = 0;
-
- int array[N];
- for (int i = 0; i < N; ++i)
- array[i] = 1;
-
- long sum = 0;
- #pragma omp parallel
- #pragma omp single
- for (int iter = 0; iter < NT; ++iter) {
- int gtid = __kmpc_global_thread_num(nullptr);
- int res = __kmpc_start_record_task(nullptr, gtid, /* kmp_tdg_flags */0, /* tdg_id */0);
- if (res) {
- num_tasks++;
- #pragma omp taskloop reduction(+:sum) num_tasks(4096)
- for (int i = 0; i < N; ++i) {
- sum += array[i];
- }
- }
- __kmpc_end_record_task(nullptr, gtid, /* kmp_tdg_flags */0, /* tdg_id */0);
- }
- assert(sum==N*NT);
- assert(num_tasks==1);
-
- std::cout << "Passed" << std::endl;
- return 0;
-}
-// CHECK: Passed
diff --git a/openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp b/openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
deleted file mode 100644
index 0dc81df32d93a..0000000000000
--- a/openmp/runtime/test/tasking/omp_taskgraph_print_dot.cpp
+++ /dev/null
@@ -1,58 +0,0 @@
-// REQUIRES: omp_taskgraph_experimental
-// RUN: %libomp-cxx-compile-and-run
-// RUN: cat tdg_17353.dot | FileCheck %s
-// RUN: rm -f tdg_17353.dot
-
-#include <cstdlib>
-#include <cassert>
-
-// Compiler-generated code (emulation)
-typedef struct ident {
- void *dummy;
-} ident_t;
-
-void func(int *num_exec) {
-#pragma omp atomic
- (*num_exec)++;
-}
-
-int main() {
- int num_exec = 0;
- int x, y;
-
- setenv("KMP_TDG_DOT", "TRUE", 1);
-
-#pragma omp parallel
-#pragma omp single
- {
-#pragma omp taskgraph
- {
-#pragma omp task depend(out : x)
- func(&num_exec);
-#pragma omp task depend(in : x) depend(out : y)
- func(&num_exec);
-#pragma omp task depend(in : y)
- func(&num_exec);
-#pragma omp task depend(in : y)
- func(&num_exec);
- }
- }
-
- assert(num_exec == 4);
-
- return 0;
-}
-
-// CHECK: digraph TDG {
-// CHECK-NEXT: compound=true
-// CHECK-NEXT: subgraph cluster {
-// CHECK-NEXT: label=TDG_17353
-// CHECK-NEXT: 0[style=bold]
-// CHECK-NEXT: 1[style=bold]
-// CHECK-NEXT: 2[style=bold]
-// CHECK-NEXT: 3[style=bold]
-// CHECK-NEXT: }
-// CHECK-NEXT: 0 -> 1
-// CHECK-NEXT: 1 -> 2
-// CHECK-NEXT: 1 -> 3
-// CHECK-NEXT: }
>From 14269b45180aac83010e7704300b2d0120d495ff Mon Sep 17 00:00:00 2001
From: Julian Brown <julian.brown at amd.com>
Date: Wed, 25 Mar 2026 16:57:25 -0500
Subject: [PATCH 28/28] [OpenMP] OpenMP 6.0 "taskgraph" support, add new tests
---
.../test/taskgraph/taskgraph_deps_1.cpp | 50 +++++++++
.../test/taskgraph/taskgraph_deps_10.cpp | 47 ++++++++
.../test/taskgraph/taskgraph_deps_11.cpp | 57 ++++++++++
.../test/taskgraph/taskgraph_deps_12.cpp | 52 +++++++++
.../test/taskgraph/taskgraph_deps_13.cpp | 42 ++++++++
.../test/taskgraph/taskgraph_deps_14.cpp | 45 ++++++++
.../test/taskgraph/taskgraph_deps_15.cpp | 72 +++++++++++++
.../test/taskgraph/taskgraph_deps_16.cpp | 52 +++++++++
.../test/taskgraph/taskgraph_deps_17.cpp | 65 ++++++++++++
.../test/taskgraph/taskgraph_deps_18.cpp | 43 ++++++++
.../test/taskgraph/taskgraph_deps_19.cpp | 48 +++++++++
.../test/taskgraph/taskgraph_deps_2.cpp | 55 ++++++++++
.../test/taskgraph/taskgraph_deps_20.cpp | 48 +++++++++
.../test/taskgraph/taskgraph_deps_21.cpp | 49 +++++++++
.../test/taskgraph/taskgraph_deps_22.cpp | 67 ++++++++++++
.../test/taskgraph/taskgraph_deps_23.cpp | 100 ++++++++++++++++++
.../test/taskgraph/taskgraph_deps_24.cpp | 77 ++++++++++++++
.../test/taskgraph/taskgraph_deps_25.cpp | 86 +++++++++++++++
.../test/taskgraph/taskgraph_deps_26.cpp | 58 ++++++++++
.../test/taskgraph/taskgraph_deps_27.cpp | 60 +++++++++++
.../test/taskgraph/taskgraph_deps_3.cpp | 77 ++++++++++++++
.../test/taskgraph/taskgraph_deps_4.cpp | 73 +++++++++++++
.../test/taskgraph/taskgraph_deps_5.cpp | 60 +++++++++++
.../test/taskgraph/taskgraph_deps_6.cpp | 56 ++++++++++
.../test/taskgraph/taskgraph_deps_7.cpp | 56 ++++++++++
.../test/taskgraph/taskgraph_deps_8.cpp | 36 +++++++
.../test/taskgraph/taskgraph_deps_9.cpp | 44 ++++++++
27 files changed, 1575 insertions(+)
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_1.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_10.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_11.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_12.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_13.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_14.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_15.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_16.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_17.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_18.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_19.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_2.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_20.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_21.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_22.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_23.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_24.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_25.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_26.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_27.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_3.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_4.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_5.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_6.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_7.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_8.cpp
create mode 100644 openmp/runtime/test/taskgraph/taskgraph_deps_9.cpp
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_1.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_1.cpp
new file mode 100644
index 0000000000000..d6abdb1e1ee91
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_1.cpp
@@ -0,0 +1,50 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[3];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[2])
+ { }
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(inout: deps[0])
+ { }
+ #pragma omp task depend(inout: deps[1])
+ { }
+ #pragma omp task depend(inout: deps[2])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1], deps[2])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_10.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_10.cpp
new file mode 100644
index 0000000000000..f3dd856f84f93
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_10.cpp
@@ -0,0 +1,47 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[5];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1], deps[4])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1])
+ { }
+
+ #pragma omp task depend(out: deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[3], deps[4])
+ { }
+ #pragma omp task depend(in: deps[2], deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x1c]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x7]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_11.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_11.cpp
new file mode 100644
index 0000000000000..4f2babaefe25a
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_11.cpp
@@ -0,0 +1,57 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1])
+ { }
+
+ #pragma omp task depend(out: deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(in: deps[2], deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_12.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_12.cpp
new file mode 100644
index 0000000000000..d3615187b5462
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_12.cpp
@@ -0,0 +1,52 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[2];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(mutexinoutset: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_13.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_13.cpp
new file mode 100644
index 0000000000000..de2b5e138dd47
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_13.cpp
@@ -0,0 +1,42 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(mutexinoutset: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xc]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x3]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x8]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x4]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x2]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x1]
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_14.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_14.cpp
new file mode 100644
index 0000000000000..684a196a8d8f4
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_14.cpp
@@ -0,0 +1,45 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(mutexinoutset: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1], deps[2])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1], deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1], deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xf]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xe]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x7]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xc]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x3]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x8]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x1]
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_15.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_15.cpp
new file mode 100644
index 0000000000000..d35660ddfd098
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_15.cpp
@@ -0,0 +1,72 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[1], deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[1], deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3], deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3], deps[1], deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3], deps[2])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3], deps[2], deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3], deps[2], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[3], deps[2], deps[1], deps[0])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xf]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xe]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xd]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xb]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x7]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xc]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0xa]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x9]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x6]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x5]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x3]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x8]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x4]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x2]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x1]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_16.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_16.cpp
new file mode 100644
index 0000000000000..45aa3c587d75e
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_16.cpp
@@ -0,0 +1,52 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[8];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(inout: deps[0])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[4], deps[7])
+ { }
+ #pragma omp task depend(inout: deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[4], deps[7])
+ { }
+ #pragma omp task depend(inout: deps[2])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[5], deps[6])
+ { }
+ #pragma omp task depend(inout: deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[5], deps[6])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_17.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_17.cpp
new file mode 100644
index 0000000000000..2c59595f5b00e
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_17.cpp
@@ -0,0 +1,65 @@
+// RUN: %libomp-cxx-compile && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ {
+ fprintf(stderr, "task 0\n");
+ }
+ #pragma omp task depend(out: deps[2], deps[3])
+ {
+ fprintf(stderr, "task 1\n");
+ }
+ #pragma omp task depend(inout: deps[0])
+ {
+ fprintf(stderr, "task 2\n");
+ }
+ #pragma omp task depend(inout: deps[1])
+ {
+ fprintf(stderr, "task 3\n");
+ }
+ #pragma omp task depend(inout: deps[2])
+ {
+ fprintf(stderr, "task 4\n");
+ }
+ #pragma omp task depend(inout: deps[3])
+ {
+ fprintf(stderr, "task 5\n");
+ }
+ #pragma omp task depend(in: deps[0], deps[1], deps[2], deps[3])
+ {
+ fprintf(stderr, "task 6\n");
+ }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK-DAG: task 0
+// CHECK-DAG: task 2
+// CHECK-DAG: task 3
+// CHECK-DAG: task 1
+// CHECK-DAG: task 4
+// CHECK-DAG: task 5
+// CHECK: task 6
+
+// CHECK-DAG: task 0
+// CHECK-DAG: task 2
+// CHECK-DAG: task 3
+// CHECK-DAG: task 1
+// CHECK-DAG: task 4
+// CHECK-DAG: task 5
+// CHECK: task 6
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_18.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_18.cpp
new file mode 100644
index 0000000000000..954cfcbadb7b7
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_18.cpp
@@ -0,0 +1,43 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[2];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp taskloop num_tasks(strict: 2)
+ {
+ for (int j = 0; j < 20; j++) { }
+ }
+ #pragma omp task depend(in: deps[0], deps[1])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: wait: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_19.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_19.cpp
new file mode 100644
index 0000000000000..24d5fdfcb94bc
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_19.cpp
@@ -0,0 +1,48 @@
+// RUN: %libomp-cxx-compile && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+int main()
+{
+ int deps[3];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0])
+ {
+ fprintf(stderr, "task 0\n");
+ }
+ #pragma omp task depend(out: deps[1])
+ {
+ fprintf(stderr, "task 1\n");
+ }
+ #pragma omp task depend(out: deps[2])
+ {
+ fprintf(stderr, "task 2\n");
+ }
+ #pragma omp taskwait depend(in: deps[0], deps[1], deps[2])
+ #pragma omp task depend(in: deps[0], deps[1], deps[2])
+ {
+ fprintf(stderr, "task 3\n");
+ }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK-DAG: task 0
+// CHECK-DAG: task 1
+// CHECK-DAG: task 2
+// CHECK: task 3
+
+// CHECK-DAG: task 0
+// CHECK-DAG: task 1
+// CHECK-DAG: task 2
+// CHECK: task 3
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_2.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_2.cpp
new file mode 100644
index 0000000000000..89dd9137e158b
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_2.cpp
@@ -0,0 +1,55 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(out: deps[2], deps[3])
+ { }
+ #pragma omp task depend(inout: deps[0])
+ { }
+ #pragma omp task depend(inout: deps[1])
+ { }
+ #pragma omp task depend(inout: deps[2])
+ { }
+ #pragma omp task depend(inout: deps[3])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1], deps[2], deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_20.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_20.cpp
new file mode 100644
index 0000000000000..ab3b42995a903
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_20.cpp
@@ -0,0 +1,48 @@
+// RUN: %libomp-cxx-compile && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+int main()
+{
+ int deps[3];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0])
+ {
+ fprintf(stderr, "task 0\n");
+ }
+ #pragma omp task depend(out: deps[1])
+ {
+ fprintf(stderr, "task 1\n");
+ }
+ #pragma omp task depend(out: deps[2])
+ {
+ fprintf(stderr, "task 2\n");
+ }
+ #pragma omp taskwait depend(inoutset: deps[0], deps[1])
+ #pragma omp task depend(in: deps[0], deps[1], deps[2])
+ {
+ fprintf(stderr, "task 3\n");
+ }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK-DAG: task 0
+// CHECK-DAG: task 1
+// CHECK-DAG: task 2
+// CHECK: task 3
+
+// CHECK-DAG: task 0
+// CHECK-DAG: task 1
+// CHECK-DAG: task 2
+// CHECK: task 3
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_21.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_21.cpp
new file mode 100644
index 0000000000000..ad36c8c5a4019
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_21.cpp
@@ -0,0 +1,49 @@
+// RUN: %libomp-cxx-compile && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+int main()
+{
+ int arr[100];
+
+ int res = 0;
+ for (int i = 0; i < 100; i++) {
+ arr[i] = i;
+ res += i;
+ }
+ printf("base result: %d\n", res);
+
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ int res = 0;
+ #pragma omp taskgraph
+ {
+ #pragma omp taskloop reduction(+: res) num_tasks(10)
+ {
+ for (int j = 0; j < 100; j++) {
+ res += arr[j];
+ }
+ }
+ }
+ printf("reduction result: %d\n", res);
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: base result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
+// CHECK-NEXT: reduction result: 4950
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_22.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_22.cpp
new file mode 100644
index 0000000000000..254de8e3542c8
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_22.cpp
@@ -0,0 +1,67 @@
+// RUN: %clangXX %flags %openmp_flags -fopenmp-version=60 %s -o %t && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+void foo() {
+#pragma omp task replayable(1)
+ {
+ fprintf(stderr, "task outside lexical taskgraph\n");
+ }
+}
+
+int main()
+{
+ int arr[100];
+
+ int res = 0;
+ for (int i = 0; i < 100; i++) {
+ arr[i] = i;
+ res += i;
+ }
+ fprintf(stderr, "base result: %d\n", res);
+
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ int res = 0;
+ #pragma omp taskgraph
+ {
+ #pragma omp taskloop reduction(+: res) num_tasks(10)
+ {
+ for (int j = 0; j < 100; j++) {
+ res += arr[j];
+ }
+ }
+ foo();
+ }
+ fprintf(stderr, "reduction result: %d\n", res);
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: base result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: task outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_23.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_23.cpp
new file mode 100644
index 0000000000000..eb6930965da6b
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_23.cpp
@@ -0,0 +1,100 @@
+// RUN: %clangXX %flags %openmp_flags -fopenmp-version=60 %s -o %t && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+void foo() {
+ fprintf(stderr, "called function foo\n");
+#pragma omp taskloop replayable num_tasks(4)
+ {
+ for (int i = 0; i < 4; i++)
+ fprintf(stderr, "taskloop iter %d outside lexical taskgraph\n", i);
+ }
+}
+
+int main()
+{
+ int arr[100];
+
+ int res = 0;
+ for (int i = 0; i < 100; i++) {
+ arr[i] = i;
+ res += i;
+ }
+ fprintf(stderr, "base result: %d\n", res);
+
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ int res = 0;
+ #pragma omp taskgraph
+ {
+ #pragma omp taskloop reduction(+: res) num_tasks(10)
+ {
+ for (int j = 0; j < 100; j++) {
+ res += arr[j];
+ }
+ }
+ foo();
+ }
+ fprintf(stderr, "reduction result: %d\n", res);
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: base result: 4950
+// CHECK-NEXT: called function foo
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
+// CHECK-DAG: taskloop iter 0 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 1 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 2 outside lexical taskgraph
+// CHECK-DAG: taskloop iter 3 outside lexical taskgraph
+// CHECK-DAG: reduction result: 4950
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_24.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_24.cpp
new file mode 100644
index 0000000000000..c974a08520aee
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_24.cpp
@@ -0,0 +1,77 @@
+// RUN: %clangXX %flags %openmp_flags -fopenmp-version=60 %s -o %t && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+#include <cassert>
+
+int global_dep;
+
+void foo() {
+#pragma omp taskwait replayable(1) depend(in: global_dep)
+}
+
+int main()
+{
+ int arr[100];
+
+ int res = 0;
+ for (int i = 0; i < 100; i++) {
+ arr[i] = i;
+ res += i;
+ }
+
+ assert(res == 4950);
+
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ int res = 0;
+ #pragma omp taskgraph
+ {
+ #pragma omp taskloop reduction(+: res) num_tasks(10)
+ {
+ for (int j = 0; j < 100; j++) {
+ res += arr[j];
+ }
+ }
+ #pragma omp task depend(out: global_dep)
+ { }
+ foo();
+ }
+ assert(res == 4950);
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: wait: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: wait: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_25.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_25.cpp
new file mode 100644
index 0000000000000..0f2c3fbf9b454
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_25.cpp
@@ -0,0 +1,86 @@
+// RUN: %clangXX %flags %openmp_flags -fopenmp-version=60 %s -o %t && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+int global_dep;
+
+void foo() {
+ fprintf(stderr, "called function foo\n");
+#pragma omp task replayable(1) depend(in: global_dep)
+ {
+ fprintf(stderr, "out-of-line task created from within taskloop\n");
+ }
+}
+
+int main()
+{
+ int arr[100];
+
+ int res = 0;
+ for (int i = 0; i < 4; i++) {
+ arr[i] = i;
+ res += i;
+ }
+ fprintf(stderr, "base result: %d\n", res);
+
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 4; i++)
+ {
+ int res = 0;
+ #pragma omp taskgraph
+ {
+ #pragma omp taskloop reduction(+: res) num_tasks(4)
+ {
+ for (int j = 0; j < 4; j++) {
+ res += arr[j];
+ foo();
+ }
+ }
+ }
+ fprintf(stderr, "reduction result: %d\n", res);
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: base result: 6
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: reduction result: 6
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: reduction result: 6
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: reduction result: 6
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: called function foo
+// CHECK-DAG: out-of-line task created from within taskloop
+// CHECK-DAG: reduction result: 6
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_26.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_26.cpp
new file mode 100644
index 0000000000000..86c69e8134a9f
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_26.cpp
@@ -0,0 +1,58 @@
+// RUN: %clangXX %flags %openmp_flags -fopenmp-version=60 %s -o %t && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+int main()
+{
+ int arr[100];
+ int arr2[100];
+
+ int res = 0, res2 = 0;
+ for (int i = 0; i < 10; i++) {
+ arr[i] = i;
+ arr2[i] = 3 + i * 2;
+ res += i;
+ res2 += 3 + i * 2;
+ }
+ fprintf(stderr, "base results: %d, %d\n", res, res2);
+
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ int res = 0, res2 = 0;
+ #pragma omp taskgraph
+ {
+ #pragma omp taskloop reduction(+: res) num_tasks(10)
+ {
+ for (int j = 0; j < 10; j++) {
+ res += arr[j];
+ }
+ }
+ #pragma omp taskloop reduction(+: res2) num_tasks(10)
+ {
+ for (int j = 0; j < 10; j++) {
+ res2 += arr2[j];
+ }
+ }
+ }
+ fprintf(stderr, "reduction results: %d, %d\n", res, res2);
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: base results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
+// CHECK-NEXT: reduction results: 45, 120
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_27.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_27.cpp
new file mode 100644
index 0000000000000..20b81b143c6b4
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_27.cpp
@@ -0,0 +1,60 @@
+// RUN: %clangXX %flags %openmp_flags -fopenmp-version=60 %s -o %t && %libomp-run 2>&1 | FileCheck %s
+
+#include <cstdio>
+
+int main()
+{
+ int arr[100];
+ int arr2[100];
+
+ int res = 0, res2 = 0;
+ for (int i = 0; i < 10; i++) {
+ arr[i] = i;
+ res += i;
+ }
+ for (int i = 0; i < 10; i++) {
+ arr2[i] = 3 + i * 2;
+ res2 += res * (3 + i * 2);
+ }
+ fprintf(stderr, "base results: %d, %d\n", res, res2);
+
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 10; i++)
+ {
+ int res = 0, res2 = 0;
+ #pragma omp taskgraph
+ {
+ #pragma omp taskloop reduction(+: res) num_tasks(10)
+ {
+ for (int j = 0; j < 10; j++) {
+ res += arr[j];
+ }
+ }
+ #pragma omp taskloop reduction(+: res2) num_tasks(10)
+ {
+ for (int j = 0; j < 10; j++) {
+ res2 += res * arr2[j];
+ }
+ }
+ }
+ fprintf(stderr, "reduction results: %d, %d\n", res, res2);
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: base results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
+// CHECK-NEXT: reduction results: 45, 5400
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_3.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_3.cpp
new file mode 100644
index 0000000000000..3686772695e91
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_3.cpp
@@ -0,0 +1,77 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[6];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(out: deps[2], deps[3])
+ { }
+ #pragma omp task depend(inout: deps[0])
+ { }
+ #pragma omp task depend(inout: deps[1])
+ { }
+ #pragma omp task depend(inout: deps[2])
+ { }
+ #pragma omp task depend(inout: deps[3])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1], deps[2], deps[3])
+ { }
+ #pragma omp task depend(in: deps[1], deps[2]) depend(out: deps[5])
+ { }
+ #pragma omp task depend(in: deps[5])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#%x,NODE1:]] (* 2)
+// CHECK-NEXT: node: 0x[[#%x,NODE2:]] (* 2)
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#%x,NODE3:]] (* 2)
+// CHECK-NEXT: node: 0x[[#%x,NODE4:]] (* 2)
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#NODE3]] (* 2)
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x[[#NODE4]] (* 2)
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#NODE1]] (* 2)
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x[[#NODE2]] (* 2)
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_4.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_4.cpp
new file mode 100644
index 0000000000000..a70fed4845f5b
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_4.cpp
@@ -0,0 +1,73 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(out: deps[2], deps[3])
+ { }
+ #pragma omp task depend(inout: deps[0])
+ { }
+ #pragma omp task depend(inout: deps[1])
+ { }
+ #pragma omp task depend(inout: deps[2])
+ { }
+ #pragma omp task depend(inout: deps[3])
+ { }
+ #pragma omp task depend(in: deps[0], deps[2], deps[3])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1])
+ { }
+ #pragma omp task depend(in: deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#%x,NODE1:]] (* 2)
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x[[#%x,NODE2:]] (* 2)
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#%x,NODE3:]] (* 2)
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x[[#%x,NODE4:]] (* 2)
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#NODE1]] (* 2)
+// CHECK-NEXT: node: 0x[[#NODE2]] (* 2)
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x[[#NODE3]] (* 2)
+// CHECK-NEXT: node: 0x[[#NODE4]] (* 2)
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_5.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_5.cpp
new file mode 100644
index 0000000000000..636208245fd0d
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_5.cpp
@@ -0,0 +1,60 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[2])
+ { }
+ #pragma omp task depend(out: deps[1], deps[3])
+ { }
+ #pragma omp task depend(inoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(inoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(inoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(inoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1])
+ { }
+ #pragma omp task depend(in: deps[2], deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_6.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_6.cpp
new file mode 100644
index 0000000000000..66e872f833793
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_6.cpp
@@ -0,0 +1,56 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[1], deps[3])
+ { }
+ #pragma omp task depend(out: deps[0], deps[2])
+ { }
+ #pragma omp task depend(inoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(inoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(inoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(inoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(in: deps[0], deps[2])
+ { }
+ #pragma omp task depend(in: deps[0], deps[2])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_7.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_7.cpp
new file mode 100644
index 0000000000000..c01d4a080385d
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_7.cpp
@@ -0,0 +1,56 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[4];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(out: deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[2], deps[3])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1])
+ { }
+ #pragma omp task depend(in: deps[2], deps[3])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: parallel {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: exclusive {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_8.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_8.cpp
new file mode 100644
index 0000000000000..179e8bb087801
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_8.cpp
@@ -0,0 +1,36 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[2];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// FIXME: This isn't perfect -- we don't really need to keep the mutexes in
+// this case.
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x3]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}
diff --git a/openmp/runtime/test/taskgraph/taskgraph_deps_9.cpp b/openmp/runtime/test/taskgraph/taskgraph_deps_9.cpp
new file mode 100644
index 0000000000000..7e9af09dbe7de
--- /dev/null
+++ b/openmp/runtime/test/taskgraph/taskgraph_deps_9.cpp
@@ -0,0 +1,44 @@
+// RUN: %libomp-cxx-compile && env KMP_G_DEBUG=10 %libomp-run 2>&1 | FileCheck %s
+
+int main()
+{
+ int deps[3];
+ #pragma omp parallel
+ {
+ #pragma omp single
+ {
+ for (int i = 0; i < 2; i++)
+ {
+ #pragma omp taskgraph
+ {
+ #pragma omp task depend(out: deps[0], deps[1])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(in: deps[1]) depend(out: deps[2])
+ { }
+ #pragma omp task depend(mutexinoutset: deps[0], deps[1])
+ { }
+ #pragma omp task depend(in: deps[0], deps[1], deps[2])
+ { }
+ }
+ }
+ }
+ }
+ return 0;
+}
+
+// FIXME: This isn't perfect -- we don't really need to keep the mutexes in
+// this case.
+
+// CHECK: Processed taskgraph 0x[[#%x,GRAPHPTR:]] (graph_id 0):
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: sequential {
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x3]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}} [sets: 0x3]
+// CHECK-NEXT: node: 0x{{[[:xdigit:]]+}}
+// CHECK-NEXT: }
+// CHECK-NEXT: Replay taskgraph 0x[[#GRAPHPTR]] from task 0x{{[[:xdigit:]]+}}