[PATCH] D152054: [OpenMP] Codegen support for thread_limit on target directive

Sandeep via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 30 22:13:06 PDT 2023


sandeepkosuri added inline comments.


================
Comment at: clang/lib/CodeGen/CGOpenMPRuntime.cpp:9818
+      D.hasClausesOfKind<OMPInReductionClause>() ||
+      (CGM.getLangOpts().OpenMP >= 51 && D.getDirectiveKind() == OMPD_target &&
+       D.hasClausesOfKind<OMPThreadLimitClause>());
----------------
ABataev wrote:
> What if D is combined target directive, i.e. D.getDirectiveKind() is something like OMPD_target_teams, etc.?
I will fix that, thanks for noticing.


================
Comment at: clang/lib/CodeGen/CGStmtOpenMP.cpp:5143-5148
+        S.getSingleClause<OMPThreadLimitClause>()) {
+      // Emit __kmpc_set_thread_limit() to set the thread_limit for the task
+      // enclosing this target region. This will indirectly set the thread_limit
+      // for every applicable construct within target region.
+      CGF.CGM.getOpenMPRuntime().emitThreadLimitClause(
+          CGF, S.getSingleClause<OMPThreadLimitClause>()->getThreadLimit(),
----------------
ABataev wrote:
> Avoid double call of S.getSingleClause<OMPThreadLimitClause>(), store in local variable call result.
sure.


================
Comment at: clang/test/OpenMP/target_codegen.cpp:849
 // OMP51: [[CE:%.*]] = load {{.*}} [[CEA]]
-// OMP51: call i32 @__tgt_target_kernel({{.*}}, i64 -1, i32 -1, i32 [[CE]],
+// OMP51: call ptr @__kmpc_omp_task_alloc({{.*@.omp_task_entry.*}})
+// OMP51: call i32 [[OMP_TASK_ENTRY]]
----------------
ABataev wrote:
> It requires extra resource consumption, can you try to avoid creating outer task, if possible?
I tried different ideas for making `thread_limit` work on `target`.

I tried to reuse the existing implementation by replacing the directive to `target teams(1) thread_limit(x)` at  parsing , sema and IR stages. I couldn't successfully implement any of them. So, I tried adding `num_threads` for all the parallel directives within `target`, and there were corner cases like parallel directives in a function which is called in target region, which were becoming tedious to handle.

This method seem to encompass the idea of thread limit on `target` pretty well and also works... So I proceeded with this idea.


================
Comment at: openmp/runtime/src/kmp_ftn_entry.h:809
+    return thread_limit;
+  else
+    return thread->th.th_current_task->td_icvs.thread_limit;
----------------
ABataev wrote:
> No need for else here
oops, I will fix that


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152054/new/

https://reviews.llvm.org/D152054



More information about the cfe-commits mailing list