[PATCH] D17019: [OpenMP] Code generation for teams - kernel launching

Samuel Antao via cfe-commits cfe-commits at lists.llvm.org
Wed Feb 17 12:09:28 PST 2016


sfantao added a comment.

Hi Alexey,

Thanks for the review!


================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3811-3818
@@ -3789,1 +3810,10 @@
       DeviceID, FileID, ParentName, Line, Column, OutlinedFn, OutlinedFnID);
+
+  // If the current target region has a teams region enclosed, we need to get
+  // the number of teams and thread limit to pass to the runtime function call
+  // later on. This is done through a function that returns the value. This is
+  // required because the expression is captured in the enclosing target
+  // environment when the teams directive is not combined with target. This only
+  // has to be done for the host.
+  //
+  // FIXME: Accommodate other combined directives with teams when they become
----------------
ABataev wrote:
> It is better to use OMPCapturedExprDecl for this, just like it is done for schedule clause
I don't think that would completel solve the problem in this case. In my understanding the problem I have here is slightly differetn than the one `OMPCapturedExprDecl` attempts to solve:

I have a clause (num_teams/thread_limit) that is part of an enclosed directive (teams) that I need to emit in the outer scope (target). If I create a OMPCapturedExprDecl, that would have to go with some dummy clause for the target  so that the initializer is emitted at the target lexical scope, and that emission would only work because in most directives the captures are local variables of the enclosing scope, and emission on the locals takes precedence over declaration that "refer to enclosing capture". However, target directive is special in the sense that it also captures global variables. So if I use OMPCapturedExprDecl on a expression that refers to globals that will cause a crash during the emission of the initializer because the capture of the target directive was not created yet. The patch has regression tests exactly to test this subtle difference in the target directive.

I am not saying that there are no other ways of doing this, but this approach seemed to me as the least disruptive as it is self-contained in the target codegen.

Let me know if you disagree.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:3822-3857
@@ -3790,1 +3821,38 @@
+  if (!CGM.getLangOpts().OpenMPIsDevice)
+    if (auto *TeamsDir = dyn_cast<OMPTeamsDirective>(CS.getCapturedStmt())) {
+      if (auto *NTE = TeamsDir->getSingleClause<OMPNumTeamsClause>()) {
+        auto &&CodeGen = [NTE](CodeGenFunction &CGF) {
+          auto *V = CGF.EmitScalarExpr(NTE->getNumTeams());
+          CGF.Builder.CreateRet(
+              CGF.Builder.CreateIntCast(V, CGF.Int32Ty, /*isSigned=*/true));
+          CGF.EmitBlock(CGF.createBasicBlock());
+        };
+
+        CodeGenFunction CGF(CGM, true);
+        CGOpenMPTargetRegionInfo CGInfo(CS, CodeGen,
+                                        ".omp_offload.get_num_teams");
+        CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo);
+
+        NestedNumTeamsFn =
+            CGF.GenerateOpenMPCapturedStmtFunction(CS, CGM.getContext().IntTy);
+        NestedNumTeamsFn->addFnAttr(llvm::Attribute::AlwaysInline);
+      }
+      if (auto *TLE = TeamsDir->getSingleClause<OMPThreadLimitClause>()) {
+        auto &&CodeGen = [TLE](CodeGenFunction &CGF) {
+          auto *V = CGF.EmitScalarExpr(TLE->getThreadLimit());
+          CGF.Builder.CreateRet(
+              CGF.Builder.CreateIntCast(V, CGF.Int32Ty, /*isSigned=*/true));
+          CGF.EmitBlock(CGF.createBasicBlock());
+        };
+
+        CodeGenFunction CGF(CGM, true);
+        CGOpenMPTargetRegionInfo CGInfo(CS, CodeGen,
+                                        ".omp_offload.get_thread_limit");
+        CodeGenFunction::CGCapturedStmtRAII CapInfoRAII(CGF, &CGInfo);
+
+        NestedThreadLimitFn =
+            CGF.GenerateOpenMPCapturedStmtFunction(CS, CGM.getContext().IntTy);
+        NestedThreadLimitFn->addFnAttr(llvm::Attribute::AlwaysInline);
+      }
+    }
   return;
----------------
ABataev wrote:
> Please, do it in separate functions
I centralized the emission of num_teams and thread_limit in a function.

================
Comment at: lib/CodeGen/CGOpenMPRuntime.cpp:4148-4161
@@ +4147,16 @@
+      llvm::Value *ThreadLimit = nullptr;
+      if (TD->getSingleClause<OMPNumTeamsClause>()) {
+        assert(NestedNumTeamsFn && "Helper function is required to get the "
+                                   "number of teams of an enclosed teams "
+                                   "directive.");
+        NumTeams = CGF.Builder.CreateCall(NestedNumTeamsFn, BasePointers);
+      } else
+        NumTeams = CGF.Builder.getInt32(0);
+      if (TD->getSingleClause<OMPThreadLimitClause>()) {
+        assert(NestedThreadLimitFn && "Helper function is required to get the "
+                                      "thread limit of an enclosed teams "
+                                      "directive.");
+        ThreadLimit = CGF.Builder.CreateCall(NestedThreadLimitFn, BasePointers);
+      } else
+        ThreadLimit = CGF.Builder.getInt32(0);
+
----------------
ABataev wrote:
> Again, all this must be done in separate functions
I am now doing the emission of num_teams and thread_limit in a separate function.


http://reviews.llvm.org/D17019





More information about the cfe-commits mailing list