<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body>
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style>
<!--
@font-face
{font-family:"Cambria Math"}
@font-face
{font-family:Calibri}
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif}
a:link, span.MsoHyperlink
{color:blue;
text-decoration:underline}
a:visited, span.MsoHyperlinkFollowed
{color:#954F72;
text-decoration:underline}
tt
{font-family:"Courier New"}
.MsoChpDefault
{}
@page WordSection1
{margin:70.85pt 70.85pt 2.0cm 70.85pt}
div.WordSection1
{}
-->
</style>
<div class="WordSection1">
<p class="MsoNormal">Thank you for your feedback, Arpith.</p>
<p class="MsoNormal">I see that the nvptx codegen relies in part on combined directives, and I understand the possible difficulties.</p>
<p class="MsoNormal">However, codegen is already able to do this with the target and teams directives. Remember that the split directives are immediate child nodes in the AST.</p>
<p class="MsoNormal">But you are right, this change can only be made after changes to codegen.</p>
<p class="MsoNormal">In my opinion, code generation should in general be able to handle both cases equivalently: combined directives and their split form.
</p>
<p class="MsoNormal">The reason is that the user would expect the same outcome/performance from both, e.g.</p>
<p class="MsoNormal">-> #pragma omp target parallel for</p>
<p class="MsoNormal">as well as</p>
<p class="MsoNormal">-> #pragma omp target</p>
<p class="MsoNormal"> #pragma omp parallel</p>
<p class="MsoNormal"> #pragma omp for</p>
<p class="MsoNormal"> </p>
<p class="MsoNormal">as the spec states. (The only exception seems to be the combined directives that include distribute.)</p>
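<p class="MsoNormal">A minimal sketch of the two forms, assuming a simple element-wise loop. The function names scale_combined and scale_split are made up for illustration; both are expected to leave the array in the same state, and without OpenMP support the pragmas are ignored and the loops run serially.</p>

```c
/* Combined form: offload, parallel region and worksharing loop in one directive. */
void scale_combined(float *v, int n, float f) {
    #pragma omp target parallel for map(tofrom: v[0:n])
    for (int i = 0; i < n; ++i)
        v[i] *= f;
}

/* Split form: the same three constructs, each nested immediately inside the previous one. */
void scale_split(float *v, int n, float f) {
    #pragma omp target map(tofrom: v[0:n])
    #pragma omp parallel
    #pragma omp for
    for (int i = 0; i < n; ++i)
        v[i] *= f;
}
```

<p class="MsoNormal">Calling both functions on copies of the same input should give identical arrays. Whether they also perform the same is exactly the codegen question discussed here.</p>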
<p class="MsoNormal"> </p>
<p class="MsoNormal">Kind regards,</p>
<p class="MsoNormal">Daniel</p>
<p class="MsoNormal"> </p>
<div style="border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="border:none; padding:0cm"><b>From: </b><a href="mailto:acjacob@us.ibm.com">Arpith C Jacob</a><br>
<b>Sent: </b>Friday, 2 June 2017 20:24<br>
<b>To: </b><a href="mailto:daniel.schuermann@campus.tu-berlin.de">Schürmann, Daniel</a><br>
<b>Cc: </b><a href="mailto:openmp-dev@lists.llvm.org">openmp-dev@lists.llvm.org</a><br>
<b>Subject: </b>Re: [Openmp-dev] Proposal: Resolve combined directives in parsing phase</p>
</div>
<p class="MsoNormal"> </p>
</div>
<div>
<p><font size="2">Daniel,</font><br>
<br>
<font size="2">> </font><tt><font size="2">Cons: - Code optimizations for combined directives may be harder to implement</font></tt><br>
<br>
<font size="2">From the perspective of GPU code generation, treating combined directives as a special case is *immensely* important for performance. Knowing that there is *no* serial section in the target region makes it possible to generate simplified, low-overhead
code that resembles typical CUDA kernels. I am worried that splitting combined directives will make it much harder to get this performance back, particularly because we must do OpenMP lowering in Clang.</font><br>
<br>
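<font size="2">To make the serial-section point concrete, here is a minimal sketch, assuming a trivial fill loop. The function names are hypothetical. In the first version the target region contains a statement outside the parallel loop, so the compiler has to account for a serial part. In the second version the combined directive guarantees that the whole region is the parallel loop.</font><br>

```c
/* Target region with a serial section: the assignment to `base`
 * runs before the parallel loop, outside any parallel region. */
void fill_with_serial_section(int *out, int n) {
    #pragma omp target map(from: out[0:n])
    {
        int base = 10; /* serial section */
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            out[i] = base + i;
    }
}

/* Combined directive: the region consists of the parallel loop only,
 * so no serial section can exist. */
void fill_combined(int *out, int n) {
    #pragma omp target teams distribute parallel for map(from: out[0:n])
    for (int i = 0; i < n; ++i)
        out[i] = 10 + i;
}
```

<font size="2">Both functions should produce the same array. The difference is only in what the compiler can assume about the region when it generates device code.</font><br>
<br>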
<font size="2">> </font><tt><font size="2">the code generation is implemented as an inlined function, which results in
<br>
> ignoring the semantic meaning of these directives.<br>
</font></tt><br>
<font size="2">Can the current code generation be altered to fix the issues that you see?</font><br>
<br>
<tt><font size="2">> Unrelated question:<br>
> I don't understand the necessity of the __kmpc_fork_teams() run-time <br>
> call as the __tgt_target_teams() implementation should be able to handle <br>
> this case.<br>
</font></tt><br>
<font size="2">My understanding is that the __tgt* calls are implemented in the target offload library while the __kmpc* calls are in the OpenMP runtime. On CPUs, forking of teams is done in the __kmpc_fork_teams() call. On the GPU, the offload call __tgt_target_teams()
launches a kernel with multiple teams so the __kmpc_fork_teams() is a no-op.</font><br>
<br>
<font size="2">Regards,</font><br>
<font size="2">Arpith</font><br>
<br>
<br>
<font size="2" color="#5F5F5F">From: </font><font size="2">Daniel Schürmann via Openmp-dev &lt;openmp-dev@lists.llvm.org&gt;</font><br>
<font size="2" color="#5F5F5F">To: </font><font size="2">&lt;openmp-dev@lists.llvm.org&gt;</font><br>
<font size="2" color="#5F5F5F">Date: </font><font size="2">06/02/2017 09:08 AM</font><br>
<font size="2" color="#5F5F5F">Subject: </font><font size="2">[Openmp-dev] Proposal: Resolve combined directives in parsing phase</font><br>
<font size="2" color="#5F5F5F">Sent by: </font><font size="2">"Openmp-dev" &lt;openmp-dev-bounces@lists.llvm.org&gt;</font><br>
</p>
<hr width="100%" size="2" align="left" noshade="" style="color:#8091A5">
<br>
<br>
<br>
<tt><font size="2">At the moment, combined directives have their own AST representation for
<br>
type-checking and code generation. For some of the combined constructs, <br>
the code generation is implemented as an inlined function, which results in <br>
ignoring the semantic meaning of these directives.<br>
<br>
This is true for e.g.<br>
EmitOMPTargetParallelForSimdDirective<br>
EmitOMPTargetSimdDirective<br>
EmitOMPTeamsDistributeDirective<br>
EmitOMPTargetTeamsDistributeDirective<br>
EmitOMPTargetTeamsDistributeParallelForDirective<br>
and more<br>
<br>
One solution would be the proper codegen implementation for these <br>
directives.<br>
However, I would like to propose a simpler and closer-to-spec approach:<br>
resolving combined directives into nested AST nodes in the parsing phase.<br>
<br>
E.g. an OMPTargetTeamsDistributeDirective would be resolved into<br>
OMPTargetDirective<br>
|- OMPTeamsDirective<br>
&nbsp;&nbsp;&nbsp;|- OMPDistributeDirective<br>
<br>
where type-checking and codegen for each of these single directives are already <br>
implemented.<br>
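<br>
As a rough sketch of that resolution step, assuming a simplified node type: <br>
Kind, Node, resolve_combined and nesting_depth are hypothetical names for <br>
illustration only and are not Clang's classes or functions.<br>

```c
#include <stddef.h>

/* Hypothetical directive kinds; not Clang's OpenMPDirectiveKind. */
typedef enum { D_TARGET, D_TEAMS, D_DISTRIBUTE, D_PARALLEL, D_FOR, D_SIMD } Kind;

/* A nested node: each directive has at most one directly nested child directive. */
typedef struct Node {
    Kind kind;
    struct Node *child;
} Node;

/* Resolve a combined directive, given as its constituent kinds from
 * outermost (parts[0]) to innermost (parts[n-1]), into a chain of nested
 * nodes. The caller provides `storage` with room for n nodes.
 * Returns the outermost node, or NULL if n is 0. */
Node *resolve_combined(const Kind *parts, size_t n, Node *storage) {
    for (size_t i = 0; i < n; ++i) {
        storage[i].kind = parts[i];
        storage[i].child = (i + 1 < n) ? &storage[i + 1] : NULL;
    }
    return n ? &storage[0] : NULL;
}

/* Count how many directives are nested in the chain starting at root. */
size_t nesting_depth(const Node *root) {
    size_t depth = 0;
    for (const Node *node = root; node != NULL; node = node->child)
        ++depth;
    return depth;
}
```

For the example above, the parts would be target, teams, distribute, and the <br>
result is a chain of three nodes with the target node as the root.<br>
<br>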
The advantages are:<br>
- Much simpler type-checking and code generation<br>
- We match the specification stating that combined directives have the <br>
semantic meaning of one construct immediately followed by the other <br>
construct<br>
- All combined directives are fully supported if their derived <br>
constructs are supported<br>
<br>
Potential disadvantages:<br>
- The AST representation differs from the input. However, this is <br>
already the case due to inserted implicit parameters.<br>
- Code optimizations for combined directives may be harder to implement<br>
<br>
In my opinion the benefits outweigh the disadvantages, but I may not be <br>
aware of some implications. Please let me know your thoughts about this <br>
idea. And tell me if I misunderstood anything related that led to the <br>
decision for the current design.<br>
<br>
Unrelated question:<br>
I don't understand the necessity of the __kmpc_fork_teams() run-time <br>
call as the __tgt_target_teams() implementation should be able to handle <br>
this case.<br>
<br>
<br>
Daniel<br>
_______________________________________________<br>
Openmp-dev mailing list<br>
Openmp-dev@lists.llvm.org<br>
</font></tt><tt><font size="2"><a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev">http://lists.llvm.org/cgi-bin/mailman/listinfo/openmp-dev</a></font></tt><tt><font size="2"><br>
</font></tt><br>
<br>
<br>
</div>
</body>
</html>