<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">


</head>

<body>


<style>

<!--

@font-face

        {font-family:"Cambria Math"}

@font-face

        {font-family:Calibri}

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0cm;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif}

a:link, span.MsoHyperlink

        {color:blue;

        text-decoration:underline}

a:visited, span.MsoHyperlinkFollowed

        {color:#954F72;

        text-decoration:underline}

tt

        {font-family:"Courier New"}

.MsoChpDefault

        {}

@page WordSection1

        {margin:70.85pt 70.85pt 2.0cm 70.85pt}

div.WordSection1

        {}

-->

</style>

<div class="WordSection1">

<p class="MsoNormal">Thank you for your feedback, Arpith.</p>

<p class="MsoNormal">I see that the nvptx codegen relies in part on combined directives, and I understand the possible difficulties.</p>

<p class="MsoNormal">However, codegen is already able to do this with the target and teams directives. Remember that the split directives are immediate child nodes in the AST.</p>

<p class="MsoNormal">But you are right, this change can only be made after changes to codegen.</p>

<p class="MsoNormal">In my opinion, code generation should in general be able to handle both cases equivalently: combined directives and the separated version.

</p>

<p class="MsoNormal">The reason is that the user would expect the same outcome/performance from both, e.g.</p>

<p class="MsoNormal">-> #pragma omp target parallel for</p>

<p class="MsoNormal">as well as</p>

<p class="MsoNormal">-> #pragma omp target</p>

<p class="MsoNormal">   #pragma omp parallel</p>

<p class="MsoNormal">   #pragma omp for</p>

<p class="MsoNormal"> </p>

<p class="MsoNormal">as the spec states. (The only exception seems to be the combined directives that include distribute.)</p>
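<p class="MsoNormal">To make the comparison concrete, here is a minimal sketch of the two spellings applied to the same loop. The function names, the reduction and the map clauses are my own assumptions for illustration, not code from Clang or its tests. The sketch only shows that both forms describe the same computation; it says nothing about how either is currently lowered.</p>

```c
/* Combined form: a single directive covers target, parallel and for. */
long sum_combined(const int *a, int n) {
    long sum = 0;
    #pragma omp target parallel for map(to: a[0:n]) map(tofrom: sum) reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}

/* Split form: the same three constructs, each nested in the previous one. */
long sum_split(const int *a, int n) {
    long sum = 0;
    #pragma omp target map(to: a[0:n]) map(tofrom: sum)
    #pragma omp parallel
    #pragma omp for reduction(+: sum)
    for (int i = 0; i < n; ++i)
        sum += a[i];
    return sum;
}
```

<p class="MsoNormal">Both functions should return the same value for the same input, whether the region is offloaded, run on the host, or compiled without OpenMP support (in which case the pragmas are ignored).</p>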

<p class="MsoNormal"> </p>

<p class="MsoNormal">Kind regards,</p>

<p class="MsoNormal">Daniel</p>

<p class="MsoNormal"> </p>

<div style="border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0cm 0cm 0cm">

<p class="MsoNormal" style="border:none; padding:0cm"><b>From: </b><a href="mailto:acjacob@us.ibm.com">Arpith C Jacob</a><br>

<b>Sent: </b>Friday, 2 June 2017 20:24<br>

<b>To: </b><a href="mailto:daniel.schuermann@campus.tu-berlin.de">Schürmann, Daniel</a><br>

<b>Cc: </b><a href="mailto:openmp-dev@lists.llvm.org">openmp-dev@lists.llvm.org</a><br>

<b>Subject: </b>Re: [Openmp-dev] Proposal: Resolve combined directives in parsing phase</p>

</div>

<p class="MsoNormal"> </p>

</div>

<div>

<p><font size="2">Daniel,</font><br>

<br>

<font size="2">> </font><tt><font size="2">Cons: - Code optimizations for combined directives may be harder to implement</font></tt><br>

<br>

<font size="2">From the perspective of GPU code generation, treating combined directives as a special case is *immensely* important for performance. Knowing that there is *no* serial section in the target region makes it possible to generate simplified, low-overhead code that resembles typical CUDA kernels. I am worried that splitting combined directives will make it much harder to get this performance back, particularly because we must do OpenMP lowering in Clang.</font><br>
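<br>

<font size="2">As a rough illustration of the distinction, the sketch below contrasts a region written with a combined directive against a general target region that contains a serial statement. The function names and the SAXPY-style loop are assumptions chosen for the example; the code shows only the source-level difference, not what the nvptx backend emits for either case.</font><br>

```c
/* Combined directive: every statement in the region is parallel loop work,
   so the region contains no serial section. */
void saxpy_combined(int n, float a, const float *x, float *y) {
    #pragma omp target teams distribute parallel for map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* General target region: the assignment to scale is a serial section that
   runs once on the device before the parallel loop starts. */
void saxpy_with_serial_part(int n, float a, const float *x, float *y) {
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    {
        float scale = 0.5f * a;  /* serial section */
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = scale * x[i] + y[i];
    }
}
```

<font size="2">In the first function the directive alone guarantees that nothing in the region runs serially. In the second, a code generator has to account for the serial statement being executed exactly once before the parallel work begins.</font><br>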

<br>

<font size="2">> </font><tt><font size="2">the code generation is implemented as an inlined function, which results in

<br>

> ignoring the semantic meaning of these directives.<br>

</font></tt><br>

<font size="2">Can the current code generation be altered to fix the issues that you see?</font><br>

<br>

<tt><font size="2">> Unrelated question:<br>

> I don't understand the necessity of the __kmpc_fork_teams() run-time <br>

> call as the __tgt_target_teams() implementation should be able to handle <br>

> this case.<br>

</font></tt><br>

<font size="2">My understanding is that the __tgt* calls are implemented in the target offload library, while the __kmpc* calls are in the OpenMP runtime. On CPUs, forking of teams is done in the __kmpc_fork_teams() call. On the GPU, the offload call __tgt_target_teams() launches a kernel with multiple teams, so __kmpc_fork_teams() is a no-op there.</font><br>
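<br>

<font size="2">From the user's side, both paths serve the same source construct. The sketch below is a hypothetical helper (count_teams is my name, not a runtime function) that runs a target teams region and reports how many teams it got. It uses only the standard omp_get_team_num() and omp_get_num_teams() routines and does not show which runtime entry points Clang emits.</font><br>

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Runs a target teams region and returns the number of teams it executed
   with. Without OpenMP support the pragma is ignored and the result is 1. */
int count_teams(void) {
    int teams = 1;
    #pragma omp target teams map(tofrom: teams)
    {
#ifdef _OPENMP
        /* Only the first team records the league size, to avoid a race. */
        if (omp_get_team_num() == 0)
            teams = omp_get_num_teams();
#endif
    }
    return teams;
}
```

<font size="2">Whether the region is offloaded or falls back to the host, the function returns at least 1; the actual number of teams is implementation defined.</font><br>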

<br>

<font size="2">Regards,</font><br>

<font size="2">Arpith</font><br>

<br>


<br>

<font size="2" color="#5F5F5F">From: </font><font size="2">Daniel Schürmann via Openmp-dev &lt;openmp-dev@lists.llvm.org&gt;</font><br>

<font size="2" color="#5F5F5F">To: </font><font size="2">&lt;openmp-dev@lists.llvm.org&gt;</font><br>

<font size="2" color="#5F5F5F">Date: </font><font size="2">06/02/2017 09:08 AM</font><br>

<font size="2" color="#5F5F5F">Subject: </font><font size="2">[Openmp-dev] Proposal: Resolve combined directives in parsing phase</font><br>

<font size="2" color="#5F5F5F">Sent by: </font><font size="2">"Openmp-dev" &lt;openmp-dev-bounces@lists.llvm.org&gt;</font><br>

</p>

<hr width="100%" size="2" align="left" noshade="" style="color:#8091A5">

<br>

<br>

<br>

<tt><font size="2">At the moment, combined directives have their own AST representation for

<br>

type-checking and code generation. For some of the combined constructs, <br>

the code generation is implemented as an inlined function, which results in <br>

ignoring the semantic meaning of these directives.<br>

<br>

This is true for e.g.<br>

EmitOMPTargetParallelForSimdDirective<br>

EmitOMPTargetSimdDirective<br>

EmitOMPTeamsDistributeDirective<br>

EmitOMPTargetTeamsDistributeDirective<br>

EmitOMPTargetTeamsDistributeParallelForDirective<br>

and more<br>

<br>

One solution would be the proper codegen implementation for these <br>

directives.<br>

However, I would like to propose a simpler and closer-to-spec approach:<br>

By resolving combined directives in the parsing phase into nested AST nodes.<br>

<br>

E.g. an OMPTargetTeamsDistributeDirective would be resolved into<br>

OMPTargetDirective<br>

    |- OMPTeamsDirective<br>

        |- OMPDistributeDirective<br>

<br>

where type-checking and codegen for these individual directives are already <br>

implemented.<br>
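</font></tt>
<p><font size="2">The resolution step can be sketched in a few lines. The following is a toy model under stated assumptions, not Clang code: split_combined and kLeafNames are hypothetical names, directives are plain strings rather than AST nodes, and every word is treated as its own nested construct, which ignores composite constructs such as distribute parallel for.</font></p>

```c
#include <string.h>

/* Hypothetical table of leaf construct names known to this sketch. */
static const char *const kLeafNames[] = {
    "target", "teams", "distribute", "parallel", "for", "simd"
};

/* Splits a combined directive name such as "target teams distribute" into
   its leaf constructs, outermost first. Returns the number of leaves, or -1
   if a word is not a known leaf or more than max_leaves words are present. */
int split_combined(const char *directive, const char *leaves[], int max_leaves) {
    int count = 0;
    const char *p = directive;
    while (*p != '\0') {
        while (*p == ' ')
            ++p;                          /* skip separators */
        if (*p == '\0')
            break;
        size_t len = strcspn(p, " ");     /* length of the current word */
        const char *match = NULL;
        for (size_t k = 0; k < sizeof kLeafNames / sizeof kLeafNames[0]; ++k) {
            if (strlen(kLeafNames[k]) == len && strncmp(p, kLeafNames[k], len) == 0) {
                match = kLeafNames[k];
                break;
            }
        }
        if (match == NULL || count == max_leaves)
            return -1;
        leaves[count++] = match;          /* each leaf nests inside the previous one */
        p += len;
    }
    return count;
}
```

<p><font size="2">For "target teams distribute" this yields the three leaves target, teams and distribute, in that nesting order, matching the tree above.</font></p>
<tt><font size="2">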

The advantages are:<br>

- Much simpler type-checking and code generation<br>

- We match the specification stating that combined directives have the <br>

semantic meaning of one construct immediately followed by the other <br>

construct<br>

- All combined directives are fully supported if their derived <br>

constructs are supported<br>

<br>

Potential disadvantages:<br>

- The AST representation differs from the input. However, this is <br>

already the case due to inserted implicit parameters.<br>

- Code optimizations for combined directives may be harder to implement<br>

<br>

In my opinion the benefits outweigh the disadvantages, but I may not be <br>

aware of some implications. Please let me know your thoughts about this <br>

idea, and tell me if I misunderstood anything that led to the <br>

decision for the current design.<br>

<br>

Unrelated question:<br>

I don't understand the necessity of the __kmpc_fork_teams() run-time <br>

call as the __tgt_target_teams() implementation should be able to handle <br>

this case.<br>

<br>

<br>

Daniel<br>

</font></tt><br>

<br>

<br>

</div>

</body>

</html>