<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body alink="#EE0000" bgcolor="#ffffff" link="#0B6CDA" text="#000000"
vlink="#551A8B">
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">I’m
not quite sure what you’re saying here; are you saying that
there should be an unnecessary barrier in the
</span><span style="font-size:10.0pt;font-family:"Lucida
Console";color:#1F497D;mso-fareast-language:EN-US">omp
parallel do/for
</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">?<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">If
so I disagree.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Or
are you saying that the compiler should optimise
</span><span style="font-size:10.0pt;font-family:"Lucida
Console";color:#1F497D;mso-fareast-language:EN-US">omp
parallel; {omp do/for}</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">
to remove the unnecessary barrier?<o:p></o:p></span></p>
<span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">In
which case I agree.<br>
<br>
<font color="#000000">I meant the latter. Maybe "semantic meaning"
isn't the right term, as optimizations preserve semantics anyway.
However, if we merge these two cases during AST construction and
optimize away the unnecessary barrier, we get simpler codegen
with the same performance in both cases:<br>
- decompose "omp parallel for" into "omp parallel; {omp do/for}"<br>
- check for a closely nested omp for (I think this check could also
be made more generic)<br>
- in codegen, add the barrier only if no omp for is closely nested<br>
<br>
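A toy sketch of these three steps, written in Python rather than against Clang's real AST classes: the names Node, decompose, has_sole_nested_for and emit, as well as the event strings, are hypothetical, and "closely nested" is simplified to "the region's body is exactly one omp for".<br>
<pre>
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node; kind is "parallel", "for" or "stmt"."""
    kind: str
    body: List["Node"] = field(default_factory=list)


def decompose(directive: str) -> Node:
    """Step 1: rewrite 'parallel for' as a parallel region holding one for."""
    if directive == "parallel for":
        return Node("parallel", [Node("for")])
    return Node(directive)


def has_sole_nested_for(node: Node) -> bool:
    """Step 2: true when the region's body is exactly one 'for' construct."""
    return node.kind == "parallel" and [c.kind for c in node.body] == ["for"]


def emit(node: Node) -> List[str]:
    """Step 3: list the (made-up) runtime events emitted for a node."""
    if node.kind == "for":
        # A worksharing loop ends with an implicit barrier.
        return ["loop", "barrier"]
    if node.kind == "parallel":
        events = ["fork"]
        for child in node.body:
            events += emit(child)
        # Add the region's closing barrier only if the loop's barrier
        # does not already sit directly in front of the join.
        if not has_sole_nested_for(node):
            events.append("barrier")
        events.append("join")
        return events
    return [node.kind]
```
</pre>
Under these assumptions the combined form and the hand-nested form yield the same event list with a single barrier, while a region with further statements after the loop keeps both barriers.<br>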
I think this could apply to other combined directives as well, if not
to all of them.<br>
<br>
Kind regards<br>
Daniel<br>
</font><br>
<br>
</span>
<div class="moz-cite-prefix">On 06/09/2017 02:45 PM, Cownie, James H
wrote:<br>
</div>
<blockquote
cite="mid:397D95928DECEF49983F5B237627E9788AD38C7F@IRSMSX154.ger.corp.intel.com"
type="cite">
<meta http-equiv="Content-Type" content="text/html;
charset=windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered
medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Verdana;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"Lucida Console";
panose-1:2 11 6 9 4 5 4 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
p.emailquote, li.emailquote, div.emailquote
{mso-style-name:emailquote;
mso-margin-top-alt:auto;
margin-right:0cm;
mso-margin-bottom-alt:auto;
margin-left:1.0pt;
border:none;
padding:0cm;
font-size:12.0pt;
font-family:"Times New Roman",serif;}
p.xmsonormal, li.xmsonormal, div.xmsonormal
{mso-style-name:x_msonormal;
margin:0cm;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.xmsohyperlink
{mso-style-name:x_msohyperlink;
color:blue;
text-decoration:underline;}
span.xmsohyperlinkfollowed
{mso-style-name:x_msohyperlinkfollowed;
color:#954F72;
text-decoration:underline;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Verdana",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
<div class="WordSection1">
<p class="xmsonormal"><span lang="DE">Jim:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">> <i>Thus it can
easily be the case that omp parallel do/for is faster than
omp parallel + omp do/for.</i><o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">This is another good
motivation for this proposal: I think that is currently the
case, but it should not be.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Btw, thank you for this
very good example and the solution you provided. The question is
whether we can resolve all combined constructs that easily.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">I’m
not quite sure what you’re saying here; are you saying that
there should be an unnecessary barrier in the
</span><span style="font-size:10.0pt;font-family:"Lucida
Console";color:#1F497D;mso-fareast-language:EN-US">omp
parallel do/for
</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">?<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">If
so I disagree.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Or
are you saying that the compiler should optimise
</span><span style="font-size:10.0pt;font-family:"Lucida
Console";color:#1F497D;mso-fareast-language:EN-US">omp
parallel; {omp do/for}</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">
to remove the unnecessary barrier?<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">In
which case I agree.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Like
many standards, OpenMP is all predicated on “as if”, so the
standard lays down the user-visible behaviour, and any
implementation which provides that is fine. The unnecessary
barriers implied by the simple transformation of</span><span
style="font-size:10.0pt;font-family:"Lucida
Console";color:#1F497D;mso-fareast-language:EN-US"> omp
parallel do/for => omp parallel; {omp do/for}
</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">are
not user visible and can be removed by the implementation.</span><span
style="font-size:10.0pt;font-family:"Lucida
Console";color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">You
may choose to note, in particular, that there is language in
TR4 that makes it clear that the OMPT profiling interface
cannot be used to check whether this unnecessary barrier is
present. In other words, optimizations that are not visible
to user code are not outlawed merely because you can observe
them through the OMPT profiling interfaces.<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">--
Jim<br>
<br>
Jim Cownie <a class="moz-txt-link-rfc2396E" href="mailto:james.h.cownie@intel.com"><james.h.cownie@intel.com></a><br>
SSG/DPD/TCAR (Technical Computing, Analyzers, and
Runtimes)<o:p></o:p></span></p>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Tel:
+44 117 9071438</span><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
</div>
<p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US">From:</span></b><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif"
lang="EN-US"> Openmp-dev
[<a class="moz-txt-link-freetext" href="mailto:openmp-dev-bounces@lists.llvm.org">mailto:openmp-dev-bounces@lists.llvm.org</a>]
<b>On Behalf Of </b>Schürmann, Daniel via Openmp-dev<br>
<b>Sent:</b> Monday, June 5, 2017 5:20 PM<br>
<b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:openmp-dev@lists.llvm.org">openmp-dev@lists.llvm.org</a><br>
<b>Subject:</b> Re: [Openmp-dev] Proposal: Resolve
combined directives in parsing phase<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="xmsonormal"><span lang="DE">Thank you all for your
feedback and suggestions!<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">I would like to update
my proposal while taking your considerations into
account.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Also, I hope it is
okay to answer in one mail instead of in separate
replies.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Briefly, the
motivation again:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">- some combined
constructs are unhandled in the code generation.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">- matching all directive
combinations in codegen is very cumbersome.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">- combined constructs
and separate nested constructs have potentially
different performance characteristics.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Section 2.11 of the
specification about Combined Constructs states:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">The semantics of the
combined constructs are identical to that of explicitly
specifying
<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">the first construct
containing one instance of the second construct and no
other statements.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">To match this semantic
rule, the idea is to expand these combined constructs
as early as AST construction. This lets
unimplemented combined constructs reuse the already
implemented code generation. At the same time, it gives
combined constructs the same performance as separate
ones.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">After reconsidering
some implications, it seems easier to leave parsing and
type-checking as is and do the expansion in the AST
construction (Sema::ActOnOpenMPxyzDirective()).<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">This way, the AST
should look exactly the same whether the code contains
combined constructs or not. Performance
regressions caused by losing the information about close
nesting should be solvable with flags in the cases where this
is really necessary. On the upside, it should be
possible to derive the close-nesting information even when the
constructs were not combined in the source.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Now, I would like to
reply to some of the points raised:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">C Bergström:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">> <i>I'm not sure
the error handling on a parsing issue would cascade
like you expect.
</i><o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">This updated proposal
is taking this into account by delaying the expansion to
the AST construction.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Alexey Bataev:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">> <i>Also, you
will need to properly capture arguments of some of the
clauses that are used in inner OpenMP constructs.</i><o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Although I was more
concerned about clauses related to the outer constructs,
this is the main reason not to do the expansion
in the parsing phase. In Sema, all clauses are parsed
and available. The clauses can either be added to both
constructs or have to be split between them. I'm not sure whether
'wrong' clauses would do any harm later (e.g. a
num_teams clause added to a target construct).<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Jim:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">> <i>Thus it can
easily be the case that omp parallel do/for is faster
than omp parallel + omp do/for.</i><o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">This is another good
motivation for this proposal: I think that is currently the
case, but it should not be.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Btw, thank you for
this very good example and the solution you provided. The
question is whether we can resolve all combined constructs that
easily.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Arpith:<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">> <i>The spec
guarantees that there can be no user code between the
target and the teams directive. This is not the case
with the other combined directives.</i><o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">I was not specific enough
in my response. I meant that a close nesting,
if present, can also be derived. This may be
easier for the target teams combination, but we already use
the nesting information for type-checking.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">I know I'm proposing a
not-so-small rework, but I think the benefit could be a
cleaner implementation of the spec. As this is not an urgent
request, we could also move in this direction gradually,
e.g. starting only with combined directives that keep
working the same way or are broken anyway.<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Thanks again for
taking the time!<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Best regards,<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE">Daniel<o:p></o:p></span></p>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="xmsonormal"><b><span lang="DE">From: </span></b><span
lang="DE"><a moz-do-not-send="true"
href="mailto:daniel.schuermann@campus.tu-berlin.de">Daniel
Schürmann</a><br>
<b>Sent: </b>Friday, 2 June 2017 15:06<br>
<b>To: </b><a moz-do-not-send="true"
href="mailto:openmp-dev@lists.llvm.org">openmp-dev@lists.llvm.org</a><br>
<b>Subject: </b>Proposal: Resolve combined directives
in parsing phase<o:p></o:p></span></p>
</div>
<p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
</div>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.0pt">At the
moment, combined directives have their own AST
representation for
<br>
type-checking and code generation. For some of the
combined constructs, <br>
the code generation is implemented as an inlined function,
which results in <br>
the semantic meaning of these directives being ignored.<br>
<br>
This is true for e.g.<br>
EmitOMPTargetParallelForSimdDirective<br>
EmitOMPTargetSimdDirective<br>
EmitOMPTeamsDistributeDirective<br>
EmitOMPTargetTeamsDistributeDirective<br>
EmitOMPTargetTeamsDistributeParallelForDirective<br>
and more<br>
<br>
One solution would be the proper codegen implementation
for these <br>
directives.<br>
However, I would like to propose a simpler and
closer-to-spec approach:<br>
resolving combined directives into nested AST nodes in the
parsing phase.<br>
<br>
E.g. an OMPTargetTeamsDistributeDirective would be
resolved into<br>
OMPTargetDirective<br>
&nbsp;|- OMPTeamsDirective<br>
&nbsp;&nbsp;&nbsp;&nbsp;|- OMPDistributeDirective<br>
<br>
where type-checking and codegen for these single
directives are already <br>
implemented.<br>
The advantages are:<br>
- Much simpler type-checking and code generation<br>
- We match the specification stating that combined
directives have the <br>
semantic meaning of one construct immediately followed by
the other <br>
construct<br>
- All combined directives are fully supported if their
derived <br>
constructs are supported<br>
<br>
Potential disadvantages:<br>
- The AST representation differs from the input. However,
this is <br>
already the case due to inserted implicit parameters.<br>
- Code optimizations for combined directives may be harder
to implement<br>
<br>
In my opinion the benefits outweigh the disadvantages, but
I may not be <br>
aware of some implications. Please let me know your
thoughts about this <br>
idea. And tell me if I misunderstood anything related
that led to the <br>
decision for the current design.<br>
<br>
Unrelated question:<br>
I don't understand the necessity of the
__kmpc_fork_teams() run-time <br>
call as the __tgt_target_teams() implementation should be
able to handle <br>
this case.<br>
<br>
<br>
Daniel<o:p></o:p></span></p>
</div>
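<p>The expansion of a combined directive into nested single-construct nodes, as described in the proposal above, can be sketched with a toy model. This is Python, not Clang's AST: Directive, expand_combined, render and LEAF_CONSTRUCTS are hypothetical names, clauses are ignored, and every combined directive is assumed to split cleanly at word boundaries.</p>
<pre>
```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical set of single constructs the toy model knows about.
LEAF_CONSTRUCTS = ("target", "teams", "distribute", "parallel", "for")


@dataclass
class Directive:
    """Toy AST node: one construct plus the constructs nested in it."""
    kind: str
    children: List["Directive"] = field(default_factory=list)


def expand_combined(name: str) -> Directive:
    """Turn e.g. 'target teams distribute' into a chain of nested
    single-construct nodes, outermost construct first."""
    kinds = name.split()
    if not kinds:
        raise ValueError("empty directive name")
    for kind in kinds:
        if kind not in LEAF_CONSTRUCTS:
            raise ValueError("unknown construct: " + kind)
    root = Directive(kinds[0])
    node = root
    for kind in kinds[1:]:
        child = Directive(kind)
        node.children.append(child)
        node = child
    return root


def render(node: Directive, depth: int = 0) -> str:
    """Print the chain with one extra indentation step per nesting level."""
    prefix = "" if depth == 0 else "  " * (depth - 1) + "|- "
    lines = [prefix + "OMP" + node.kind.capitalize() + "Directive"]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)
```
</pre>
<p>In this model, render(expand_combined("target teams distribute")) reproduces the three-level tree shown in the proposal, and the same Directive chain results whether the source used the combined form or the hand-nested one.</p>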
</div>
</blockquote>
<br>
</body>
</html>