<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body alink="#EE0000" bgcolor="#ffffff" link="#0B6CDA" text="#000000"
    vlink="#551A8B">
    <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">I’m
        not quite sure what you’re saying here; are you saying that
        there should be an unnecessary barrier in the
      </span><span style="font-size:10.0pt;font-family:"Lucida
        Console";color:#1F497D;mso-fareast-language:EN-US">omp
        parallel do/for
      </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">?<o:p></o:p></span></p>
    <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">If
        so I disagree.<o:p></o:p></span></p>
    <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Or
        are you saying that the compiler should optimise
      </span><span style="font-size:10.0pt;font-family:"Lucida
        Console";color:#1F497D;mso-fareast-language:EN-US">omp
        parallel; {omp do/for}</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">
        to remove the unnecessary barrier?<o:p></o:p></span></p>
    <span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">In
      which case I agree.<br>
      <br>
      <font color="#000000">I meant the latter. Maybe "semantic meaning"
        isn't the right term, as optimizations preserve semantics anyway.
        However, if we merge these two cases during AST construction and
        optimize away the unnecessary barrier, we get simpler codegen
        with the same performance for both cases:<br>
        - decompose "omp parallel for" into "omp parallel; {omp do/for}"<br>
        - check for a closely nested omp for (I think we could also do
        this more generically)<br>
        - in codegen, add the barrier only if no omp for is closely nested<br>
        <br>
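        The three steps above can be sketched on a toy model. Everything
        in it is hypothetical: Node, decompose, endsWithFor and
        barriersAtRegionEnd are names made up for illustration and are not
        Clang classes or functions. The sketch also assumes the barrier is
        only redundant when the loop is the last child of the region.<br>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy node, not a Clang class.
struct Node {
    std::string kind;            // "parallel for", "parallel", "for", or "stmt"
    std::vector<Node> children;  // closely nested constructs and statements
};

// Step 1: decompose "parallel for" into parallel { for }.
Node decompose(const Node& n) {
    if (n.kind != "parallel for") return n;
    return Node{"parallel", {Node{"for", n.children}}};
}

// Step 2: check whether a worksharing loop is the last thing in the region.
bool endsWithFor(const Node& parallel) {
    return !parallel.children.empty() && parallel.children.back().kind == "for";
}

// Step 3: count the barriers executed at the end of a parallel region.
// The region always ends with its own implicit barrier; a trailing loop adds
// a second, redundant one unless it is elided.
int barriersAtRegionEnd(const Node& parallel, bool elideRedundant) {
    int count = 1;  // implicit barrier of the parallel region itself
    if (endsWithFor(parallel) && !elideRedundant) ++count;
    return count;
}
```

        With the elision enabled, a decomposed "parallel for" ends with a
        single barrier, the same as the combined construct would.<br>
        <br>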
        I think this could apply to most, if not all, combined
        directives.<br>
        <br>
        Kind regards<br>
        Daniel<br>
      </font><br>
      <br>
    </span>
    <div class="moz-cite-prefix">On 06/09/2017 02:45 PM, Cownie, James H
      wrote:<br>
    </div>
    <blockquote
cite="mid:397D95928DECEF49983F5B237627E9788AD38C7F@IRSMSX154.ger.corp.intel.com"
      type="cite">
      <meta http-equiv="Content-Type" content="text/html;
        charset=windows-1252">
      <meta name="Generator" content="Microsoft Word 15 (filtered
        medium)">
      <style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:"Lucida Console";
        panose-1:2 11 6 9 4 5 4 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.emailquote, li.emailquote, div.emailquote
        {mso-style-name:emailquote;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:1.0pt;
        border:none;
        padding:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
p.xmsonormal, li.xmsonormal, div.xmsonormal
        {mso-style-name:x_msonormal;
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
span.xmsohyperlink
        {mso-style-name:x_msohyperlink;
        color:blue;
        text-decoration:underline;}
span.xmsohyperlinkfollowed
        {mso-style-name:x_msohyperlinkfollowed;
        color:#954F72;
        text-decoration:underline;}
span.EmailStyle21
        {mso-style-type:personal-reply;
        font-family:"Verdana",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
      <div class="WordSection1">
        <p class="xmsonormal"><span lang="DE">Jim:<o:p></o:p></span></p>
        <p class="xmsonormal"><span lang="DE">> <i>Thus it can
              easily be the case that omp parallel do/for is faster than
              omp parallel + omp do/for.</i><o:p></o:p></span></p>
        <p class="xmsonormal"><span lang="DE">This is another good
            motivation for this proposal: I think this is currently the
            case, but it should not be.<o:p></o:p></span></p>
        <p class="xmsonormal"><span lang="DE">By the way, thank you for this
            very good example and the solution you provided. The question is
            whether we can resolve all combined constructs that easily.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">I’m
            not quite sure what you’re saying here; are you saying that
            there should be an unnecessary barrier in the
          </span><span style="font-size:10.0pt;font-family:"Lucida
            Console";color:#1F497D;mso-fareast-language:EN-US">omp
            parallel do/for
          </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">?<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">If
            so I disagree.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Or
            are you saying that the compiler should optimise
          </span><span style="font-size:10.0pt;font-family:"Lucida
            Console";color:#1F497D;mso-fareast-language:EN-US">omp
            parallel; {omp do/for}</span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">
            to remove the unnecessary barrier?<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">In
            which case I agree.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">Like
            many standards, OpenMP is predicated on the “as if” rule: the
            standard lays down the user-visible behaviour, and any
            implementation which provides that is fine. The unnecessary
            barriers implied by the simple transformation of</span><span
            style="font-size:10.0pt;font-family:"Lucida
            Console";color:#1F497D;mso-fareast-language:EN-US"> omp
            parallel do/for => omp parallel; {omp do/for}
          </span><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">are
            not user visible and can be removed by the implementation.</span><span
            style="font-size:10.0pt;font-family:"Lucida
            Console";color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US">You
            may note, in particular, that there is language in
            TR4 that makes it clear that the OMPT profiling interface
            cannot be used to check whether this unnecessary barrier is
            present. In other words, optimizations that are not visible
            to user code are not outlawed merely because you can observe
            them using the OMPT profiling interfaces.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <div>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">--
              Jim<br>
              <br>
              Jim Cownie <a class="moz-txt-link-rfc2396E" href="mailto:james.h.cownie@intel.com">&lt;james.h.cownie@intel.com&gt;</a><br>
              SSG/DPD/TCAR (Technical Computing, Analyzers, and
              Runtimes)<o:p></o:p></span></p>
          <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D">Tel:
              +44 117 9071438</span><span
style="font-size:11.0pt;font-family:"Calibri",sans-serif;color:#1F497D"><o:p></o:p></span></p>
        </div>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Verdana",sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <div>
          <div style="border:none;border-top:solid #E1E1E1
            1.0pt;padding:3.0pt 0cm 0cm 0cm">
            <p class="MsoNormal"><b><span
                  style="font-size:11.0pt;font-family:"Calibri",sans-serif"
                  lang="EN-US">From:</span></b><span
                style="font-size:11.0pt;font-family:"Calibri",sans-serif"
                lang="EN-US"> Openmp-dev
                [<a class="moz-txt-link-freetext" href="mailto:openmp-dev-bounces@lists.llvm.org">mailto:openmp-dev-bounces@lists.llvm.org</a>]
                <b>On Behalf Of </b>Schürmann, Daniel via Openmp-dev<br>
                <b>Sent:</b> Monday, June 5, 2017 5:20 PM<br>
                <b>To:</b> <a class="moz-txt-link-abbreviated" href="mailto:openmp-dev@lists.llvm.org">openmp-dev@lists.llvm.org</a><br>
                <b>Subject:</b> Re: [Openmp-dev] Proposal: Resolve
                combined directives in parsing phase<o:p></o:p></span></p>
          </div>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div>
          <div>
            <p class="xmsonormal"><span lang="DE">Thank you all for your
                feedback and suggestions!<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">I would like to update
                my proposal while taking your considerations into
                account.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Also, I hope it is
                okay to answer in one mail instead of spread out
                discussions.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Briefly, the
                motivation again:<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">- some combined
                constructs are not handled in code generation.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">- matching all
                directive combinations in codegen is very cumbersome.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">- combined constructs
                and separate nested constructs have potentially
                different performance characteristics.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Section 2.11 of the
                specification about Combined Constructs states:<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">The semantics of the
                combined constructs are identical to that of explicitly
                specifying
                <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">the first construct
                containing one instance of the second construct and no
                other statements.<o:p></o:p></span></p>
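            <p class="xmsonormal">As a minimal sketch of what this rule
              means for the most familiar case, the two functions below
              should be interchangeable. The function names and the
              reduction are assumptions made for illustration; only the
              two directive forms come from this discussion.</p>

```cpp
#include <cassert>
#include <vector>

// Combined construct: one directive creates the team and shares the loop.
long sum_combined(const std::vector<int>& v) {
    long sum = 0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel for reduction(+ : sum)
    for (long i = 0; i < n; ++i) sum += v[i];
    return sum;
}

// Expanded form: a parallel region containing exactly one worksharing loop
// and no other statements.
long sum_nested(const std::vector<int>& v) {
    long sum = 0;
    const long n = static_cast<long>(v.size());
    #pragma omp parallel
    {
        #pragma omp for reduction(+ : sum)
        for (long i = 0; i < n; ++i) sum += v[i];
    }
    return sum;
}
```

            <p class="xmsonormal">Both functions return the same value for
              the same input, with or without OpenMP enabled at compile
              time. Any difference between them should therefore be one
              of performance only, which is the point under discussion.</p>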
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">To match this semantic
                rule, the idea is to expand these combined constructs
                as early as AST construction. This lets combined
                constructs that are not yet implemented reuse the
                existing code generation. At the same time, it gives
                combined constructs the same performance as separate
                ones.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">After reconsidering
                some implications, it seems easier to leave parsing and
                type-checking as is and do the expansion in the AST
                construction (Sema::ActOnOpenMPxyzDirective()).<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">This way, the AST
                should look exactly the same whether the code contains
                combined constructs or not. Performance regressions
                caused by losing the information about close nesting
                should be solvable with flags in the cases where this
                is really necessary. On the upside, it should be
                possible to derive the close nesting information even
                when the constructs were not combined in the source.<o:p></o:p></span></p>
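            <p class="xmsonormal">The expansion and the flag mentioned
              above could look roughly like the following toy model. It
              is only a sketch: Directive, expandCombined and render are
              hypothetical names, and nothing here reflects Clang's
              actual AST classes or the Sema interface.</p>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical toy AST node; this is not Clang's directive representation.
struct Directive {
    std::string kind;                  // e.g. "target", "teams", "distribute"
    bool fromCombined = false;         // flag: node came from a combined construct
    std::unique_ptr<Directive> child;  // the single closely nested construct, if any
};

// Expands a combined construct, given as its leaf constructs from outermost to
// innermost, into a chain of nested single-construct nodes.
std::unique_ptr<Directive> expandCombined(const std::vector<std::string>& parts) {
    std::unique_ptr<Directive> root;
    // Build inside-out so each node can take ownership of its child.
    for (auto it = parts.rbegin(); it != parts.rend(); ++it) {
        auto node = std::make_unique<Directive>();
        node->kind = *it;
        node->fromCombined = parts.size() > 1;
        node->child = std::move(root);
        root = std::move(node);
    }
    return root;
}

// Renders the chain as "target > teams > distribute" for inspection.
std::string render(const Directive* d) {
    std::string out;
    for (; d != nullptr; d = d->child.get()) {
        if (!out.empty()) out += " > ";
        out += d->kind;
    }
    return out;
}
```

            <p class="xmsonormal">The fromCombined flag stands in for the
              close nesting information that codegen might still want
              after the expansion.</p>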
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Now, I would like to
                reply to some of the points raised:<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">C Bergström:<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">> <i>I'm not sure
                  the error handling on a parsing issue would cascade
                  like you expect.
                </i><o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">This updated proposal
                takes this into account by delaying the expansion to
                the AST construction.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Alexey Bataev:<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">> <i>Also, you
                  will need to properly capture arguments of some of the
                  clauses that are used in inner OpenMP constructs.</i><o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Although I was more
                concerned about clauses related to the outer constructs,
                this is the main reason not to do the expansion
                in the parsing phase. In Sema, all clauses are parsed
                and available. The clauses can either be added to both
                constructs or have to be split between them. I'm not
                sure whether 'wrong' clauses would do any harm later
                (e.g. a num_teams clause added to a target construct).<o:p></o:p></span></p>
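            <p class="xmsonormal">A rough sketch of the splitting
              variant, for a combined "target teams" construct, might
              look as follows. The helper names ownerOf and splitClauses
              are hypothetical, and the table lists only a few clauses
              whose owner is clear in OpenMP 4.5; it is not a complete
              or authoritative mapping.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical lookup: which leaf construct of "target teams" a clause belongs to.
// Only a handful of clauses are listed; a real table would have to follow the
// OpenMP specification for every clause.
std::string ownerOf(const std::string& clause) {
    static const std::map<std::string, std::string> owner = {
        {"device", "target"},   {"map", "target"},
        {"num_teams", "teams"}, {"thread_limit", "teams"},
    };
    auto it = owner.find(clause);
    return it == owner.end() ? "" : it->second;
}

// Splits the clauses of a combined construct per leaf construct.
std::map<std::string, std::vector<std::string>>
splitClauses(const std::vector<std::string>& clauses) {
    std::map<std::string, std::vector<std::string>> result;
    for (const auto& c : clauses) {
        const std::string owner = ownerOf(c);
        // Clauses without a single clear owner are collected, not guessed.
        result[owner.empty() ? "unassigned" : owner].push_back(c);
    }
    return result;
}
```

            <p class="xmsonormal">Clauses that end up under "unassigned"
              are the ones that would need a decision: attach them to both
              constructs, or pick one.</p>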
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Jim:<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">> <i>Thus it can
                  easily be the case that omp parallel do/for is faster
                  than omp parallel + omp do/for.</i><o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">This is another good
                motivation for this proposal: I think this is currently
                the case, but it should not be.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">By the way, thank you for
                this very good example and the solution you provided. The
                question is whether we can resolve all combined
                constructs that easily.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Arpith:<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">> <i>The spec
                  guarantees that there can be no user code between the
                  target and the teams directive.  This is not the case
                  with the other combined directives.</i><o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">I was somewhat
                imprecise in my response. I meant that close nesting,
                if present, can also be derived. This may be
                easier for the target teams combination, but we already
                use the nesting information for type-checking.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">I know I'm proposing a
                not-so-small rework, but I think the benefit could be a
                cleaner implementation of the spec. As this is not an
                urgent request, we could also work in this direction
                gradually, e.g. starting only with combined directives
                that would keep working the same or are broken anyway.<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Thanks again for
                taking the time!<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Best regards,<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE">Daniel<o:p></o:p></span></p>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
            <div style="border:none;border-top:solid #E1E1E1
              1.0pt;padding:3.0pt 0cm 0cm 0cm">
              <p class="xmsonormal"><b><span lang="DE">From: </span></b><span
                  lang="DE"><a moz-do-not-send="true"
                    href="mailto:daniel.schuermann@campus.tu-berlin.de">Daniel
                    Schürmann</a><br>
                  <b>Sent: </b>Friday, June 2, 2017 15:06<br>
                  <b>To: </b><a moz-do-not-send="true"
                    href="mailto:openmp-dev@lists.llvm.org">openmp-dev@lists.llvm.org</a><br>
                  <b>Subject: </b>Proposal: Resolve combined directives
                  in parsing phase<o:p></o:p></span></p>
            </div>
            <p class="xmsonormal"><span lang="DE"> <o:p></o:p></span></p>
          </div>
        </div>
        <div>
          <p class="MsoNormal"><span style="font-size:10.0pt">At the
              moment, combined directives have their own AST
              representation for
              <br>
              type-checking and code generation. For some of the
              combined constructs, <br>
              the code generation is implemented as an inlined function,
              which results in <br>
              ignoring the semantic meaning of these directives.<br>
              <br>
              This is true for e.g.<br>
              EmitOMPTargetParallelForSimdDirective<br>
              EmitOMPTargetSimdDirective<br>
              EmitOMPTeamsDistributeDirective<br>
              EmitOMPTargetTeamsDistributeDirective<br>
              EmitOMPTargetTeamsDistributeParallelForDirective<br>
              and more<br>
              <br>
              One solution would be a proper codegen implementation
              for these <br>
              directives.<br>
              However, I would like to propose a simpler and
              closer-to-spec approach:<br>
              resolving combined directives in the parsing phase into
              nested AST nodes.<br>
              <br>
              E.g. an OMPTargetTeamsDistributeDirective would be
              resolved into<br>
              OMPTargetDirective<br>
                   |- OMPTeamsDirective<br>
                       |- OMPDistributeDirective<br>
              <br>
              where type-checking and codegen for these single
              directives are already <br>
              implemented.<br>
              The advantages are:<br>
              - Much simpler type-checking and code generation<br>
              - We match the specification stating that combined
              directives have the <br>
              semantic meaning of one construct immediately followed by
              the other <br>
              construct<br>
              - All combined directives are fully supported if their
              derived <br>
              constructs are supported<br>
              <br>
              Potential disadvantages:<br>
              - The AST representation differs from the input. However,
              this is <br>
              already the case due to inserted implicit parameters.<br>
              - Code optimizations for combined directives may be harder
              to implement<br>
              <br>
              In my opinion the benefits outweigh the disadvantages, but
              I may not be <br>
              aware of some implications. Please let me know your
              thoughts about this <br>
              idea. And tell me if I misunderstood anything related
              that led to the <br>
              decision for the current design.<br>
              <br>
              Unrelated question:<br>
              I don't understand the necessity of the
              __kmpc_fork_teams() run-time <br>
              call as the __tgt_target_teams() implementation should be
              able to handle <br>
              this case.<br>
              <br>
              <br>
              Daniel<o:p></o:p></span></p>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>