[Openmp-dev] Proposal: Resolve combined directives in parsing phase

Daniel Schürmann via Openmp-dev openmp-dev at lists.llvm.org
Fri Jun 9 07:15:22 PDT 2017


> I’m not quite sure what you’re saying here; are you saying that there 
> should be an unnecessary barrier in the omp parallel do/for ?
>
> If so I disagree.
>
> Or are you saying that the compiler should optimise omp parallel; {omp 
> do/for} to remove the unnecessary barrier?
>
> In which case I agree.

I meant the latter. Maybe "semantic meaning" isn't the right term, as 
optimizations preserve semantics anyway. However, if we merge these two 
cases during AST construction and optimize away the unnecessary barrier, 
we get simpler Codegen with the same performance for both cases:
- decompose "omp parallel for" into "omp parallel; {omp do/for}"
- check for a closely nested omp for (I think we could also do this more 
generically)
- in Codegen, add the barrier only if no omp for is closely nested

I think this could be applicable to more if not all combined directives.
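
To make the three steps listed above concrete, here is a minimal sketch on a toy AST. Everything in it is hypothetical (the Directive struct and the functions decompose, hasSoleNestedFor and parallelNeedsOwnBarrier are made up for illustration); it is not Clang's AST or Codegen, and it assumes that the nested directives are the only content of a region.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy AST node; this is NOT Clang's OMPExecutableDirective.
// Assumption: 'children' is the complete content of the region.
struct Directive {
    std::string kind;                                  // e.g. "parallel", "for"
    std::vector<std::unique_ptr<Directive>> children;  // directly nested directives
};

// Step 1: decompose a combined directive, given as its constituent kinds
// (e.g. {"parallel", "for"}), into a chain of nested nodes.
std::unique_ptr<Directive> decompose(const std::vector<std::string>& parts) {
    std::unique_ptr<Directive> root;
    Directive* tail = nullptr;
    for (const auto& part : parts) {
        auto node = std::make_unique<Directive>();
        node->kind = part;
        Directive* raw = node.get();
        if (tail) {
            tail->children.push_back(std::move(node));
        } else {
            root = std::move(node);
        }
        tail = raw;
    }
    return root;
}

// Step 2: check whether a parallel region consists of exactly one
// closely nested worksharing loop.
bool hasSoleNestedFor(const Directive& d) {
    return d.kind == "parallel" && d.children.size() == 1 &&
           d.children[0]->kind == "for";
}

// Step 3: decide whether the parallel region needs a barrier of its own.
// If a sole "for" is nested, that loop's implied barrier already covers it.
bool parallelNeedsOwnBarrier(const Directive& parallel) {
    return !hasSoleNestedFor(parallel);
}
```

With this model, decompose({"parallel", "for"}) and a hand-built parallel node containing a single for node are indistinguishable, so both get the same barrier decision.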

Kind regards
Daniel


On 06/09/2017 02:45 PM, Cownie, James H wrote:
>
> Jim:
>
> >/Thus it can easily be the case that omp parallel do/for is faster 
> than omp parallel + omp do/for./
>
> This is another good motivation for this proposal: I think this is 
> currently the case, but it should not be.
>
> Btw, thank you for this very good example and the provided solution. 
> The question is whether we can resolve all combined constructs that easily.
>
> I’m not quite sure what you’re saying here; are you saying that there 
> should be an unnecessary barrier in the omp parallel do/for ?
>
> If so I disagree.
>
> Or are you saying that the compiler should optimise omp parallel; {omp 
> do/for} to remove the unnecessary barrier?
>
> In which case I agree.
>
> Like many standards, OpenMP is predicated on “as if”: the 
> standard lays down the user-visible behaviour, and any implementation 
> which provides that is fine. The unnecessary barriers implied by the 
> simple transformation of omp parallel do/for => omp parallel; {omp 
> do/for} are not user visible and can be removed by the implementation.
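
For reference, the two source forms under discussion look like this in C++. This is only an illustrative sketch; the function names sum_combined and sum_nested are made up, and the code also compiles (and runs serially) when OpenMP is disabled.

```cpp
#include <vector>

// Combined form: a single construct.
long sum_combined(const std::vector<int>& v) {
    long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel for reduction(+ : total)
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// Separate form: the worksharing loop ends with an implied barrier,
// immediately followed by the implied barrier that ends the parallel
// region. The first of the two is the redundant one discussed here.
long sum_nested(const std::vector<int>& v) {
    long total = 0;
    const int n = static_cast<int>(v.size());
    #pragma omp parallel
    {
        #pragma omp for reduction(+ : total)
        for (int i = 0; i < n; ++i) {
            total += v[i];
        }
    }
    return total;
}
```

Both functions return the same result for any input; whether the redundant barrier is actually elided is up to the implementation and cannot be observed from this code.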
>
> Note, in particular, that there is language in TR4 
> that makes it clear that the OMPT profiling interface cannot be used 
> to check whether this unnecessary barrier is present. In other words, 
> optimizations that are not visible to user code are not outlawed 
> merely because you could observe them through the OMPT profiling interfaces.
>
> -- Jim
>
> Jim Cownie <james.h.cownie at intel.com>
> SSG/DPD/TCAR (Technical Computing, Analyzers, and Runtimes)
>
> Tel: +44 117 9071438
>
> *From:*Openmp-dev [mailto:openmp-dev-bounces at lists.llvm.org] *On 
> Behalf Of *Schürmann, Daniel via Openmp-dev
> *Sent:* Monday, June 5, 2017 5:20 PM
> *To:* openmp-dev at lists.llvm.org
> *Subject:* Re: [Openmp-dev] Proposal: Resolve combined directives in 
> parsing phase
>
> Thank you all for your feedback and suggestions!
>
> I would like to update my proposal while taking your considerations 
> into account.
>
> Also, I hope it is okay to answer in one mail instead of spread out 
> discussions.
>
> Briefly again the motivation:
>
> - some combined constructs are unhandled in the code generation.
>
> - codegen is very cumbersome to match all directive combinations.
>
> - combined constructs and separate nested constructs have potentially 
> different performance characteristics.
>
> Section 2.11 of the specification about Combined Constructs states:
>
> "The semantics of the combined constructs are identical to that of 
> explicitly specifying the first construct containing one instance of 
> the second construct and no other statements."
>
> To match this semantic rule, the idea is to expand these combined 
> constructs already during AST construction. This enables unimplemented 
> combined constructs to use the already implemented code generation. 
> At the same time, it provides the same performance for combined 
> constructs as for separate ones.
>
> After reconsidering some implications, it seems easier to leave 
> parsing and type-checking as is and do the expansion in the AST 
> construction (Sema::ActOnOpenMPxyzDirective()).
>
> This way, the AST should look exactly the same whether the code 
> contains combined constructs or not. The issue of performance 
> regressions due to losing information about the close nesting should 
> be solvable by flags in cases where this is really necessary. On the 
> upside, it should be possible to derive the close nesting information 
> even if the constructs were not combined in the source.
>
> Now, I would like to reply to some of the points raised:
>
> C Bergström:
>
> >/I'm not sure the error handling on a parsing issue would cascade like 
> you expect. /
>
> This updated proposal is taking this into account by delaying the 
> expansion to the AST construction.
>
> Alexey Bataev:
>
> >/Also, you will need to properly capture arguments of some of the 
> clauses that are used in inner OpenMP constructs./
>
> Although I was more concerned about clauses related to the outer 
> constructs, this is the main reason not to do the expansion in 
> the parsing phase. In Sema, all clauses are parsed and available. The 
> clauses can either be added to both constructs or have to be split 
> between them. I'm not sure if 'wrong' clauses would do any harm later 
> (e.g. a num_teams clause added to a target construct).
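
As a rough sketch of the clause handling described above, the following splits the clauses of a combined "target teams" directive between its two constituents. All names here (Owner, SplitClauses, clauseOwners, splitTargetTeamsClauses) are hypothetical, the table is deliberately incomplete, and replicating firstprivate onto both constructs is purely an illustrative assumption, not a statement about the specification or about Clang.

```cpp
#include <map>
#include <string>
#include <vector>

// Which constituent construct a clause is attached to (hypothetical model).
enum class Owner { Target, Teams, Both };

// Illustrative, incomplete table for "target teams".
// Assumption: "firstprivate" is replicated onto both constructs.
const std::map<std::string, Owner>& clauseOwners() {
    static const std::map<std::string, Owner> table = {
        {"map", Owner::Target},
        {"device", Owner::Target},
        {"num_teams", Owner::Teams},
        {"thread_limit", Owner::Teams},
        {"firstprivate", Owner::Both},
    };
    return table;
}

struct SplitClauses {
    std::vector<std::string> target;   // clauses for the outer target construct
    std::vector<std::string> teams;    // clauses for the inner teams construct
    std::vector<std::string> unknown;  // clauses not in the table; left unassigned
};

// Distribute clause names so that, e.g., num_teams never ends up on target.
SplitClauses splitTargetTeamsClauses(const std::vector<std::string>& clauses) {
    SplitClauses out;
    for (const auto& name : clauses) {
        const auto it = clauseOwners().find(name);
        if (it == clauseOwners().end()) {
            out.unknown.push_back(name);
            continue;
        }
        if (it->second != Owner::Teams) out.target.push_back(name);
        if (it->second != Owner::Target) out.teams.push_back(name);
    }
    return out;
}
```

Splitting per clause avoids the open question of whether a 'wrong' clause on a constituent does any harm, at the cost of maintaining such a table per combined directive.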
>
> Jim:
>
> >/Thus it can easily be the case that omp parallel do/for is faster 
> than omp parallel + omp do/for./
>
> This is another good motivation for this proposal: I think this is 
> currently the case, but it should not be.
>
> Btw, thank you for this very good example and the provided solution. 
> The question is whether we can resolve all combined constructs that easily.
>
> Arpith:
>
> >/The spec guarantees that there can be no user code between the target 
> and the teams directive.  This is not the case with the other combined 
> directives./
>
> I was somewhat imprecise in my response. I meant that a close 
> nesting, if present, can also be derived. It may be that this is easier 
> for the target teams combination, but we already use the nesting 
> information for type-checking.
>
> I know I'm proposing a not-so-small rework, but I think the benefit 
> could be a cleaner implementation of the spec. As this is not an urgent 
> request, we could also work slowly in this direction, e.g. starting 
> only with combined directives which either keep working the same way 
> or are broken anyway.
>
> Thanks again for taking the time!
>
> Best regards,
>
> Daniel
>
> *From: *Daniel Schürmann <mailto:daniel.schuermann at campus.tu-berlin.de>
> *Sent: *Friday, June 2, 2017 15:06
> *To: *openmp-dev at lists.llvm.org <mailto:openmp-dev at lists.llvm.org>
> *Subject: *Proposal: Resolve combined directives in parsing phase
>
> At the moment, combined directives have their own AST representation for
> type-checking and code generation. For some of the combined constructs,
> the code generation is implemented as an inlined function, which results
> in ignoring the semantic meaning of these directives.
>
> This is true for e.g.
> EmitOMPTargetParallelForSimdDirective
> EmitOMPTargetSimdDirective
> EmitOMPTeamsDistributeDirective
> EmitOMPTargetTeamsDistributeDirective
> EmitOMPTargetTeamsDistributeParallelForDirective
> and more
>
> One solution would be a proper codegen implementation for these
> directives.
> However, I would like to propose a simpler and closer-to-spec approach:
> resolving combined directives in the parsing phase into nested AST 
> nodes.
>
> E.g. an OMPTargetTeamsDistributeDirective would be resolved into
> OMPTargetDirective
>      |- OMPTeamsDirective
>          |- OMPDistributeDirective
>
> Type-checking and codegen for these single directives are already
> implemented.
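
At the source level, the nesting shown in the tree above corresponds to the following two spellings. This is only an illustrative sketch; the function names scale_combined and scale_nested are made up, and without an offloading device (or with OpenMP disabled) both simply run on the host.

```cpp
#include <vector>

// Combined directive, as the user might write it.
void scale_combined(std::vector<double>& v, double factor) {
    double* p = v.data();
    const int n = static_cast<int>(v.size());
    #pragma omp target teams distribute map(tofrom : p[0:n])
    for (int i = 0; i < n; ++i) {
        p[i] *= factor;
    }
}

// The same computation with the three constituent directives nested
// explicitly: each construct contains exactly one instance of the next
// one and no other statements.
void scale_nested(std::vector<double>& v, double factor) {
    double* p = v.data();
    const int n = static_cast<int>(v.size());
    #pragma omp target map(tofrom : p[0:n])
    #pragma omp teams
    #pragma omp distribute
    for (int i = 0; i < n; ++i) {
        p[i] *= factor;
    }
}
```

Under the proposal, both functions would produce the same nested AST, so codegen would only need to handle the three single directives.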
> The advantages are:
> - Much simpler type-checking and code generation
> - We match the specification stating that combined directives have the
> semantic meaning of one construct immediately followed by the other
> construct
> - All combined directives are fully supported if their derived
> constructs are supported
>
> Potential disadvantages:
> - The AST representation differs from the input. However, this is
> already the case due to inserted implicit parameters.
> - Code optimizations for combined directives may be harder to implement
>
> In my opinion, the benefits outweigh the disadvantages, but I may not be
> aware of some implications. Please let me know your thoughts about this
> idea, and tell me if I misunderstood anything that led to the
> decision for the current design.
>
> Unrelated question:
> I don't understand the necessity of the __kmpc_fork_teams() run-time
> call as the __tgt_target_teams() implementation should be able to handle
> this case.
>
>
> Daniel
>
