[Openmp-dev] Proposal: Resolve combined directives in parsing phase

Fri Jun 9 05:45:51 PDT 2017

> Jim:
>
>> Thus it can easily be the case that omp parallel do/for is faster than omp parallel + omp do/for.
>
> This is another good motivation for this proposal: I think this is currently the case, but it should not be.
>
> Btw, thank you for this very good example and the provided solution. The question is whether we can resolve all combined constructs that easily.

I'm not quite sure what you're saying here; are you saying that there should be an unnecessary barrier in the omp parallel do/for?
If so I disagree.
Or are you saying that the compiler should optimise omp parallel; {omp do/for} to remove the unnecessary barrier?
In which case I agree.

Like many standards, OpenMP is all predicated on "as if": the standard lays down the user-visible behaviour, and any implementation which provides that is fine. The unnecessary barriers implied by the simple transformation of omp parallel do/for => omp parallel; {omp do/for} are not user visible and can be removed by the implementation.

You may choose to note, in particular, that there is language in TR4 that makes it clear that the OMPT profiling interface cannot be used to check whether this unnecessary barrier is present. In other words, optimizations that are not visible to user code are not outlawed merely because you can observe them by using the OMPT profiling interfaces.
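To make the barrier point concrete, here is a minimal C sketch. The function names are made up for illustration and do not come from the thread or from clang. It shows the combined form, the naive expansion, and the expansion with the redundant barrier removed by hand via nowait. All three produce the same user-visible result, which is what the "as if" rule permits an implementation to exploit.

```c
/* Combined form: the only barrier is the one at the end of the region. */
void scale_combined(double *a, int n, double f) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        a[i] *= f;
}

/* Naive expansion: the worksharing loop ends with an implicit barrier,
   and the enclosing parallel region ends with another one right after. */
void scale_nested(double *a, int n, double f) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; ++i)
            a[i] *= f;
    }
}

/* Expansion with the loop's implicit barrier dropped by hand. The barrier
   at the end of the parallel region still synchronises all threads. */
void scale_nested_nowait(double *a, int n, double f) {
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            a[i] *= f;
    }
}
```

Calling each function on its own copy of the same array should leave all copies identical, with or without OpenMP enabled at compile time.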

-- Jim

Jim Cownie <james.h.cownie at intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers, and Runtimes)
Tel: +44 117 9071438

From: Openmp-dev [mailto:openmp-dev-bounces at lists.llvm.org] On Behalf Of Schürmann, Daniel via Openmp-dev
Sent: Monday, June 5, 2017 5:20 PM
To: openmp-dev at lists.llvm.org
Subject: Re: [Openmp-dev] Proposal: Resolve combined directives in parsing phase

Thank you all for your feedback and suggestions!

I would like to update my proposal while taking your considerations into account.

Also, I hope it is okay to answer in one mail instead of spreading the discussion over several.

Briefly, the motivation again:

- some combined constructs are unhandled in the code generation.

- matching all directive combinations in codegen is very cumbersome.

- combined constructs and separate nested constructs have potentially different performance characteristics.

Section 2.11 of the specification about Combined Constructs states:

> The semantics of the combined constructs are identical to that of explicitly specifying the first construct containing one instance of the second construct and no other statements.

To match this semantic rule, the idea is to expand these combined constructs during AST construction. This enables unimplemented combined constructs to use the already implemented code generation. At the same time, it provides the same performance for combined constructs as for separate ones.

After reconsidering some implications, it seems easier to leave parsing and type-checking as is and do the expansion in the AST construction (Sema::ActOnOpenMPxyzDirective()).

This way, the AST should look exactly the same whether the code contains combined constructs or not. The issue of performance regressions due to losing information about the close nesting should be solvable by flags in cases where this is really necessary. On the upside, it should be possible to derive the close nesting information even when the constructs were not combined in the source.

Now, I would like to reply to some of the points raised:

C Bergström:

> I'm not sure the error handling on a parsing issue would cascade like you expect.

This updated proposal takes this into account by delaying the expansion to the AST construction.

Alexey Bataev:

> Also, you will need to properly capture arguments of some of the clauses that are used in inner OpenMP constructs.

Although I was more concerned about clauses related to the outer constructs, this is the main reason not to do the expansion in the parsing phase. In Sema, all clauses are parsed and available. The clauses can either be added to both constructs or have to be split between them. I'm not sure if 'wrong' clauses would do any harm later (e.g. a num_teams clause added to a target construct).
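At the source level, splitting clauses could look like the following C sketch. The function names are hypothetical. The placement shown here (num_threads on the parallel construct; schedule and reduction on the loop construct) is one valid hand-written expansion, not necessarily the distribution Sema would have to choose.

```c
/* Combined form carrying clauses that belong to different constructs. */
long sum_combined(const int *v, int n) {
    long s = 0;
    #pragma omp parallel for num_threads(4) schedule(static) reduction(+:s)
    for (int i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Hand-expanded form: each clause is attached to a construct that
   accepts it. num_threads is only valid on parallel, schedule only on
   the loop construct; s is shared in the parallel region, so the
   reduction on the inner loop is well-formed. */
long sum_split(const int *v, int n) {
    long s = 0;
    #pragma omp parallel num_threads(4)
    {
        #pragma omp for schedule(static) reduction(+:s)
        for (int i = 0; i < n; ++i)
            s += v[i];
    }
    return s;
}
```

Both functions should return the same sum for the same input, which is the property an automatic expansion would need to preserve.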

Jim:

> Thus it can easily be the case that omp parallel do/for is faster than omp parallel + omp do/for.

This is another good motivation for this proposal: I think this is currently the case, but it should not be.

Btw, thank you for this very good example and the provided solution. The question is whether we can resolve all combined constructs that easily.

Arpith:

> The spec guarantees that there can be no user code between the target and the teams directive.  This is not the case with the other combined directives.

I was a little imprecise in my response. I meant that a close nesting, if present, can also be derived. It may be that this is easier for the target teams combination, but we already use the nesting information for type-checking.

I know I'm proposing a not-so-small rework, but I think the benefit could be a cleaner implementation of the spec. As this is not an urgent request, we could also work slowly in this direction, e.g. starting only with combined directives which would keep working the same way or are broken anyway.

Thanks again for taking the time!

Best regards,

Daniel

From: Daniel Schürmann<mailto:daniel.schuermann at campus.tu-berlin.de>
Sent: Friday, June 2, 2017 15:06
To: openmp-dev at lists.llvm.org<mailto:openmp-dev at lists.llvm.org>
Subject: Proposal: Resolve combined directives in parsing phase

At the moment, combined directives have their own AST representation for
type-checking and code generation. For some of the combined constructs,
the code generation is implemented as an inlined function, which results in
the semantic meaning of these directives being ignored.

This is true for e.g.
EmitOMPTargetParallelForSimdDirective
EmitOMPTargetSimdDirective
EmitOMPTeamsDistributeDirective
EmitOMPTargetTeamsDistributeDirective
EmitOMPTargetTeamsDistributeParallelForDirective
and more

One solution would be a proper codegen implementation for these
directives.
However, I would like to propose a simpler and closer-to-spec approach:
resolving combined directives in the parsing phase into nested AST nodes.

E.g. an OMPTargetTeamsDistributeDirective would be resolved into
OMPTargetDirective
     |- OMPTeamsDirective
         |- OMPDistributeDirective

where type-checking and codegen for these single directives are already
implemented.
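In source terms, the nested AST above corresponds to writing the three directives separately. The following C sketch uses made-up function names and assumes the compiler either offloads the region or falls back to host execution. It shows the combined directive next to its hand-expanded equivalent.

```c
/* Combined directive, as the user would write it. */
void fill_combined(int *a, int n) {
    #pragma omp target teams distribute map(from: a[0:n])
    for (int i = 0; i < n; ++i)
        a[i] = 2 * i;
}

/* Hand-expanded form mirroring the nested AST nodes: target contains
   exactly one teams construct, which contains exactly one distribute
   construct, with no other statements in between. */
void fill_nested(int *a, int n) {
    #pragma omp target map(from: a[0:n])
    #pragma omp teams
    #pragma omp distribute
    for (int i = 0; i < n; ++i)
        a[i] = 2 * i;
}
```

Both functions should fill the array identically; the proposal amounts to having the compiler build the second shape from the first.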
The advantages are:
- Much simpler type-checking and code generation
- We match the specification stating that combined directives have the
semantic meaning of one construct immediately followed by the other
construct
- All combined directives are fully supported if their derived
constructs are supported

Potential disadvantages:
- The AST representation differs from the input. However, this is
already the case due to inserted implicit parameters.
- Code optimizations for combined directives may be harder to implement

In my opinion the benefits outweigh the disadvantages, but I may not be
aware of some implications. Please let me know your thoughts about this
idea. And tell me if I misunderstood anything that led to the
decision for the current design.

Unrelated question:
I don't understand the necessity of the __kmpc_fork_teams() run-time
call as the __tgt_target_teams() implementation should be able to handle
this case.

Daniel