[Openmp-dev] Proposal: Resolve combined directives in parsing phase
Schürmann, Daniel via Openmp-dev
openmp-dev at lists.llvm.org
Fri Jun 2 13:58:02 PDT 2017
Thank you for your feedback, Arpith.
I see that the nvptx codegen relies in part on combined directives, and I understand the possible difficulties.
However, codegen is already able to do this with the target and teams directive. Remember that the split directives are immediate child nodes in the AST.
But you are right, this change can only be made after changes to codegen.
In my opinion, code generation should in general handle both cases equivalently: combined directives and their separated form.
The reason is that the user would expect the same outcome/performance from both, e.g.
    #pragma omp target parallel for
as well as
    #pragma omp target
    #pragma omp parallel
    #pragma omp for
as the spec states. (The only exception seems to be the distribute-combined directives.)
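To make the comparison concrete, here is a minimal sketch of the two forms applied to the same loop. The function names, the array size and the `map` clauses are assumptions added for illustration; they are not taken from Clang or from this thread. The sketch only checks that both forms compute the same values, it says nothing about how a given compiler lowers either one.

```c
#define N 1000

/* Combined form: a single directive covers offload, thread creation and
   worksharing. The map clause is an assumption so the array reaches the device. */
void scale_combined(double *a, int n) {
    #pragma omp target parallel for map(tofrom: a[0:n])
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * a[i] + 1.0;
}

/* Separated form: the same three constructs, each nested directly in the
   previous one with no other statements in between. */
void scale_split(double *a, int n) {
    #pragma omp target map(tofrom: a[0:n])
    #pragma omp parallel
    #pragma omp for
    for (int i = 0; i < n; ++i)
        a[i] = 2.0 * a[i] + 1.0;
}

/* Returns 1 if both forms leave identical values in their arrays. */
int forms_agree(void) {
    double x[N], y[N];
    for (int i = 0; i < N; ++i)
        x[i] = y[i] = (double)i;
    scale_combined(x, N);
    scale_split(y, N);
    for (int i = 0; i < N; ++i)
        if (x[i] != y[i])
            return 0;
    return 1;
}
```

Calling `forms_agree()` from a small `main` is enough to try it. Without an offloading toolchain the target regions fall back to the host, and without OpenMP enabled the pragmas are ignored, so the check still passes.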
From: Arpith C Jacob<mailto:acjacob at us.ibm.com>
Sent: Friday, June 2, 2017 20:24
To: Schürmann, Daniel<mailto:daniel.schuermann at campus.tu-berlin.de>
Cc: openmp-dev at lists.llvm.org<mailto:openmp-dev at lists.llvm.org>
Subject: Re: [Openmp-dev] Proposal: Resolve combined directives in parsing phase
> Cons: - Code optimizations for combined directives may be harder to implement
From the perspective of GPU code generation, treating combined directives as a special case is *immensely* important for performance. Knowing that there is *no* serial section in the target region makes it possible to generate simplified, low-overhead code that resembles typical CUDA kernels. I am worried that splitting combined directives will make it much harder to get this performance back, particularly because we must do OpenMP lowering in Clang.
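As an illustration of the "serial section" distinction, the sketch below contrasts a target region that consists only of a parallel loop with one that runs a statement on a single thread before the parallel part begins. All names, sizes and `map` clauses are hypothetical and chosen for the example; the code shows the source-level difference only, not what the nvptx codegen emits for either case.

```c
#define LEN 256

/* No serial section: the parallel loop is the entire target region. */
void scale_no_serial(float *y, const float *x, float a, int n) {
    #pragma omp target parallel for map(to: x[0:n]) map(from: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i];
}

/* Serial section: the read of x[0] executes once, on one thread,
   before the parallel region starts. */
void scale_with_serial(float *y, const float *x, float a, int n) {
    #pragma omp target map(to: x[0:n]) map(from: y[0:n])
    {
        float bias = x[0];              /* serial part of the target region */
        #pragma omp parallel for
        for (int i = 0; i < n; ++i)
            y[i] = a * x[i] + bias;
    }
}

/* Returns 1 if both functions produce the values their loops describe. */
int check_serial_section_example(void) {
    float x[LEN], y1[LEN], y2[LEN];
    for (int i = 0; i < LEN; ++i)
        x[i] = (float)(i + 1);
    scale_no_serial(y1, x, 2.0f, LEN);
    scale_with_serial(y2, x, 2.0f, LEN);
    for (int i = 0; i < LEN; ++i) {
        if (y1[i] != 2.0f * x[i])
            return 0;
        if (y2[i] != 2.0f * x[i] + 1.0f)   /* bias is x[0], which is 1 */
            return 0;
    }
    return 1;
}
```

In the first function a compiler can see from the directive alone that no code runs outside the loop. In the second it has to account for the statement that precedes the parallel region.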
> the code generation is implemented as inlined function what results in
> ignoring the semantic meaning of these directives.
Can the current code generation be altered to fix the issues that you see?
> Unrelated question:
> I don't understand the necessity of the __kmpc_fork_teams() run-time
> call as the __tgt_target_teams() implementation should be able to handle
> this case.
My understanding is that the __tgt* calls are implemented in the target offload library while the __kmpc* calls are in the OpenMP runtime. On CPUs, forking of teams is done in the __kmpc_fork_teams() call. On the GPU, the offload call __tgt_target_teams() launches a kernel with multiple teams so the __kmpc_fork_teams() is a no-op.
From: Daniel Schürmann via Openmp-dev <openmp-dev at lists.llvm.org>
To: <openmp-dev at lists.llvm.org>
Date: 06/02/2017 09:08 AM
Subject: [Openmp-dev] Proposal: Resolve combined directives in parsing phase
Sent by: "Openmp-dev" <openmp-dev-bounces at lists.llvm.org>
At the moment, combined directives have their own AST representation for
type-checking and code generation. For some of the combined constructs,
the code generation is implemented as an inlined function, which results in
ignoring the semantic meaning of these directives.
This is true for e.g.
One solution would be the proper codegen implementation for these
However, I would like to propose a simpler and closer-to-spec approach:
resolving combined directives in the parsing phase into nested AST nodes.
E.g. an OMPTargetTeamsDistributeDirective would be resolved into
whereas type-checking and codegen for these single directives is already
The advantages are:
- Much simpler type-checking and code generation
- We match the specification stating that combined directives have the
semantic meaning of one construct immediately followed by the other
- All combined directives are fully supported if their derived
constructs are supported
The disadvantages are:
- The AST representation differs from the input. However, this is
already the case due to inserted implicit parameters.
- Code optimizations for combined directives may be harder to implement
In my opinion the benefits outweigh the disadvantages, but I may not be
aware of some implications. Please let me know your thoughts about this
idea, and tell me if I have misunderstood anything that led to the
decision for the current design.
Unrelated question:
I don't understand the necessity of the __kmpc_fork_teams() run-time
call as the __tgt_target_teams() implementation should be able to handle
this case.