[Openmp-commits] [PATCH] D64218: [OpenMP][NFCI] Cleanup the target synchronization implementation
Alexey Bataev via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Tue Aug 6 07:57:34 PDT 2019
ABataev added a comment.
In D64218#1616951 <https://reviews.llvm.org/D64218#1616951>, @jdoerfert wrote:
> @ABataev I am confused by the comments you make. Could you describe a situation in which we do not execute the same code before and after this patch? Also, the O0/03 inline & constant propagation comment is not clear to me, what is the issue there?
I was confused by the names of the template arguments.
================
Comment at: openmp/libomptarget/deviceRTLs/nvptx/src/target_impl.h:48
+/// need to implement it differently
+template <bool IsCancellable, bool IsSimple, bool IsSPMD>
+void __kmpc_impl_barrier(kmp_Ident *Loc, int32_t TID) {
----------------
JonChesterfield wrote:
> ABataev wrote:
> > jdoerfert wrote:
> > > ABataev wrote:
> > > > jdoerfert wrote:
> > > > > ABataev wrote:
> > > > > > I don't think this is correct. `IsSPMD` flag should be passed as a function parameter. Sometimes, we cannot define the execution mode at the compile time and we could define it only at the execution time (foe example, if the parallel region is called in the orphaned function, marked as noinline or compiled without optimizations, etc.)
> > > > > It is "correct" and it "works" with the rest of the code base but we can change it regardless:
> > > > >
> > > > > It works this way because we have explicit `__kmpc_barrier_XXXXX` functions for the SPMD and non-SPMD case. Through that level of abstraction we know the required barrier implemenetation at compile time. If we want to move avay from the different barrier types that have the mode baked into their name, we would need to make the template parameters arguments for sure.
> > > > >
> > > > > Long story short, I do not have strong feelings about this and it should not matter after inlining and constant propagation.
> > > > Actually, we use `__kmpc_barrier` in many cases. Even in SPMD mode. `__kmpc_barrier_xxxx` variants are used in very rare cases. And your change may lead to incorrect results in case of orphaned directives because you hardcoded `IsSpmd` to `false` in `__kmpc_barrier`. The fact that it works for you just means that you have very limited test set.
> > > > Inlining and constant propagation is not an option here. What if the user compiled the code at `O0`, without optimizations? Jus to debug the code? We should produce different results at `O0` and `O3`? Or explicitly marked the function as `noinline`?
> > > > Actually, we use __kmpc_barrier in many cases. Even in SPMD mode. __kmpc_barrier_xxxx variants are used in very rare cases. And your change may lead to incorrect results in case of orphaned directives because you hardcoded IsSpmd to false in __kmpc_barrier. The fact that it works for you just means that you have very limited test set.
> > >
> > > Please take a look at line 52. As before, a call to `__kmpc_barrier` will first check if we are in SPMD mode. (Even if the template argument is `false` that happens, if it is true it is not going to happen though). Thus, it is no different to the behavior we had.
> > >
> > > > Inlining and constant propagation is not an option here.
> > > I do not understand what you are taking about. This does not, as nothing ever can, rely on inlining and constant propagation.
> > >
> > > > What if the user compiled the code at O0, without optimizations? [...]
> > > They get the same semantics but slower.
> > >
> > > Please take a look at line 52. As before, a call to __kmpc_barrier will first check if we are in SPMD mode. (Even if the template argument is false that happens, if it is true it is not going to happen though). Thus, it is no different to the behavior we had.
> >
> > Then why do we need this template argument if it does nothing but just confuses people?
> > Inlining and constant propagation is not an option here. What if the user compiled the code at O0, without optimizations?
>
> Could you expand on your concern here? The code looks like it does the same thing at O0 and O3 to me (no calls to _builtin_constant_p) so O0 just means slower. That seems OK.
That was the wrong assumption on the meaning of the template arguments, I got the idea already but it is better to rename the arguments somehow because currently, they are confusing.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64218/new/
https://reviews.llvm.org/D64218
More information about the Openmp-commits
mailing list