[Openmp-commits] [PATCH] D38185: Implementation of OMPT as specified in OpenMP 5.0 Preview 1

Joachim Protze via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Thu Oct 19 02:08:32 PDT 2017


protze.joachim added inline comments.


================
Comment at: runtime/src/kmp_tasking.cpp:1692
+// complete
+kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
+#if OMPT_SUPPORT && OMPT_OPTIONAL
----------------
Hahnfeld wrote:
> Hahnfeld wrote:
> > hbae wrote:
> > > Hahnfeld wrote:
> > > > hbae wrote:
> > > > > I know this is ugly, but we couldn't find a better way to fix ~15% regression in 376.kdtree.
> > > > Duplicating code increases the risk of inconsistencies. Can we use templates so that the compiler can do the work? Something like the following:
> > > > 
> > > > ```lang=C++
> > > > template <bool ompt>
> > > > kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid) {
> > > >   ...
> > > > }
> > > > 
> > > > kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
> > > > #if OMPT_SUPPORT && OMPT_OPTIONAL
> > > >   if (UNLIKELY(ompt_enabled.enabled)) {
> > > >     return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid);
> > > >   }
> > > > #endif
> > > >   return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid);
> > > > }
> > > > ```
> > > Yes, we can try this, but I suggest to do it in a separate patch since it requires performance validation.
> > But this solution is a maintenance nightmare. If we need time to do performance tests, then the "unoptimized" version should go in at first. So only one function that has all the instrumentation in it
> For reference: In some offline tests with Joachim, I was able to convince the compiler to generate the exact same code. Slightly modified sketch that allows inlining of version without OMPT:
> 
> ```lang=C++
> template <bool ompt>
> static kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
>   ...
> }
> 
> OMPT_NOINLINE
> static kmp_int32 __kmpc_omp_taskwait_ompt(ident *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
>   return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, frame_address, return_address);
> }
> 
> kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
> #if OMPT_SUPPORT && OMPT_OPTIONAL
>   if (UNLIKELY(ompt_enabled.enabled)) {
>     return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, OMPT_GET_FRAME_ADDRESS(1), OMPT_GET_RETURN_ADDRESS(0));
>   }
> #endif
>   return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid, NULL, NULL);
> }
> ```
> 
> Clang 5.0.0 even optimizes out the two NULL parameters.
I agree with Jonas, that this code duplication is a maintenance nightmare. Looking at the object dump, the latest version looks very promising.

We will push an implementation of this modification (also applied for the task_if0 functions) to a new branch of the OMPT github repository later this week. This way you will have the chance to run performance tests.


https://reviews.llvm.org/D38185





More information about the Openmp-commits mailing list