[Openmp-commits] [PATCH] D38185: Implementation of OMPT as specified in OpenMP 5.0 Preview 1

Jonas Hahnfeld via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Oct 18 18:02:37 PDT 2017


Hahnfeld added inline comments.


================
Comment at: runtime/src/kmp_tasking.cpp:1692
+// complete
+kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
+#if OMPT_SUPPORT && OMPT_OPTIONAL
----------------
Hahnfeld wrote:
> hbae wrote:
> > Hahnfeld wrote:
> > > hbae wrote:
> > > > I know this is ugly, but we couldn't find a better way to fix ~15% regression in 376.kdtree.
> > > Duplicating code increases the risk of inconsistencies. Can we use templates so that the compiler can do the work? Something like the following:
> > > 
> > > ```lang=C++
> > > template <bool ompt>
> > > kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid) {
> > >   ...
> > > }
> > > 
> > > kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
> > > #if OMPT_SUPPORT && OMPT_OPTIONAL
> > >   if (UNLIKELY(ompt_enabled.enabled)) {
> > >     return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid);
> > >   }
> > > #endif
> > >   return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid);
> > > }
> > > ```
> > Yes, we can try this, but I suggest to do it in a separate patch since it requires performance validation.
> But this solution is a maintenance nightmare. If we need time to do performance tests, then the "unoptimized" version should go in at first. So only one function that has all the instrumentation in it
For reference: In some offline tests with Joachim, I was able to convince the compiler to generate the exact same code. Slightly modified sketch that allows inlining of version without OMPT:

```lang=C++
template <bool ompt>
static kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
  ...
}

OMPT_NOINLINE
static kmp_int32 __kmpc_omp_taskwait_ompt(ident *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
  return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, frame_address, return_address);
}

kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
#if OMPT_SUPPORT && OMPT_OPTIONAL
  if (UNLIKELY(ompt_enabled.enabled)) {
    return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, OMPT_GET_FRAME_ADDRESS(1), OMPT_GET_RETURN_ADDRESS(0));
  }
#endif
  return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid, NULL, NULL);
}
```

Clang 5.0.0 even optimizes out the two NULL parameters.


https://reviews.llvm.org/D38185





More information about the Openmp-commits mailing list