[Openmp-commits] [PATCH] D38185: Implementation of OMPT as specified in OpenMP 5.0 Preview 1

Joachim Protze via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Mon Oct 23 05:20:37 PDT 2017


protze.joachim added inline comments.


================
Comment at: runtime/src/kmp_tasking.cpp:1692
+// complete
+kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
+#if OMPT_SUPPORT && OMPT_OPTIONAL
----------------
protze.joachim wrote:
> Hahnfeld wrote:
> > Hahnfeld wrote:
> > > hbae wrote:
> > > > Hahnfeld wrote:
> > > > > hbae wrote:
> > > > > > I know this is ugly, but we couldn't find a better way to fix ~15% regression in 376.kdtree.
> > > > > Duplicating code increases the risk of inconsistencies. Can we use templates so that the compiler can do the work? Something like the following:
> > > > > 
> > > > > ```lang=C++
> > > > > template <bool ompt>
> > > > > kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid) {
> > > > >   ...
> > > > > }
> > > > > 
> > > > > kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
> > > > > #if OMPT_SUPPORT && OMPT_OPTIONAL
> > > > >   if (UNLIKELY(ompt_enabled.enabled)) {
> > > > >     return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid);
> > > > >   }
> > > > > #endif
> > > > >   return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid);
> > > > > }
> > > > > ```
> > > > Yes, we can try this, but I suggest to do it in a separate patch since it requires performance validation.
> > > But this solution is a maintenance nightmare. If we need time to do performance tests, then the "unoptimized" version should go in at first. So only one function that has all the instrumentation in it
> > For reference: In some offline tests with Joachim, I was able to convince the compiler to generate the exact same code. Slightly modified sketch that allows inlining of version without OMPT:
> > 
> > ```lang=C++
> > template <bool ompt>
> > static kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
> >   ...
> > }
> > 
> > OMPT_NOINLINE
> > static kmp_int32 __kmpc_omp_taskwait_ompt(ident *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
> >   return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, frame_address, return_address);
> > }
> > 
> > kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
> > #if OMPT_SUPPORT && OMPT_OPTIONAL
> >   if (UNLIKELY(ompt_enabled.enabled)) {
> >     return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, OMPT_GET_FRAME_ADDRESS(1), OMPT_GET_RETURN_ADDRESS(0));
> >   }
> > #endif
> >   return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid, NULL, NULL);
> > }
> > ```
> > 
> > Clang 5.0.0 even optimizes out the two NULL parameters.
> I agree with Jonas, that this code duplication is a maintenance nightmare. Looking at the object dump, the latest version looks very promising.
> 
> We will push an implementation of this modification (also applied for the task_if0 functions) to a new branch of the OMPT github repository later this week. This way you will have the chance to run performance tests.
The modifications are available at:
https://github.com/OpenMPToolsInterface/LLVM-openmp/tree/template-versioning-inline

Comparison of objects dumps for current implementation and templated implementation shows no changes in instructions when built with clang compiler (ignoring changed addresses/names). Therefore I tend to update this patch.


https://reviews.llvm.org/D38185





More information about the Openmp-commits mailing list