[Openmp-commits] [PATCH] D38185: Implementation of OMPT as specified in OpenMP 5.0 Preview 1
Hansang Bae via Phabricator via Openmp-commits
openmp-commits at lists.llvm.org
Thu Oct 19 06:36:35 PDT 2017
hbae added inline comments.
================
Comment at: runtime/src/kmp_tasking.cpp:1692
+// complete
+kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
+#if OMPT_SUPPORT && OMPT_OPTIONAL
----------------
protze.joachim wrote:
> Hahnfeld wrote:
> > Hahnfeld wrote:
> > > hbae wrote:
> > > > Hahnfeld wrote:
> > > > > hbae wrote:
> > > > > > I know this is ugly, but we couldn't find a better way to fix ~15% regression in 376.kdtree.
> > > > > Duplicating code increases the risk of inconsistencies. Can we use templates so that the compiler can do the work? Something like the following:
> > > > >
> > > > > ```lang=C++
> > > > > template <bool ompt>
> > > > > kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid) {
> > > > > ...
> > > > > }
> > > > >
> > > > > kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
> > > > > #if OMPT_SUPPORT && OMPT_OPTIONAL
> > > > > if (UNLIKELY(ompt_enabled.enabled)) {
> > > > > return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid);
> > > > > }
> > > > > #endif
> > > > > return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid);
> > > > > }
> > > > > ```
> > > > Yes, we can try this, but I suggest to do it in a separate patch since it requires performance validation.
> > > But this solution is a maintenance nightmare. If we need time to do performance tests, then the "unoptimized" version should go in at first. So only one function that has all the instrumentation in it
> > For reference: In some offline tests with Joachim, I was able to convince the compiler to generate the exact same code. Slightly modified sketch that allows inlining of version without OMPT:
> >
> > ```lang=C++
> > template <bool ompt>
> > static kmp_int32 __kmpc_omp_taskwait_temp(ident_t *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
> > ...
> > }
> >
> > OMPT_NOINLINE
> > static kmp_int32 __kmpc_omp_taskwait_ompt(ident *loc_ref, kmp_int32 gtid, void *frame_address, void *return_address) {
> > return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, frame_address, return_address);
> > }
> >
> > kmp_int32 __kmpc_omp_taskwait(ident_t *loc_ref, kmp_int32 gtid) {
> > #if OMPT_SUPPORT && OMPT_OPTIONAL
> > if (UNLIKELY(ompt_enabled.enabled)) {
> > return __kmpc_omp_taskwait_temp<true>(loc_ref, gtid, OMPT_GET_FRAME_ADDRESS(1), OMPT_GET_RETURN_ADDRESS(0));
> > }
> > #endif
> > return __kmpc_omp_taskwait_temp<false>(loc_ref, gtid, NULL, NULL);
> > }
> > ```
> >
> > Clang 5.0.0 even optimizes out the two NULL parameters.
> I agree with Jonas, that this code duplication is a maintenance nightmare. Looking at the object dump, the latest version looks very promising.
>
> We will push an implementation of this modification (also applied for the task_if0 functions) to a new branch of the OMPT github repository later this week. This way you will have the chance to run performance tests.
Sounds good to me.
https://reviews.llvm.org/D38185
More information about the Openmp-commits
mailing list