[Openmp-commits] [PATCH] D95816: [OpenMP] libomp: minor changes to improve library performance

Andrey Churbanov via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Sun Feb 7 13:58:08 PST 2021

AndreyChurbanov added a comment.

In D95816#2547568 <https://reviews.llvm.org/D95816#2547568>, @jdoerfert wrote:

> Is it necessary to manually inline `__kmp_itt_taskwait_object` three times? The rest of the changes look good to me.

Not sure I got your comment right, will try to clarify a bit. 
I've just submitted update that removes "if (UNLIKELY(__itt_sync_create_ptr))" from __kmp_itt_taskwait_object() function. 
Previous change #3 of the patch was to copy the "if" from the function. Now, to make thing clearer, the change is to move the "if" from inside the function. 
It is not the inlining, it is just move of single "if" outside of the function.

So the scheme of the change #3 is to replace "call + if + return + if" with just "if" on the hot path of 99+% of execution (all those don't use an ittnotify tool).  This change gives performance gain in both cases of library build - with and without LTO.

I named change1 the changes with "UNLIKELY" hint (except the last three); change2 = change1 + debug assertions; change3 = change2 + move of "if (UNLIKELY(__itt_sync_create_ptr))" in the following data:
Median time in sec of 3 runs of SpecOMP 2012 376.kdtree benchmark on 2x Xeon Gold 6252 (48 cores, 48 threads used):
trunk - 401
change1 - 399
change2 - 397
change3 - 395
trunk-LTO - 385
change1-LTO - 381
change2-LTO - 373
change3-LTO - 371

Similar performance trend seen on other platforms I have for testing (tried various families of Intel Xeon).

So all three changes in this patch give extra performance for both builds - with and without LTO.

Eventually I think we should enable LTO build, as it gives extra performance with tiny binary size increase - less than 1%.  Link time increases, but still takes seconds.
I used trunk clang with `-DLIBOMP_CXXFLAGS=-flto -DLIBOMP_LDFLAGS="-flto -fuse-ld=lld"` cmake build options, and would prefer somebody more familiar with cmake do actual LTO enabling in cmake file(s).

  rG LLVM Github Monorepo



More information about the Openmp-commits mailing list