[Openmp-commits] [PATCH] D122646: [OpenMP][RFC] libomp: Introduce hardware assisted barrier support for A64FX

Misono Tomohiro via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Tue Mar 29 05:49:13 PDT 2022


t-msn created this revision.
Herald added subscribers: guansong, yaxunl.
Herald added a project: All.
t-msn edited the summary of this revision.
t-msn added a comment.
t-msn added a reviewer: AndreyChurbanov.
t-msn added a project: OpenMP.
t-msn published this revision for review.
Herald added a reviewer: jdoerfert.
Herald added subscribers: openmp-commits, sstefan1.

For reference, these are the results of the EPCC OpenMP micro-benchmark (syncbench, overhead in [us]) on A64FX (Linux 5.17).
(OMP_PLACES=threads OMP_PROC_BIND=close KMP_TASKING=0; OMP_NUM_THREADS is 12 or 48)

|              | 12thr |      | 48thr |      |
|              | hyper | hard | hyper | hard |
| PARALLEL     | 3.65  | 1.75 | 6.27  | 3.44 |
| FOR          | 2.03  | 0.28 | 4.85  | 1.94 |
| PARALLEL FOR | 3.81  | 1.81 | 6.36  | 3.50 |
| BARRIER      | 1.96  | 0.23 | 4.86  | 1.90 |
| SINGLE       | 1.93  | 0.77 | 4.75  | 2.30 |
| CRITICAL     | 0.49  | 0.48 | 0.99  | 0.96 |
| LOCK/UNLOCK  | 0.54  | 0.54 | 1.03  | 1.03 |
| ATOMIC       | 0.69  | 0.68 | 2.36  | 2.37 |
| REDUCTION    | 6.03  | 2.73 | 11.77 | 6.75 |


Hello,
This is an RFC version. There are some rough parts and some cases are not optimized,
but I'd like to hear whether the current approach is the right way to support a hardware-specific barrier.

Some notes on the patch:

- Add a new barrier type 'hard' that performs a hardware-assisted barrier. Currently this only works on the A64FX processor on Linux.
  - To use the hard barrier, all barrier patterns must be set to 'hard', i.e. KMP_FORKJOIN_BARRIER_PATTERN=hard,hard KMP_PLAIN_BARRIER_PATTERN=hard,hard KMP_REDUCTION_BARRIER_PATTERN=hard,hard
  - To use the hard barrier, the hardware barrier driver must be loaded on the system. The user interface to the driver is NOT stable at this point. The current driver is: https://github.com/t-msn/llvm-project/tree/hardbarrier-20220329
    - The current driver uses sysfs to set up the hardware barrier, and libomp opens the sysfs files directly. I adopted this to avoid a library dependency in libomp, but now I feel it would be better to move the user-kernel interaction details into a runtime library that libomp simply dlopens.

- There are no restrictions on OpenMP syntax, but the team's threads must be bound to consecutive cores, one thread per core. Basically this means "OMP_PLACES=threads OMP_PROC_BIND=close" is used, or the equivalent affinity is set on the parallel construct.

- Due to a hardware restriction, the hardware barrier only synchronizes threads within a group (L2-share domain). When a team's threads cross a group boundary, a hybrid barrier scheme is used: the barrier is hierarchical, with a software barrier for the inter-group stage and the hardware barrier for the intra-group stage.

- Even when a system supports the hardware barrier, whether the hard barrier can be used is determined per team. If a team cannot use the hard barrier for some reason, a software barrier is used for that team.

- If the application code has no tasks and only the intra-group barrier is used, KMP_TASKING=0 can be set to speed up barrier operations.

- As an implementation detail, thread handling is basically the same as in the distributed barrier ('dist'), since both barriers require reconfiguration when the number of threads in use changes.

- I confirmed that almost all unit tests pass when the hard barrier is used by default.

(This means: `env LIT_OPTS="--show-unsupported --show-xfail" LIBOMP_TEST_ENV="KMP_PLAIN_BARRIER_PATTERN=hard,hard KMP_FORKJOIN_BARRIER_PATTERN=hard,hard KMP_REDUCTION_BARRIER_PATTERN=hard,hard"  ninja check-openmp`
with patch: https://reviews.llvm.org/D122645)

- A64FX's hardware barrier is described in the following manual: https://github.com/fujitsu/A64FX/blob/master/doc/A64FX_Specification_HPC_Extension_v1_EN.pdf
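
To make the hybrid scheme and the per-team fallback described above more concrete, here is a minimal sketch. It is not the code in the patch: every name in it (BarrierKind, ThreadSlot, BarrierPlan, consecutive_cores, plan_team_barrier) is hypothetical, and the two fallback conditions it checks (driver not available, threads not on consecutive cores) are only the ones mentioned in the notes above; the real runtime may reject a team for other reasons. The group size is a parameter; 12 cores per L2-share domain is assumed in the example to match the 12-thread runs.

```cpp
#include <cstddef>
#include <vector>

// All names below are hypothetical; none of them come from libomp or the patch.
enum class BarrierKind { Software, HardOnly, Hybrid };

struct ThreadSlot {
  int core;       // core the team thread is bound to
  int group;      // L2-share domain that contains this core
  bool is_leader; // first team thread in its group; would join the software stage
};

struct BarrierPlan {
  BarrierKind kind = BarrierKind::Software; // default: software fallback
  int num_groups = 0;                       // groups spanned by the team
  std::vector<ThreadSlot> slots;            // filled only when hard barrier is usable
};

// Helper for examples: cores first, first+1, ..., first+n-1.
std::vector<int> consecutive_cores(int first, int n) {
  std::vector<int> cores;
  for (int i = 0; i < n; ++i)
    cores.push_back(first + i);
  return cores;
}

// cores[i] is the core that team thread i is bound to.
// cores_per_group is the assumed size of one L2-share domain.
BarrierPlan plan_team_barrier(const std::vector<int> &cores,
                              int cores_per_group, bool driver_available) {
  BarrierPlan plan;
  if (!driver_available || cores.empty() || cores_per_group <= 0 ||
      cores[0] < 0)
    return plan; // software barrier for this team

  // The hard barrier needs the threads on consecutive cores.
  for (std::size_t i = 1; i < cores.size(); ++i)
    if (cores[i] != cores[i - 1] + 1)
      return plan; // software barrier for this team

  int prev_group = -1;
  for (int core : cores) {
    int group = core / cores_per_group;
    bool leader = (group != prev_group); // first thread seen in this group
    if (leader)
      ++plan.num_groups;
    plan.slots.push_back({core, group, leader});
    prev_group = group;
  }

  // One group: hardware barrier alone. Several groups: hardware barrier
  // inside each group plus a software barrier among the group leaders.
  plan.kind = (plan.num_groups == 1) ? BarrierKind::HardOnly
                                     : BarrierKind::Hybrid;
  return plan;
}
```

With cores_per_group = 12, a 48-thread team starting at core 0 spans four groups with leaders on cores 0, 12, 24 and 36, so it would use the hybrid scheme. A 12-thread team starting at core 0 stays within one group and needs no software stage, while a 12-thread team starting at core 6 crosses a group boundary and becomes hybrid.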


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D122646

Files:
  openmp/runtime/src/i18n/en_US.txt
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_affinity.cpp
  openmp/runtime/src/kmp_affinity.h
  openmp/runtime/src/kmp_barrier.cpp
  openmp/runtime/src/kmp_barrier.h
  openmp/runtime/src/kmp_global.cpp
  openmp/runtime/src/kmp_runtime.cpp
  openmp/runtime/src/kmp_settings.cpp
  openmp/runtime/src/kmp_stats.h
  openmp/runtime/src/kmp_tasking.cpp
  openmp/runtime/src/z_Linux_util.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D122646.418841.patch
Type: text/x-patch
Size: 72896 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20220329/26afa876/attachment-0001.bin>

