[Openmp-commits] [PATCH] D100997: [OpenMP] Refactor/Rework topology discovery code

Jonathan Peyton via Phabricator via Openmp-commits openmp-commits at lists.llvm.org
Wed Apr 21 13:58:23 PDT 2021


jlpeyton created this revision.
jlpeyton added reviewers: AndreyChurbanov, tlwilmar, hbae, Nawrin.
jlpeyton added a project: OpenMP.
Herald added subscribers: guansong, yaxunl.
jlpeyton requested review of this revision.
Herald added a reviewer: jdoerfert.
Herald added a subscriber: sstefan1.

This patch does the following:

1. Introduces `kmp_topology_t`, a runtime-friendly structure (the corresponding global variable is `__kmp_topology`) that describes the machine's exact topology, which can vary widely among current and future architectures. For example, an SNC-4-enabled KNL machine can have five topology layers when detected with hwloc. The current design is not easy to extend beyond the assumed three-layer topology (socket, core, and thread), so a rework that works with the existing `KMP_AFFINITY` mechanisms was required.

This new topology structure has:

- The depth of the topology and the types of its layers
- The ratio between each pair of consecutive levels (e.g., number of cores per socket, number of threads per core)
- The absolute count at each level (e.g., 2 sockets, 16 cores, 32 threads)
- A map of equivalent topology layers (e.g., a NUMA domain equivalent to a socket, an L1/L2 cache equivalent to a core)
- Whether the topology is uniform
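The relationship between the ratio and count fields, and how they determine uniformity, can be illustrated with a small sketch. It assumes levels are ordered from outermost (socket) to innermost (thread), that `ratio[i]` holds the largest number of level-`i` units found inside one level-`(i-1)` unit, and that `count[i]` holds the total number of level-`i` units. `TopologySummary` and `is_uniform` are hypothetical names, not the runtime's. Under those assumptions, the topology is uniform exactly when multiplying the ratios reproduces the total thread count.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of a detected topology, outermost level first.
struct TopologySummary {
  std::vector<int> ratio;  // ratio[i]: max level-i units per level-(i-1) unit
  std::vector<int> count;  // count[i]: total level-i units on the machine
};

// Uniform means every unit at every level has the maximum number of children,
// which holds exactly when the product of the ratios equals the leaf count.
bool is_uniform(const TopologySummary &t) {
  assert(!t.ratio.empty() && t.ratio.size() == t.count.size());
  long long product = 1;
  for (int r : t.ratio)
    product *= r;
  return product == t.count.back();
}
```

With 2 sockets, 8 cores per socket, and 2 threads per core, ratio is {2, 8, 2} and count is {2, 16, 32}; the product 2 × 8 × 2 = 32 matches, so the machine is uniform. If one socket exposed only 7 cores, count would become {2, 15, 30} while ratio stays the same, and the check fails.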

Hardware threads are represented by the `kmp_hw_thread_t` structure, which holds the ids at each level (e.g., socket 0, core 1, thread 0) and other information carried over from the previous `Address` structure. The `kmp_topology_t` structure contains an array of these.
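One way to picture a hardware-thread record is as an OS processor id plus a fixed-size array of per-level ids. Keeping the array sorted lexicographically by those ids places threads that share a socket or core next to each other. The layout below is an assumption for illustration, not the actual definition of `kmp_hw_thread_t`. `HwThread`, `kMaxDepth`, and `sort_hw_threads` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

constexpr int kMaxDepth = 8;  // hypothetical upper bound on topology layers

// Hypothetical stand-in for a hardware-thread record.
struct HwThread {
  int os_id;                       // OS logical processor number
  std::array<int, kMaxDepth> ids;  // ids[level], e.g. {socket, core, thread}
};

// Sorts threads by their ids (outermost level first), comparing only the
// first `depth` levels; ties fall back to the OS id for a deterministic order.
void sort_hw_threads(std::vector<HwThread> &threads, int depth) {
  assert(depth >= 0 && depth <= kMaxDepth);
  std::sort(threads.begin(), threads.end(),
            [depth](const HwThread &a, const HwThread &b) {
              for (int l = 0; l < depth; ++l)
                if (a.ids[l] != b.ids[l])
                  return a.ids[l] < b.ids[l];
              return a.os_id < b.os_id;
            });
}
```

In this sorted order, per-level counts and ratios can be derived in a single linear pass over the array.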

2. Generalizes the `KMP_HW_SUBSET` environment variable for the new `kmp_topology_t` structure. The algorithm does not assume any particular order of tiles, NUMA domains, sockets, cores, and threads. Instead, it parses the environment variable, checks that it is consistent with the detected topology (taking equivalent layers into account), and then trims away the hardware threads that fall outside the requested subset. To enable this, a new `kmp_hw_subset_t` structure is introduced, which contains a vector of items (hardware type, number of units requested, offset). Any keyword within `__kmp_hw_get_keyword()` can be used as a name, and names can be abbreviated, e.g., `KMP_HW_SUBSET=1s,2numa,4tile,2c,3t`.
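The parsing step can be sketched as turning each comma-separated token into a (type, count, offset) item. This sketch assumes the `<num><name>[@<offset>]` token form and a small hypothetical keyword table with plural and short spellings standing in for `__kmp_hw_get_keyword()`. `HwType`, `SubsetItem`, `lookup_keyword`, and `parse_hw_subset` are illustrative names, not the runtime's API. Checking the items against the detected topology and trimming hardware threads are not shown.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical layer types; the runtime has its own enumeration.
enum class HwType { Socket, NumaDomain, Tile, Core, Thread };

// Hypothetical keyword table: full name (plural "s" also accepted) and short form.
struct Keyword { HwType type; const char *name; const char *alias; };
const Keyword kKeywords[] = {
    {HwType::Socket, "socket", "s"}, {HwType::NumaDomain, "numa_domain", "numa"},
    {HwType::Tile, "tile", "tile"},  {HwType::Core, "core", "c"},
    {HwType::Thread, "thread", "t"},
};

struct SubsetItem {
  HwType type;
  int num;     // number of units requested at this layer
  int offset;  // units to skip first ("@offset"), 0 if omitted
};

std::optional<HwType> lookup_keyword(const std::string &word) {
  for (const Keyword &k : kKeywords) {
    const std::string full = k.name;
    if (word == full || word == full + "s" || word == k.alias)
      return k.type;
  }
  return std::nullopt;
}

// Reads a non-empty run of digits starting at `i` and advances `i` past it
// (no overflow check in this sketch).
std::optional<int> read_int(const std::string &s, std::size_t &i) {
  assert(i <= s.size());
  const std::size_t start = i;
  int v = 0;
  while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
    v = v * 10 + (s[i++] - '0');
  if (i == start)
    return std::nullopt;
  return v;
}

// Parses e.g. "1s,2numa,4tile,2c@1,3t". Returns nullopt on a syntax error,
// an unknown name, or a layer listed more than once.
std::optional<std::vector<SubsetItem>> parse_hw_subset(const std::string &s) {
  std::vector<SubsetItem> items;
  std::size_t pos = 0;
  while (pos <= s.size()) {
    std::size_t end = s.find(',', pos);
    if (end == std::string::npos)
      end = s.size();
    const std::string tok = s.substr(pos, end - pos);

    std::size_t i = 0;
    const std::optional<int> num = read_int(tok, i);  // leading count
    if (!num)
      return std::nullopt;

    const std::size_t at = tok.find('@', i);
    const std::size_t name_end = (at == std::string::npos) ? tok.size() : at;
    const std::optional<HwType> type = lookup_keyword(tok.substr(i, name_end - i));
    if (!type)
      return std::nullopt;

    int offset = 0;
    if (at != std::string::npos) {
      std::size_t j = at + 1;
      const std::optional<int> off = read_int(tok, j);
      if (!off || j != tok.size())
        return std::nullopt;
      offset = *off;
    }

    for (const SubsetItem &it : items)
      if (it.type == *type)
        return std::nullopt;  // each layer may appear only once
    items.push_back({*type, *num, offset});
    pos = end + 1;
  }
  return items;
}
```

The parser keeps items in the order the user wrote them, so it makes no assumption about layer order. Matching each item against the detected layers, including equivalent ones, would be a separate step.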

3. Reworks the topology detection functions to be simpler, so that each one performs a single task: detecting the topology. Printing and all canonicalization are now done afterwards, which removes many lines of duplicated code.
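Once detection only fills in per-level ids, canonicalization can run as a separate pass over the sorted hardware threads. The sketch below derives per-level counts and ratios in one linear scan. It assumes the threads are already sorted lexicographically by ids, outermost level first. `HwThread` and `compute_ratio_and_count` are hypothetical names, not the runtime's functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical hardware thread: ids[level], outermost level first.
struct HwThread {
  std::vector<int> ids;
};

// Requires `threads` sorted lexicographically by ids. On return:
//   count[l] = number of distinct level-l units,
//   ratio[l] = maximum number of level-l units inside one level-(l-1) unit.
void compute_ratio_and_count(const std::vector<HwThread> &threads, int depth,
                             std::vector<int> &ratio, std::vector<int> &count) {
  assert(depth > 0);
  ratio.assign(depth, 0);
  count.assign(depth, 0);
  std::vector<int> children(depth, 0);  // units seen so far under the current parent
  for (std::size_t i = 0; i < threads.size(); ++i) {
    // Find the outermost level at which this thread starts a new unit.
    int first_new = 0;
    if (i > 0) {
      first_new = depth;  // identical ids: no new unit at any level
      for (int l = 0; l < depth; ++l) {
        if (threads[i].ids[l] != threads[i - 1].ids[l]) {
          first_new = l;
          break;
        }
      }
    }
    for (int l = first_new; l < depth; ++l) {
      ++count[l];
      // At first_new the parent is unchanged, so it gains one more child;
      // deeper levels begin counting afresh under a new parent.
      children[l] = (l == first_new) ? children[l] + 1 : 1;
      ratio[l] = std::max(ratio[l], children[l]);
    }
  }
}
```

As an example, take 2 sockets where socket 0 has 2 cores and socket 1 has 1 core, each core with 2 threads. The scan yields count = {2, 3, 6} and ratio = {2, 2, 2}. Because 2 × 2 × 2 = 8 differs from 6, such a machine would be reported as non-uniform.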

4. Adds the new TR8 place names `ll_caches` and `numa_domains` to `OMP_PLACES` and, consequently, to the `KMP_AFFINITY` granularity setting. In fact, all names within `__kmp_hw_get_keyword()` are available for use in `OMP_PLACES` or as a `KMP_AFFINITY` granularity.
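Conceptually, selecting a place name such as `ll_caches` or `numa_domains` (or using that layer as a granularity) means grouping hardware threads by their id at that layer, so that each distinct unit becomes one place. The following sketch assumes every hardware thread already carries one id per layer. `HwThread` and `build_places` are hypothetical names, and this is not the runtime's place-construction code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical hardware thread: OS processor id plus one id per layer.
struct HwThread {
  int os_id;
  std::vector<int> ids;  // e.g. {socket, core, thread}
};

// Groups OS processor ids into places, one place per distinct unit at `layer`.
// Ids may repeat under different parents (core 0 exists in every socket), so
// the key is the full id prefix up to and including `layer`.
std::vector<std::vector<int>> build_places(const std::vector<HwThread> &threads, int layer) {
  std::map<std::vector<int>, std::vector<int>> groups;  // id prefix -> OS ids
  for (const HwThread &t : threads) {
    assert(layer >= 0 && static_cast<std::size_t>(layer) < t.ids.size());
    const std::vector<int> key(t.ids.begin(), t.ids.begin() + layer + 1);
    groups[key].push_back(t.os_id);
  }
  std::vector<std::vector<int>> places;
  for (const auto &entry : groups)
    places.push_back(entry.second);
  return places;
}
```

With this approach, adding a new place kind only requires that each hardware thread record an id for that layer.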

5. Many places in the affinity settings code that explicitly listed the allowed names inside `if()` conditions are made more general, so adding new topology names in the future is less burdensome.
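The general idea of replacing hard-coded name checks with a table can be sketched as follows. Instead of an `if` chain such as `strcmp(name, "core") == 0 || strcmp(name, "cores") == 0 || ...`, one loop consults one table, so supporting a new layer means adding a single row. `HwLayer`, `LayerName`, `kLayerNames`, and `resolve_layer` are hypothetical names. The patch describes `__kmp_hw_get_keyword()` as the source of the actual names.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical layer enumeration, outermost first; kNumLayers means "unknown".
enum HwLayer { kSocket, kNumaDomain, kLLCache, kCore, kThread, kNumLayers };

// One row per layer: the singular and plural spellings accepted in settings.
struct LayerName {
  HwLayer layer;
  const char *singular;
  const char *plural;
};
const LayerName kLayerNames[] = {
    {kSocket, "socket", "sockets"},      {kNumaDomain, "numa_domain", "numa_domains"},
    {kLLCache, "ll_cache", "ll_caches"}, {kCore, "core", "cores"},
    {kThread, "thread", "threads"},
};

// Returns the layer named by `name`, or kNumLayers if the name is not recognized.
HwLayer resolve_layer(const char *name) {
  assert(name != nullptr);
  for (const LayerName &n : kLayerNames)
    if (std::strcmp(name, n.singular) == 0 || std::strcmp(name, n.plural) == 0)
      return n.layer;
  return kNumLayers;
}
```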

6. Adds CPUID leaf 4 cache detection to the existing x2APIC ID method so that equivalent caches can be detected (in particular for the `ll_caches` place).
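The cache part of this method can be sketched by decoding the EAX value that CPUID leaf 4 returns for each cache subleaf. The field layout follows the Intel SDM description of leaf 4. The struct and function names are hypothetical, and the sketch takes EAX as an argument instead of executing `cpuid`, so it is not the runtime's code. Threads whose APIC ids agree after the sharing bits are shifted off belong to the same cache. That grouping is what allows a cache layer to be compared against, and possibly found equivalent to, a core or socket layer.

```cpp
#include <cassert>
#include <cstdint>

// Fields of CPUID leaf 4 (deterministic cache parameters) EAX, per the Intel SDM.
struct CacheDesc {
  unsigned type;         // EAX[4:0]: 0 = no more caches, 1 = data, 2 = instruction, 3 = unified
  unsigned level;        // EAX[7:5]: cache level, starting at 1
  unsigned sharing_ids;  // EAX[25:14] + 1: max addressable logical-processor ids sharing it
};

CacheDesc decode_leaf4_eax(std::uint32_t eax) {
  CacheDesc d;
  d.type = eax & 0x1f;
  d.level = (eax >> 5) & 0x7;
  d.sharing_ids = ((eax >> 14) & 0xfff) + 1;
  return d;
}

// Number of low APIC-id bits that distinguish threads sharing the cache:
// log2 of the nearest power of two not smaller than sharing_ids.
unsigned cache_id_shift(unsigned sharing_ids) {
  assert(sharing_ids >= 1 && sharing_ids <= 4096);
  unsigned shift = 0;
  while ((1u << shift) < sharing_ids)
    ++shift;
  return shift;
}

// Threads whose APIC ids map to the same value here share this cache.
std::uint32_t cache_id(std::uint32_t apic_id, std::uint32_t leaf4_eax) {
  return apic_id >> cache_id_shift(decode_leaf4_eax(leaf4_eax).sharing_ids);
}
```

For `ll_caches`, the last-level cache would be the one with the highest `level` among the subleaves reporting a nonzero type.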


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D100997

Files:
  openmp/runtime/src/i18n/en_US.txt
  openmp/runtime/src/kmp.h
  openmp/runtime/src/kmp_affinity.cpp
  openmp/runtime/src/kmp_affinity.h
  openmp/runtime/src/kmp_global.cpp
  openmp/runtime/src/kmp_settings.cpp
  openmp/runtime/test/affinity/kmp-affinity.c
  openmp/runtime/test/affinity/kmp-hw-subset.c
  openmp/runtime/test/affinity/libomp_test_topology.h
  openmp/runtime/test/affinity/omp-places.c

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D100997.339358.patch
Type: text/x-patch
Size: 222608 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/openmp-commits/attachments/20210421/409f20a6/attachment-0001.bin>

