[Openmp-commits] [openmp] Add openmp support to System z (PR #66081)

Neale Ferguson via Openmp-commits openmp-commits at lists.llvm.org
Wed Nov 1 04:46:57 PDT 2023


https://github.com/nealef updated https://github.com/llvm/llvm-project/pull/66081

>From 440c23e3d0ce5407a69b8de622e66ec4554129a6 Mon Sep 17 00:00:00 2001
From: Neale Ferguson <neale at sinenomine.net>
Date: Tue, 12 Sep 2023 08:37:07 -0400
Subject: [PATCH] Add openmp support to System z

* openmp/README.rst
  - Add s390x to those platforms supported

* openmp/libomptarget/plugins-nextgen/CMakeLists.txt
  - Add s390x subdirectory

* openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
  - Add s390x definitions

* openmp/runtime/CMakeLists.txt
  - Add s390x to those platforms supported

* openmp/runtime/cmake/LibompGetArchitecture.cmake
  - Define s390x ARCHITECTURE

* openmp/runtime/cmake/LibompMicroTests.cmake
  - Add dependencies for System z (aka s390x)

* openmp/runtime/cmake/LibompUtils.cmake
  - Add S390X to the mix

* openmp/runtime/cmake/config-ix.cmake
  - Add s390x as a supported LIPOMP_ARCH

* openmp/runtime/src/kmp_affinity.h
  - Define __NR_sched_[get|set]addinity for s390x

* openmp/runtime/src/kmp_config.h.cmake
  - Define CACHE_LINE for s390x

* openmp/runtime/src/kmp_os.h
  - Add KMP_ARCH_S390X to support checks

* openmp/runtime/src/kmp_platform.h
  - Define KMP_ARCH_S390X

* openmp/runtime/src/kmp_runtime.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/kmp_tasking.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
  - Define ITT_ARCH_S390X

* openmp/runtime/src/z_Linux_asm.S
  - Instantiate __kmp_invoke_microtask for s390x

* openmp/runtime/src/z_Linux_util.cpp
  - Generate code when KMP_ARCH_S390X is defined

* openmp/runtime/test/ompt/callback.h
  - Define print_possible_return_addresses for s390x

* openmp/runtime/tools/lib/Platform.pm
  - Return s390x as platform and host architecture

* openmp/runtime/tools/lib/Uname.pm
  - Set hardware platform value for s390x

* openmp/runtime/src/kmp_affinity.cpp
  - Implement s390x /proc/cpuinfo parsing

* openmp/runtime/src/kmp_tasking.cpp
  - Add backchain attribute to __km_invoke_task
  - Style fix

* openmp/runtime/src/z_Linux_asm.S
  - Add unwind information

* openmp/runtime/test/lit.cfg
  - Build openmp tests with -mbackchain

* openmp/runtime/test/ompt/callback.h
  - Additional possibility for print_possible_return_addresses()
---
 openmp/README.rst                             |   4 +-
 .../plugins-nextgen/CMakeLists.txt            |   1 +
 .../plugins-nextgen/s390x/CMakeLists.txt      |  17 ++
 openmp/runtime/CMakeLists.txt                 |   9 +-
 .../runtime/cmake/LibompGetArchitecture.cmake |   2 +
 openmp/runtime/cmake/LibompMicroTests.cmake   |   3 +
 openmp/runtime/cmake/LibompUtils.cmake        |   2 +
 openmp/runtime/cmake/config-ix.cmake          |   3 +-
 openmp/runtime/src/kmp_affinity.cpp           |  33 ++++
 openmp/runtime/src/kmp_affinity.h             |  13 +-
 openmp/runtime/src/kmp_config.h.cmake         |   2 +
 openmp/runtime/src/kmp_os.h                   |   6 +-
 openmp/runtime/src/kmp_platform.h             |   7 +-
 openmp/runtime/src/kmp_runtime.cpp            |   3 +-
 openmp/runtime/src/kmp_tasking.cpp            |  10 +-
 .../thirdparty/ittnotify/ittnotify_config.h   |   6 +
 openmp/runtime/src/z_Linux_asm.S              | 159 +++++++++++++++++-
 openmp/runtime/src/z_Linux_util.cpp           |   2 +-
 openmp/runtime/test/lit.cfg                   |   2 +
 openmp/runtime/test/ompt/callback.h           |  16 ++
 openmp/runtime/tools/lib/Platform.pm          |   6 +-
 openmp/runtime/tools/lib/Uname.pm             |   2 +
 22 files changed, 291 insertions(+), 17 deletions(-)
 create mode 100644 openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt

diff --git a/openmp/README.rst b/openmp/README.rst
index bb9443df56d7656..0e4916f44c68287 100644
--- a/openmp/README.rst
+++ b/openmp/README.rst
@@ -141,7 +141,7 @@ Options for all Libraries
 Options for ``libomp``
 ----------------------
 
-**LIBOMP_ARCH** = ``aarch64|arm|i386|loongarch64|mic|mips|mips64|ppc64|ppc64le|x86_64|riscv64``
+**LIBOMP_ARCH** = ``aarch64|arm|i386|loongarch64|mic|mips|mips64|ppc64|ppc64le|x86_64|riscv64|s390x``
   The default value for this option is chosen based on probing the compiler for
   architecture macros (e.g., is ``__x86_64__`` predefined by compiler?).
 
@@ -198,7 +198,7 @@ Optional Features
 **LIBOMP_OMPT_SUPPORT** = ``ON|OFF``
   Include support for the OpenMP Tools Interface (OMPT).
   This option is supported and ``ON`` by default for x86, x86_64, AArch64,
-  PPC64, RISCV64 and LoongArch64 on Linux* and macOS*.
+  PPC64, RISCV64, LoongArch64, and s390x on Linux* and macOS*.
   This option is ``OFF`` if this feature is not supported for the platform.
 
 **LIBOMP_OMPT_OPTIONAL** = ``ON|OFF``
diff --git a/openmp/libomptarget/plugins-nextgen/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/CMakeLists.txt
index 9b4f4a5866e7987..d81e5d37d7c08df 100644
--- a/openmp/libomptarget/plugins-nextgen/CMakeLists.txt
+++ b/openmp/libomptarget/plugins-nextgen/CMakeLists.txt
@@ -96,6 +96,7 @@ add_subdirectory(cuda)
 add_subdirectory(ppc64)
 add_subdirectory(ppc64le)
 add_subdirectory(x86_64)
+add_subdirectory(s390x)
 
 # Make sure the parent scope can see the plugins that will be created.
 set(LIBOMPTARGET_SYSTEM_TARGETS "${LIBOMPTARGET_SYSTEM_TARGETS}" PARENT_SCOPE)
diff --git a/openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt b/openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
new file mode 100644
index 000000000000000..1b12a292899980e
--- /dev/null
+++ b/openmp/libomptarget/plugins-nextgen/s390x/CMakeLists.txt
@@ -0,0 +1,17 @@
+##===----------------------------------------------------------------------===##
+#
+# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+#
+##===----------------------------------------------------------------------===##
+#
+# Build a plugin for a s390x machine if available.
+#
+##===----------------------------------------------------------------------===##
+
+if(CMAKE_SYSTEM_NAME MATCHES "Linux")
+ build_generic_elf64("SystemZ" "S390X" "s390x" "s390x-ibm-linux-gnu" "22")
+else()
+ libomptarget_say("Not building s390x NextGen offloading plugin: machine not found in the system.")
+endif()
diff --git a/openmp/runtime/CMakeLists.txt b/openmp/runtime/CMakeLists.txt
index 4441c4babdc07c0..8a913989272c4c5 100644
--- a/openmp/runtime/CMakeLists.txt
+++ b/openmp/runtime/CMakeLists.txt
@@ -30,7 +30,7 @@ if(${OPENMP_STANDALONE_BUILD})
   # If adding a new architecture, take a look at cmake/LibompGetArchitecture.cmake
   libomp_get_architecture(LIBOMP_DETECTED_ARCH)
   set(LIBOMP_ARCH ${LIBOMP_DETECTED_ARCH} CACHE STRING
-    "The architecture to build for (x86_64/i386/arm/ppc64/ppc64le/aarch64/mic/mips/mips64/riscv64/loongarch64/ve).")
+    "The architecture to build for (x86_64/i386/arm/ppc64/ppc64le/aarch64/mic/mips/mips64/riscv64/loongarch64/ve/s390x).")
   # Should assertions be enabled?  They are on by default.
   set(LIBOMP_ENABLE_ASSERTIONS TRUE CACHE BOOL
     "enable assertions?")
@@ -65,6 +65,8 @@ else() # Part of LLVM build
     set(LIBOMP_ARCH loongarch64)
   elseif(LIBOMP_NATIVE_ARCH MATCHES "ve")
     set(LIBOMP_ARCH ve)
+  elseif(LIBOMP_NATIVE_ARCH MATCHES "s390x")
+    set(LIBOMP_ARCH s390x)
   else()
     # last ditch effort
     libomp_get_architecture(LIBOMP_ARCH)
@@ -85,7 +87,7 @@ if(LIBOMP_ARCH STREQUAL "aarch64")
   endif()
 endif()
 
-libomp_check_variable(LIBOMP_ARCH 32e x86_64 32 i386 arm ppc64 ppc64le aarch64 aarch64_a64fx mic mips mips64 riscv64 loongarch64 ve)
+libomp_check_variable(LIBOMP_ARCH 32e x86_64 32 i386 arm ppc64 ppc64le aarch64 aarch64_a64fx mic mips mips64 riscv64 loongarch64 ve s390x)
 
 set(LIBOMP_LIB_TYPE normal CACHE STRING
   "Performance,Profiling,Stubs library (normal/profile/stubs)")
@@ -165,6 +167,7 @@ set(MIPS FALSE)
 set(RISCV64 FALSE)
 set(LOONGARCH64 FALSE)
 set(VE FALSE)
+set(S390X FALSE)
 if("${LIBOMP_ARCH}" STREQUAL "i386" OR "${LIBOMP_ARCH}" STREQUAL "32")    # IA-32 architecture
   set(IA32 TRUE)
 elseif("${LIBOMP_ARCH}" STREQUAL "x86_64" OR "${LIBOMP_ARCH}" STREQUAL "32e") # Intel(R) 64 architecture
@@ -193,6 +196,8 @@ elseif("${LIBOMP_ARCH}" STREQUAL "loongarch64") # LoongArch64 architecture
     set(LOONGARCH64 TRUE)
 elseif("${LIBOMP_ARCH}" STREQUAL "ve") # VE architecture
     set(VE TRUE)
+elseif("${LIBOMP_ARCH}" STREQUAL "s390x") # S390x (Z) architecture
+    set(S390X TRUE)
 endif()
 
 # Set some flags based on build_type
diff --git a/openmp/runtime/cmake/LibompGetArchitecture.cmake b/openmp/runtime/cmake/LibompGetArchitecture.cmake
index 98bfce9ae990a7b..9d4f08b92de50d5 100644
--- a/openmp/runtime/cmake/LibompGetArchitecture.cmake
+++ b/openmp/runtime/cmake/LibompGetArchitecture.cmake
@@ -51,6 +51,8 @@ function(libomp_get_architecture return_arch)
       #error ARCHITECTURE=loongarch64
     #elif defined(__ve__)
       #error ARCHITECTURE=ve
+    #elif defined(__s390x__)
+      #error ARCHITECTURE=s390x
     #else
       #error ARCHITECTURE=UnknownArchitecture
     #endif
diff --git a/openmp/runtime/cmake/LibompMicroTests.cmake b/openmp/runtime/cmake/LibompMicroTests.cmake
index 88deb461dbaf3a2..e8cc218af0c294f 100644
--- a/openmp/runtime/cmake/LibompMicroTests.cmake
+++ b/openmp/runtime/cmake/LibompMicroTests.cmake
@@ -217,6 +217,9 @@ else()
     elseif(${LOONGARCH64})
       libomp_append(libomp_expected_library_deps libc.so.6)
       libomp_append(libomp_expected_library_deps ld.so.1)
+    elseif(${S390X})
+      libomp_append(libomp_expected_library_deps libc.so.6)
+      libomp_append(libomp_expected_library_deps ld.so.1)
     endif()
     libomp_append(libomp_expected_library_deps libpthread.so.0 IF_FALSE STUBS_LIBRARY)
     libomp_append(libomp_expected_library_deps libhwloc.so.5 LIBOMP_USE_HWLOC)
diff --git a/openmp/runtime/cmake/LibompUtils.cmake b/openmp/runtime/cmake/LibompUtils.cmake
index 0151ca0ea826bd7..139eabb45c54f74 100644
--- a/openmp/runtime/cmake/LibompUtils.cmake
+++ b/openmp/runtime/cmake/LibompUtils.cmake
@@ -113,6 +113,8 @@ function(libomp_get_legal_arch return_arch_string)
     set(${return_arch_string} "LOONGARCH64" PARENT_SCOPE)
   elseif(${VE})
     set(${return_arch_string} "VE" PARENT_SCOPE)
+  elseif(${S390X})
+    set(${return_arch_string} "S390X" PARENT_SCOPE)
   else()
     set(${return_arch_string} "${LIBOMP_ARCH}" PARENT_SCOPE)
     libomp_warning_say("libomp_get_legal_arch(): Warning: Unknown architecture: Using ${LIBOMP_ARCH}")
diff --git a/openmp/runtime/cmake/config-ix.cmake b/openmp/runtime/cmake/config-ix.cmake
index 9869aeab0354635..d54d350816926df 100644
--- a/openmp/runtime/cmake/config-ix.cmake
+++ b/openmp/runtime/cmake/config-ix.cmake
@@ -325,7 +325,8 @@ else()
       (LIBOMP_ARCH STREQUAL ppc64le) OR
       (LIBOMP_ARCH STREQUAL ppc64) OR
       (LIBOMP_ARCH STREQUAL riscv64) OR
-      (LIBOMP_ARCH STREQUAL loongarch64))
+      (LIBOMP_ARCH STREQUAL loongarch64) OR
+      (LIBOMP_ARCH STREQUAL s390x))
      AND # OS supported?
      ((WIN32 AND LIBOMP_HAVE_PSAPI) OR APPLE OR (NOT WIN32 AND LIBOMP_HAVE_WEAK_ATTRIBUTE)))
     set(LIBOMP_HAVE_OMPT_SUPPORT TRUE)
diff --git a/openmp/runtime/src/kmp_affinity.cpp b/openmp/runtime/src/kmp_affinity.cpp
index 20c1c610b9159e0..8c608d78bb56fe1 100644
--- a/openmp/runtime/src/kmp_affinity.cpp
+++ b/openmp/runtime/src/kmp_affinity.cpp
@@ -2990,6 +2990,9 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line,
 
   unsigned num_avail = 0;
   *line = 0;
+#if KMP_ARCH_S390X
+  bool reading_s390x_sys_info = true;
+#endif
   while (!feof(f)) {
     // Create an inner scoping level, so that all the goto targets at the end of
     // the loop appear in an outer scoping level. This avoids warnings about
@@ -3035,8 +3038,21 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line,
       if (*buf == '\n' && *line == 2)
         continue;
 #endif
+#if KMP_ARCH_S390X
+      // s390x /proc/cpuinfo starts with a variable number of lines containing
+      // the overall system information. Skip them.
+      if (reading_s390x_sys_info) {
+        if (*buf == '\n')
+          reading_s390x_sys_info = false;
+        continue;
+      }
+#endif
 
+#if KMP_ARCH_S390X
+      char s1[] = "cpu number";
+#else
       char s1[] = "processor";
+#endif
       if (strncmp(buf, s1, sizeof(s1) - 1) == 0) {
         CHECK_LINE;
         char *p = strchr(buf + sizeof(s1) - 1, ':');
@@ -3062,6 +3078,23 @@ static bool __kmp_affinity_create_cpuinfo_map(int *line,
             threadInfo[num_avail][osIdIndex]);
         __kmp_read_from_file(path, "%u", &threadInfo[num_avail][pkgIdIndex]);
 
+#if KMP_ARCH_S390X
+        // Disambiguate physical_package_id.
+        unsigned book_id;
+        KMP_SNPRINTF(path, sizeof(path),
+                     "/sys/devices/system/cpu/cpu%u/topology/book_id",
+                     threadInfo[num_avail][osIdIndex]);
+        __kmp_read_from_file(path, "%u", &book_id);
+        threadInfo[num_avail][pkgIdIndex] |= (book_id << 8);
+
+        unsigned drawer_id;
+        KMP_SNPRINTF(path, sizeof(path),
+                     "/sys/devices/system/cpu/cpu%u/topology/drawer_id",
+                     threadInfo[num_avail][osIdIndex]);
+        __kmp_read_from_file(path, "%u", &drawer_id);
+        threadInfo[num_avail][pkgIdIndex] |= (drawer_id << 16);
+#endif
+
         KMP_SNPRINTF(path, sizeof(path),
                      "/sys/devices/system/cpu/cpu%u/topology/core_id",
                      threadInfo[num_avail][osIdIndex]);
diff --git a/openmp/runtime/src/kmp_affinity.h b/openmp/runtime/src/kmp_affinity.h
index 97808b528538097..bd4b74cd7c3f3cd 100644
--- a/openmp/runtime/src/kmp_affinity.h
+++ b/openmp/runtime/src/kmp_affinity.h
@@ -291,12 +291,23 @@ class KMPHwlocAffinity : public KMPAffinity {
 #define __NR_sched_setaffinity 203
 #elif __NR_sched_setaffinity != 203
 #error Wrong code for setaffinity system call.
-#endif /* __NR_sched_setaffinity */
+#endif /* __NR_sched_getaffinity */
 #ifndef __NR_sched_getaffinity
 #define __NR_sched_getaffinity 204
 #elif __NR_sched_getaffinity != 204
 #error Wrong code for getaffinity system call.
 #endif /* __NR_sched_getaffinity */
+#elif KMP_ARCH_S390X
+#ifndef __NR_sched_setaffinity
+#define __NR_sched_setaffinity 239
+#elif __NR_sched_setaffinity != 239
+#error Wrong code for setaffinity system call.
+#endif /* __NR_sched_setaffinity */
+#ifndef __NR_sched_getaffinity
+#define __NR_sched_getaffinity 240
+#elif __NR_sched_getaffinity != 240
+#error Wrong code for getaffinity system call.
+#endif /* __NR_sched_getaffinity */
 #else
 #error Unknown or unsupported architecture
 #endif /* KMP_ARCH_* */
diff --git a/openmp/runtime/src/kmp_config.h.cmake b/openmp/runtime/src/kmp_config.h.cmake
index 58bf64112b1a7a7..5f04301c91c60cd 100644
--- a/openmp/runtime/src/kmp_config.h.cmake
+++ b/openmp/runtime/src/kmp_config.h.cmake
@@ -104,6 +104,8 @@
 # define CACHE_LINE 128
 #elif KMP_ARCH_AARCH64_A64FX
 # define CACHE_LINE 256
+#elif KMP_ARCH_S390X
+# define CACHE_LINE 256
 #else
 # define CACHE_LINE 64
 #endif
diff --git a/openmp/runtime/src/kmp_os.h b/openmp/runtime/src/kmp_os.h
index 2c632112a8d8e35..ca694f6f14933cd 100644
--- a/openmp/runtime/src/kmp_os.h
+++ b/openmp/runtime/src/kmp_os.h
@@ -178,7 +178,8 @@ typedef unsigned long long kmp_uint64;
 #if KMP_ARCH_X86 || KMP_ARCH_ARM || KMP_ARCH_MIPS
 #define KMP_SIZE_T_SPEC KMP_UINT32_SPEC
 #elif KMP_ARCH_X86_64 || KMP_ARCH_PPC64 || KMP_ARCH_AARCH64 ||                 \
-    KMP_ARCH_MIPS64 || KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 || KMP_ARCH_VE
+    KMP_ARCH_MIPS64 || KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 ||             \
+    KMP_ARCH_VE || KMP_ARCH_S390X
 #define KMP_SIZE_T_SPEC KMP_UINT64_SPEC
 #else
 #error "Can't determine size_t printf format specifier."
@@ -1043,7 +1044,8 @@ extern kmp_real64 __kmp_xchg_real64(volatile kmp_real64 *p, kmp_real64 v);
 #endif /* KMP_OS_WINDOWS */
 
 #if KMP_ARCH_PPC64 || KMP_ARCH_ARM || KMP_ARCH_AARCH64 || KMP_ARCH_MIPS ||     \
-    KMP_ARCH_MIPS64 || KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 || KMP_ARCH_VE
+    KMP_ARCH_MIPS64 || KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 ||             \
+    KMP_ARCH_VE || KMP_ARCH_S390X
 #if KMP_OS_WINDOWS
 #undef KMP_MB
 #define KMP_MB() std::atomic_thread_fence(std::memory_order_seq_cst)
diff --git a/openmp/runtime/src/kmp_platform.h b/openmp/runtime/src/kmp_platform.h
index 1a2197d338342ac..70e55b427dc9205 100644
--- a/openmp/runtime/src/kmp_platform.h
+++ b/openmp/runtime/src/kmp_platform.h
@@ -94,6 +94,7 @@
 #define KMP_ARCH_RISCV64 0
 #define KMP_ARCH_LOONGARCH64 0
 #define KMP_ARCH_VE 0
+#define KMP_ARCH_S390X 0
 
 #if KMP_OS_WINDOWS
 #if defined(_M_AMD64) || defined(__x86_64)
@@ -146,6 +147,9 @@
 #elif defined __ve__
 #undef KMP_ARCH_VE
 #define KMP_ARCH_VE 1
+#elif defined __s390x__
+#undef KMP_ARCH_S390X
+#define KMP_ARCH_S390X 1
 #endif
 #endif
 
@@ -210,7 +214,8 @@
 // TODO: Fixme - This is clever, but really fugly
 #if (1 != KMP_ARCH_X86 + KMP_ARCH_X86_64 + KMP_ARCH_ARM + KMP_ARCH_PPC64 +     \
               KMP_ARCH_AARCH64 + KMP_ARCH_MIPS + KMP_ARCH_MIPS64 +             \
-              KMP_ARCH_RISCV64 + KMP_ARCH_LOONGARCH64 + KMP_ARCH_VE)
+              KMP_ARCH_RISCV64 + KMP_ARCH_LOONGARCH64 + KMP_ARCH_VE +          \
+              KMP_ARCH_S390X)
 #error Unknown or unsupported architecture
 #endif
 
diff --git a/openmp/runtime/src/kmp_runtime.cpp b/openmp/runtime/src/kmp_runtime.cpp
index e83c09383769a51..ec0aef1ab12926b 100644
--- a/openmp/runtime/src/kmp_runtime.cpp
+++ b/openmp/runtime/src/kmp_runtime.cpp
@@ -8894,7 +8894,8 @@ __kmp_determine_reduction_method(
     int atomic_available = FAST_REDUCTION_ATOMIC_METHOD_GENERATED;
 
 #if KMP_ARCH_X86_64 || KMP_ARCH_PPC64 || KMP_ARCH_AARCH64 ||                   \
-    KMP_ARCH_MIPS64 || KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 || KMP_ARCH_VE
+    KMP_ARCH_MIPS64 || KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 ||             \
+    KMP_ARCH_VE || KMP_ARCH_S390X
 
 #if KMP_OS_LINUX || KMP_OS_DRAGONFLY || KMP_OS_FREEBSD || KMP_OS_NETBSD ||     \
     KMP_OS_OPENBSD || KMP_OS_WINDOWS || KMP_OS_DARWIN || KMP_OS_HURD
diff --git a/openmp/runtime/src/kmp_tasking.cpp b/openmp/runtime/src/kmp_tasking.cpp
index e8eb6b02650377c..1ccbc8ca90bc977 100644
--- a/openmp/runtime/src/kmp_tasking.cpp
+++ b/openmp/runtime/src/kmp_tasking.cpp
@@ -1554,7 +1554,7 @@ kmp_task_t *__kmp_task_alloc(ident_t *loc_ref, kmp_int32 gtid,
   task = KMP_TASKDATA_TO_TASK(taskdata);
 
 // Make sure task & taskdata are aligned appropriately
-#if KMP_ARCH_X86 || KMP_ARCH_PPC64 || !KMP_HAVE_QUAD
+#if KMP_ARCH_X86 || KMP_ARCH_PPC64 || KMP_ARCH_S390X || !KMP_HAVE_QUAD
   KMP_DEBUG_ASSERT((((kmp_uintptr_t)taskdata) & (sizeof(double) - 1)) == 0);
   KMP_DEBUG_ASSERT((((kmp_uintptr_t)task) & (sizeof(double) - 1)) == 0);
 #else
@@ -1737,8 +1737,12 @@ __kmpc_omp_reg_task_with_affinity(ident_t *loc_ref, kmp_int32 gtid,
 // gtid: global thread ID of caller
 // task: the task to invoke
 // current_task: the task to resume after task invocation
-static void __kmp_invoke_task(kmp_int32 gtid, kmp_task_t *task,
-                              kmp_taskdata_t *current_task) {
+#ifdef __s390x__
+__attribute__((target("backchain")))
+#endif
+static void 
+__kmp_invoke_task(kmp_int32 gtid, kmp_task_t *task, 
+		  kmp_taskdata_t *current_task) {
   kmp_taskdata_t *taskdata = KMP_TASK_TO_TASKDATA(task);
   kmp_info_t *thread;
   int discard = 0 /* false */;
diff --git a/openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h b/openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
index ff37eb4ed175e67..bd3fd9b43e574d1 100644
--- a/openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
+++ b/openmp/runtime/src/thirdparty/ittnotify/ittnotify_config.h
@@ -166,6 +166,10 @@
 #define ITT_ARCH_VE 8
 #endif /* ITT_ARCH_VE */
 
+#ifndef ITT_ARCH_S390X
+#define ITT_ARCH_S390X 8
+#endif /* ITT_ARCH_S390X */
+
 #ifndef ITT_ARCH
 #if defined _M_IX86 || defined __i386__
 #define ITT_ARCH ITT_ARCH_IA32
@@ -181,6 +185,8 @@
 #define ITT_ARCH ITT_ARCH_PPC64
 #elif defined __ve__
 #define ITT_ARCH ITT_ARCH_VE
+#elif defined __s390x__
+#define ITT_ARCH ITT_ARCH_S390X
 #endif
 #endif
 
diff --git a/openmp/runtime/src/z_Linux_asm.S b/openmp/runtime/src/z_Linux_asm.S
index 2c0df6e3b08505a..a72705528d4162e 100644
--- a/openmp/runtime/src/z_Linux_asm.S
+++ b/openmp/runtime/src/z_Linux_asm.S
@@ -2252,6 +2252,159 @@ __kmp_invoke_microtask:
 
 #endif /* KMP_ARCH_VE */
 
+#if KMP_ARCH_S390X
+
+//------------------------------------------------------------------------
+//
+// typedef void (*microtask_t)(int *gtid, int *tid, ...);
+//
+// int __kmp_invoke_microtask(microtask_t pkfn, int gtid, int tid, int argc,
+//                            void *p_argv[]
+// #if OMPT_SUPPORT
+//                            ,
+//                            void **exit_frame_ptr
+// #endif
+//                            ) {
+// #if OMPT_SUPPORT
+//   *exit_frame_ptr = OMPT_GET_FRAME_ADDRESS(0);
+// #endif
+//
+//   (*pkfn)(&gtid, &tid, argv[0], ...);
+//
+//   return 1;
+// }
+//
+// Parameters:
+//   r2: pkfn
+//   r3: gtid
+//   r4: tid
+//   r5: argc
+//   r6: p_argv
+//   SP+160: exit_frame_ptr
+//
+// Locals:
+//   __gtid: gtid param pushed on stack so can pass &gtid to pkfn
+//   __tid: tid param pushed on stack so can pass &tid to pkfn
+//
+// Temp. registers:
+//
+//  r0: used to fetch argv slots
+//  r7: used as temporary for number of remaining pkfn parms
+//  r8: argv
+//  r9: pkfn
+//  r10: stack size
+//  r11: previous fp
+//  r12: stack parameter area
+//  r13: argv slot
+//
+// return: r2 (always 1/TRUE)
+//
+
+// -- Begin __kmp_invoke_microtask
+// mark_begin;
+	.text
+	.globl	__kmp_invoke_microtask
+	.p2align	1
+	.type	__kmp_invoke_microtask, at function
+__kmp_invoke_microtask:
+	.cfi_startproc
+
+	stmg	%r6,%r14,48(%r15)
+        .cfi_offset %r6, -112
+        .cfi_offset %r7, -104
+        .cfi_offset %r8, -96
+        .cfi_offset %r9, -88
+        .cfi_offset %r10, -80
+        .cfi_offset %r11, -72
+        .cfi_offset %r12, -64
+        .cfi_offset %r13, -56
+        .cfi_offset %r14, -48
+        .cfi_offset %r15, -40
+	lgr	%r11,%r15
+	.cfi_def_cfa %r11, 160
+
+	// Compute the dynamic stack size:
+	//
+	// - We need 8 bytes for storing 'gtid' and 'tid', so we can pass them by
+	//   reference
+	// - We need 8 bytes for each argument that cannot be passed to the 'pkfn'
+	//   function by register. Given that we have 5 of such registers (r[2-6])
+	//   and two + 'argc' arguments (consider &gtid and &tid), we need to
+	//   reserve max(0, argc - 3)*8 extra bytes
+	//
+	// The total number of bytes is then max(0, argc - 3)*8 + 8
+
+	lgr	%r10,%r5
+	aghi	%r10,-2
+	jnm	0f
+	lghi	%r10,0
+0:
+	sllg	%r10,%r10,3
+	lgr	%r12,%r10
+	aghi	%r10,176
+	sgr 	%r15,%r10
+	agr	%r12,%r15
+	stg	%r11,0(%r15)
+
+	lgr	%r9,%r2			// pkfn
+
+#if OMPT_SUPPORT
+	// Save frame pointer into exit_frame
+	lg	%r8,160(%r11)
+	stg	%r11,0(%r8)
+#endif
+
+	// Prepare arguments for the pkfn function (first 5 using r2-r6 registers)
+
+	stg     %r3,160(%r12)
+	la	%r2,164(%r12)		// gid
+	stg	%r4,168(%r12)		
+	la	%r3,172(%r12)		// tid
+	lgr	%r8,%r6			// argv
+
+	// If argc > 0
+	ltgr	%r7,%r5
+	jz	1f
+
+	lg	%r4,0(%r8)		// argv[0]
+	aghi	%r7,-1
+	jz	1f
+
+	// If argc > 1
+	lg	%r5,8(%r8)		// argv[1]
+	aghi	%r7,-1
+	jz	1f
+
+	// If argc > 2
+	lg	%r6,16(%r8)		// argv[2]
+	aghi	%r7,-1
+	jz	1f
+
+	lghi	%r13,0			// Index [n]
+2:
+	lg	%r0,24(%r13,%r8)	// argv[2+n]
+	stg	%r0,160(%r13,%r15)	// parm[2+n]
+	aghi	%r13,8			// Next
+	aghi	%r7,-1
+	jnz	2b
+
+1:
+	basr	%r14,%r9		// Call pkfn
+
+	// Restore stack and return
+
+	lgr	%r15,%r11
+	lmg	%r6,%r14,48(%r15)
+	lghi	%r2,1
+	br	%r14
+.Lfunc_end0:
+	.size	__kmp_invoke_microtask, .Lfunc_end0-__kmp_invoke_microtask
+	.cfi_endproc
+
+// -- End  __kmp_invoke_microtask
+
+#endif /* KMP_ARCH_S390X */
+
 #if KMP_ARCH_ARM || KMP_ARCH_MIPS
     .data
     COMMON .gomp_critical_user_, 32, 3
@@ -2266,7 +2419,8 @@ __kmp_unnamed_critical_addr:
 #endif /* KMP_ARCH_ARM */
 
 #if KMP_ARCH_PPC64 || KMP_ARCH_AARCH64 || KMP_ARCH_MIPS64 ||                   \
-    KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 || KMP_ARCH_VE
+    KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 || KMP_ARCH_VE ||                 \
+    KMP_ARCH_S390X
 #ifndef KMP_PREFIX_UNDERSCORE
 # define KMP_PREFIX_UNDERSCORE(x) x
 #endif
@@ -2281,7 +2435,8 @@ KMP_PREFIX_UNDERSCORE(__kmp_unnamed_critical_addr):
     .size KMP_PREFIX_UNDERSCORE(__kmp_unnamed_critical_addr),8
 #endif
 #endif /* KMP_ARCH_PPC64 || KMP_ARCH_AARCH64 || KMP_ARCH_MIPS64 ||
-          KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 || KMP_ARCH_VE */
+          KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 || KMP_ARCH_VE || 
+          KMP_ARCH_S390X */
 
 #if KMP_OS_LINUX
 # if KMP_ARCH_ARM || KMP_ARCH_AARCH64
diff --git a/openmp/runtime/src/z_Linux_util.cpp b/openmp/runtime/src/z_Linux_util.cpp
index 5495f60d2029d49..2c331b8400468d5 100644
--- a/openmp/runtime/src/z_Linux_util.cpp
+++ b/openmp/runtime/src/z_Linux_util.cpp
@@ -2465,7 +2465,7 @@ int __kmp_get_load_balance(int max) {
 #if !(KMP_ARCH_X86 || KMP_ARCH_X86_64 || KMP_MIC ||                            \
       ((KMP_OS_LINUX || KMP_OS_DARWIN) && KMP_ARCH_AARCH64) ||                 \
       KMP_ARCH_PPC64 || KMP_ARCH_RISCV64 || KMP_ARCH_LOONGARCH64 ||            \
-      KMP_ARCH_ARM || KMP_ARCH_VE)
+      KMP_ARCH_ARM || KMP_ARCH_VE || KMP_ARCH_S390X)
 
 // we really only need the case with 1 argument, because CLANG always build
 // a struct of pointers to shared variables referenced in the outlined function
diff --git a/openmp/runtime/test/lit.cfg b/openmp/runtime/test/lit.cfg
index 650d3853e851112..27ff057c85f60f2 100644
--- a/openmp/runtime/test/lit.cfg
+++ b/openmp/runtime/test/lit.cfg
@@ -51,6 +51,8 @@ flags = " -I " + config.test_source_root + \
     " " + config.test_extra_flags
 if config.has_omit_frame_pointer_flag:
     flags += " -fno-omit-frame-pointer"
+if config.target_arch == "s390x":
+    flags += " -mbackchain"
 
 config.test_flags = " -I " + config.omp_header_directory + flags
 config.test_flags_use_compiler_omp_h = flags
diff --git a/openmp/runtime/test/ompt/callback.h b/openmp/runtime/test/ompt/callback.h
index c5266e230c26f77..efbd4c716e0ee1e 100644
--- a/openmp/runtime/test/ompt/callback.h
+++ b/openmp/runtime/test/ompt/callback.h
@@ -228,6 +228,22 @@ ompt_label_##id:
   printf("%" PRIu64 ": current_address=%p or %p\n",                            \
          ompt_get_thread_data()->value, ((char *)addr) - 8,                    \
          ((char *)addr) - 8)
+#elif KMP_ARCH_S390X
+// On s390x the NOP instruction is 2 bytes long. For non-void runtime
+// functions Clang inserts a STY instruction (but only if compiling under
+// -fno-PIC which will be the default with Clang 8.0, another 6 bytes).
+//
+// Another possibility is:
+//
+//                brasl %r14,__kmpc_end_master at plt
+//   a7 f4 00 02  j 0f
+//   47 00 00 00  0: nop
+//   a7 f4 00 02  j addr
+//                addr:
+#define print_possible_return_addresses(addr)                                  \
+  printf("%" PRIu64 ": current_address=%p or %p or %p\n",                      \
+         ompt_get_thread_data()->value, ((char *)addr) - 2,                    \
+         ((char *)addr) - 8, ((char *)addr) - 12)
 #else
 #error Unsupported target architecture, cannot determine address offset!
 #endif
diff --git a/openmp/runtime/tools/lib/Platform.pm b/openmp/runtime/tools/lib/Platform.pm
index d62d450e9e5dcf5..6efd932daef561b 100644
--- a/openmp/runtime/tools/lib/Platform.pm
+++ b/openmp/runtime/tools/lib/Platform.pm
@@ -65,6 +65,8 @@ sub canon_arch($) {
             $arch = "riscv64";
         } elsif ( $arch =~ m{\Aloongarch64} ) {
             $arch = "loongarch64";
+        } elsif ( $arch =~ m{\As390x} ) {
+            $arch = "s390x";
         } else {
             $arch = undef;
         }; # if
@@ -230,6 +232,8 @@ sub target_options() {
         $_host_arch = "riscv64";
     } elsif ( $hardware_platform eq "loongarch64" ) {
         $_host_arch = "loongarch64";
+    } elsif ( $hardware_platform eq "s390x" ) {
+        $_host_arch = "s390x";
     } else {
         die "Unsupported host hardware platform: \"$hardware_platform\"; stopped";
     }; # if
@@ -419,7 +423,7 @@ the script assumes host architecture is target one.
 
 Input string is an architecture name to canonize. The function recognizes many variants, for example:
 C<32e>, C<Intel64>, C<Intel(R) 64>, etc. Returned string is a canonized architecture name,
-one of: C<32>, C<32e>, C<64>, C<arm>, C<ppc64le>, C<ppc64>, C<mic>, C<mips>, C<mips64>, C<riscv64>, C<loongarch64> or C<undef> is input string is not recognized.
+one of: C<32>, C<32e>, C<64>, C<arm>, C<ppc64le>, C<ppc64>, C<mic>, C<mips>, C<mips64>, C<riscv64>, C<loongarch64>, C<s390x>, or C<undef> is input string is not recognized.
 
 =item B<legal_arch( $arch )>
 
diff --git a/openmp/runtime/tools/lib/Uname.pm b/openmp/runtime/tools/lib/Uname.pm
index 8a976addcff03e0..9dde444d56a4ece 100644
--- a/openmp/runtime/tools/lib/Uname.pm
+++ b/openmp/runtime/tools/lib/Uname.pm
@@ -160,6 +160,8 @@ if ( 0 ) {
         $values{ hardware_platform } = "riscv64";
     } elsif ( $values{ machine } =~ m{\Aloongarch64\z} ) {
         $values{ hardware_platform } = "loongarch64";
+    } elsif ( $values{ machine } =~ m{\As390x\z} ) {
+        $values{ hardware_platform } = "s390x";
     } else {
         die "Unsupported machine (\"$values{ machine }\") returned by POSIX::uname(); stopped";
     }; # if



More information about the Openmp-commits mailing list