[clang] [llvm] [openmp] [OpenMP] Change build of OpenMP device runtime to be a separate runtime (PR #136729)

Joseph Huber via llvm-commits llvm-commits at lists.llvm.org
Tue Apr 22 10:11:36 PDT 2025


https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/136729

Summary:
Currently we build the OpenMP device runtime as part of the `offload/`
project. This is problematic because it has several restrictions when
compared to the normal offloading runtime. It can only be built with an
up-to-date clang and we need to set the target appropriately. Currently
we hack around this by creating the compiler invocation manually, but
this patch moves it into a separate runtimes build.

This follows the same build we use for libc, libc++, compiler-rt, and
flang-rt. This also moves it from `offload/` into `openmp/` because it
is still the `openmp/` runtime and I feel it is more appropriate. We do
want a generic `offload/` library at some point, but it would be trivial
to then add that as a separate library now that we have the
infrastructure that makes adding these new libraries trivial.

This most importantly will require that users update their build
configs, mostly adding the following lines at a minimum. I was debating
whether or not I should 'auto-upgrade' this, but I just went with a
warning.

```
    -DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda'     \
    -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=openmp \
    -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=openmp \
```

This also changed where the `.bc` version of the library lives, but it's
still created.


>From 5e5c73a019131f0aa268e6854c917afd8e9066fd Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Tue, 22 Apr 2025 12:05:42 -0500
Subject: [PATCH] [OpenMP] Change build of OpenMP device runtime to be a
 separate runtime

Summary:
Currently we build the OpenMP device runtime as part of the `offload/`
project. This is problematic because it has several restrictions when
compared to the normal offloading runtime. It can only be built with an
up-to-date clang and we need to set the target appropriately. Currently
we hack around this by creating the compiler invocation manually, but
this patch moves it into a separate runtimes build.

This follows the same build we use for libc, libc++, compiler-rt, and
flang-rt. This also moves it from `offload/` into `openmp/` because it
is still the `openmp/` runtime and I feel it is more appropriate. We do
want a generic `offload/` library at some point, but it would be trivial
to then add that as a separate library now that we have the
infrastructure that makes adding these new libraries trivial.

This most importantly will require that users update their build
configs, mostly adding the following lines at a minimum. I was debating
whether or not I should 'auto-upgrade' this, but I just went with a
warning.

```
    -DLLVM_RUNTIME_TARGETS='default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda'     \
    -DRUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES=openmp \
    -DRUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES=openmp \
```

This also changed where the `.bc` version of the library lives, but it's
still created.
---
 clang/lib/Driver/ToolChains/CommonArgs.cpp    |   5 +
 offload/CMakeLists.txt                        |   8 +-
 offload/DeviceRTL/CMakeLists.txt              | 181 ------------------
 offload/cmake/caches/Offload.cmake            |   4 +-
 openmp/CMakeLists.txt                         |  76 +++++---
 openmp/device/CMakeLists.txt                  |  99 ++++++++++
 .../device}/include/Allocator.h               |   0
 .../device}/include/Configuration.h           |   0
 .../device}/include/Debug.h                   |   0
 .../device}/include/DeviceTypes.h             |   0
 .../device}/include/DeviceUtils.h             |   0
 .../device}/include/Interface.h               |   0
 .../device}/include/LibC.h                    |   0
 .../device}/include/Mapping.h                 |   0
 .../device}/include/Profiling.h               |   0
 .../device}/include/State.h                   |   0
 .../device}/include/Synchronization.h         |   0
 .../device}/include/Workshare.h               |   0
 .../include/generated_microtask_cases.gen     |   0
 .../device}/src/Allocator.cpp                 |   0
 .../device}/src/Configuration.cpp             |   0
 .../DeviceRTL => openmp/device}/src/Debug.cpp |   0
 .../device}/src/DeviceUtils.cpp               |   0
 .../device}/src/Kernel.cpp                    |   0
 .../DeviceRTL => openmp/device}/src/LibC.cpp  |   0
 .../device}/src/Mapping.cpp                   |   0
 .../DeviceRTL => openmp/device}/src/Misc.cpp  |   0
 .../device}/src/Parallelism.cpp               |   0
 .../device}/src/Profiling.cpp                 |   0
 .../device}/src/Reduction.cpp                 |   0
 .../DeviceRTL => openmp/device}/src/State.cpp |   0
 .../DeviceRTL => openmp/device}/src/Stub.cpp  |   0
 .../device}/src/Synchronization.cpp           |   0
 .../device}/src/Tasking.cpp                   |   0
 .../device}/src/Workshare.cpp                 |   0
 openmp/docs/SupportAndFAQ.rst                 |   7 +
 36 files changed, 165 insertions(+), 215 deletions(-)
 delete mode 100644 offload/DeviceRTL/CMakeLists.txt
 create mode 100644 openmp/device/CMakeLists.txt
 rename {offload/DeviceRTL => openmp/device}/include/Allocator.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/Configuration.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/Debug.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/DeviceTypes.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/DeviceUtils.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/Interface.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/LibC.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/Mapping.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/Profiling.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/State.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/Synchronization.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/Workshare.h (100%)
 rename {offload/DeviceRTL => openmp/device}/include/generated_microtask_cases.gen (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Allocator.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Configuration.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Debug.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/DeviceUtils.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Kernel.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/LibC.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Mapping.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Misc.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Parallelism.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Profiling.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Reduction.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/State.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Stub.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Synchronization.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Tasking.cpp (100%)
 rename {offload/DeviceRTL => openmp/device}/src/Workshare.cpp (100%)

diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 8646c55060b17..7cc4008ec1f2b 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -2794,6 +2794,11 @@ void tools::addOpenMPDeviceRTL(const Driver &D,
   for (const auto &LibPath : HostTC.getFilePaths())
     LibraryPaths.emplace_back(LibPath);
 
+  // Check the target specific library path for the triple as well.
+  SmallString<128> P(D.Dir);
+  llvm::sys::path::append(P, "..", "lib", Triple.getTriple());
+  LibraryPaths.emplace_back(P);
+
   OptSpecifier LibomptargetBCPathOpt =
       Triple.isAMDGCN()  ? options::OPT_libomptarget_amdgpu_bc_path_EQ
       : Triple.isNVPTX() ? options::OPT_libomptarget_nvptx_bc_path_EQ
diff --git a/offload/CMakeLists.txt b/offload/CMakeLists.txt
index 25c879710645c..70ac6a6d1e6c3 100644
--- a/offload/CMakeLists.txt
+++ b/offload/CMakeLists.txt
@@ -113,6 +113,13 @@ else()
   set(CMAKE_CXX_EXTENSIONS NO)
 endif()
 
+# Emit a warning for people who haven't updated their build.
+if(NOT "openmp" IN_LIST RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES AND
+   NOT "openmp" IN_LIST RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES)
+  message(WARNING "Building the offloading runtime with no device library. See "
+                  "https://openmp.llvm.org//SupportAndFAQ.html for help.")
+endif()
+
 # Set the path of all resulting libraries to a unified location so that it can
 # be used for testing.
 set(LIBOMPTARGET_LIBRARY_DIR ${CMAKE_CURRENT_BINARY_DIR})
@@ -373,7 +380,6 @@ set(LIBOMPTARGET_LLVM_LIBRARY_INTDIR "${LIBOMPTARGET_INTDIR}" CACHE STRING
 
 # Build offloading plugins and device RTLs if they are available.
 add_subdirectory(plugins-nextgen)
-add_subdirectory(DeviceRTL)
 add_subdirectory(tools)
 
 # Build target agnostic offloading library.
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
deleted file mode 100644
index 12f53a30761f3..0000000000000
--- a/offload/DeviceRTL/CMakeLists.txt
+++ /dev/null
@@ -1,181 +0,0 @@
-set(LIBOMPTARGET_BUILD_DEVICERTL_BCLIB TRUE CACHE BOOL
-  "Can be set to false to disable building this library.")
-
-if (NOT LIBOMPTARGET_BUILD_DEVICERTL_BCLIB)
-  message(STATUS "Not building DeviceRTL: Disabled by LIBOMPTARGET_BUILD_DEVICERTL_BCLIB")
-  return()
-endif()
-
-# Check to ensure the host system is a supported host architecture.
-if(NOT ${CMAKE_SIZEOF_VOID_P} EQUAL "8")
-  message(STATUS "Not building DeviceRTL: Runtime does not support 32-bit hosts")
-  return()
-endif()
-
-if (LLVM_DIR)
-  # Builds that use pre-installed LLVM have LLVM_DIR set.
-  # A standalone or LLVM_ENABLE_RUNTIMES=openmp build takes this route
-  find_program(CLANG_TOOL clang PATHS ${LLVM_TOOLS_BINARY_DIR} NO_DEFAULT_PATH)
-elseif (LLVM_TOOL_CLANG_BUILD AND NOT CMAKE_CROSSCOMPILING AND NOT OPENMP_STANDALONE_BUILD)
-  # LLVM in-tree builds may use CMake target names to discover the tools.
-  # A LLVM_ENABLE_PROJECTS=openmp build takes this route
-  set(CLANG_TOOL $<TARGET_FILE:clang>)
-else()
-  message(STATUS "Not building DeviceRTL. No appropriate clang found")
-  return()
-endif()
-
-set(devicertl_base_directory ${CMAKE_CURRENT_SOURCE_DIR})
-set(include_directory ${devicertl_base_directory}/include)
-set(source_directory ${devicertl_base_directory}/src)
-
-set(include_files
-  ${include_directory}/Allocator.h
-  ${include_directory}/Configuration.h
-  ${include_directory}/Debug.h
-  ${include_directory}/Interface.h
-  ${include_directory}/LibC.h
-  ${include_directory}/Mapping.h
-  ${include_directory}/Profiling.h
-  ${include_directory}/State.h
-  ${include_directory}/Synchronization.h
-  ${include_directory}/DeviceTypes.h
-  ${include_directory}/DeviceUtils.h
-  ${include_directory}/Workshare.h
-)
-
-set(src_files
-  ${source_directory}/Allocator.cpp
-  ${source_directory}/Configuration.cpp
-  ${source_directory}/Debug.cpp
-  ${source_directory}/Kernel.cpp
-  ${source_directory}/LibC.cpp
-  ${source_directory}/Mapping.cpp
-  ${source_directory}/Misc.cpp
-  ${source_directory}/Parallelism.cpp
-  ${source_directory}/Profiling.cpp
-  ${source_directory}/Reduction.cpp
-  ${source_directory}/State.cpp
-  ${source_directory}/Synchronization.cpp
-  ${source_directory}/Tasking.cpp
-  ${source_directory}/DeviceUtils.cpp
-  ${source_directory}/Workshare.cpp
-)
-
-# We disable the slp vectorizer during the runtime optimization to avoid
-# vectorized accesses to the shared state. Generally, those are "good" but
-# the optimizer pipeline (esp. Attributor) does not fully support vectorized
-# instructions yet and we end up missing out on way more important constant
-# propagation. That said, we will run the vectorizer again after the runtime
-# has been linked into the user program.
-set(clang_opt_flags -O3 -mllvm -openmp-opt-disable -DSHARED_SCRATCHPAD_SIZE=512 -mllvm -vectorize-slp=false )
-
-# If the user built with the GPU C library enabled we will use that instead.
-if(${LIBOMPTARGET_GPU_LIBC_SUPPORT})
-  list(APPEND clang_opt_flags -DOMPTARGET_HAS_LIBC)
-endif()
-
-# Set flags for LLVM Bitcode compilation.
-set(bc_flags -c -flto -std=c++17 -fvisibility=hidden
-             ${clang_opt_flags} -nogpulib -nostdlibinc
-             -fno-rtti -fno-exceptions -fconvergent-functions
-             -Wno-unknown-cuda-version
-             -DOMPTARGET_DEVICE_RUNTIME
-             -I${include_directory}
-             -I${devicertl_base_directory}/../include
-             -I${devicertl_base_directory}/../../libc
-)
-
-# first create an object target
-function(compileDeviceRTLLibrary target_name target_triple)
-  set(target_bc_flags ${ARGN})
-
-  foreach(src ${src_files})
-    get_filename_component(infile ${src} ABSOLUTE)
-    get_filename_component(outfile ${src} NAME)
-    set(outfile "${outfile}-${target_name}.o")
-    set(depfile "${outfile}.d")
-
-    # Passing an empty CPU to -march= suppressed target specific metadata.
-    add_custom_command(OUTPUT ${outfile}
-      COMMAND ${CLANG_TOOL}
-      ${bc_flags}
-      --target=${target_triple}
-      ${target_bc_flags}
-      -MD -MF ${depfile}
-      ${infile} -o ${outfile}
-      DEPENDS ${infile}
-      DEPFILE ${depfile}
-      COMMENT "Building LLVM bitcode ${outfile}"
-      VERBATIM
-    )
-    if(TARGET clang)
-      # Add a file-level dependency to ensure that clang is up-to-date.
-      # By default, add_custom_command only builds clang if the
-      # executable is missing.
-      add_custom_command(OUTPUT ${outfile}
-        DEPENDS clang
-        APPEND
-      )
-    endif()
-    set_property(DIRECTORY APPEND PROPERTY ADDITIONAL_MAKE_CLEAN_FILES ${outfile})
-
-    list(APPEND obj_files ${CMAKE_CURRENT_BINARY_DIR}/${outfile})
-  endforeach()
-  # Trick to combine these into a bitcode file via the linker's LTO pass. This
-  # is used to provide the legacy `libomptarget-<name>.bc` files. Hack this
-  # through as an executable to get it to use the relocatable link.
-  add_executable(libomptarget-${target_name} ${obj_files})
-  set_target_properties(libomptarget-${target_name} PROPERTIES
-    RUNTIME_OUTPUT_DIRECTORY ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}
-    LINKER_LANGUAGE CXX
-    BUILD_RPATH ""
-    INSTALL_RPATH ""
-    RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
-  target_compile_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}" "-march=")
-  target_link_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}"
-                      "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm" "-march=")
-  install(TARGETS libomptarget-${target_name}
-          PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
-          DESTINATION ${OFFLOAD_INSTALL_LIBDIR})
-
-  add_library(omptarget.${target_name}.all_objs OBJECT IMPORTED)
-  set_property(TARGET omptarget.${target_name}.all_objs APPEND PROPERTY IMPORTED_OBJECTS
-               ${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/libomptarget-${target_name}.bc)
-
-  # Archive all the object files generated above into a static library
-  add_library(omptarget.${target_name} STATIC)
-  set_target_properties(omptarget.${target_name} PROPERTIES
-    ARCHIVE_OUTPUT_DIRECTORY "${LIBOMPTARGET_LLVM_LIBRARY_INTDIR}/${target_triple}"
-    ARCHIVE_OUTPUT_NAME ompdevice
-    LINKER_LANGUAGE CXX
-  )
-  target_link_libraries(omptarget.${target_name} PRIVATE omptarget.${target_name}.all_objs)
-
-  install(TARGETS omptarget.${target_name}
-          ARCHIVE DESTINATION "lib${LLVM_LIBDIR_SUFFIX}/${target_triple}")
-
-  if (CMAKE_EXPORT_COMPILE_COMMANDS)
-    set(ide_target_name omptarget-ide-${target_name})
-    add_library(${ide_target_name} STATIC EXCLUDE_FROM_ALL ${src_files})
-    target_compile_options(${ide_target_name} PRIVATE
-      -fvisibility=hidden --target=${target_triple}
-      -nogpulib -nostdlibinc -Wno-unknown-cuda-version
-    )
-    target_compile_definitions(${ide_target_name} PRIVATE SHARED_SCRATCHPAD_SIZE=512)
-    target_include_directories(${ide_target_name} PRIVATE
-      ${include_directory}
-      ${devicertl_base_directory}/../../libc
-      ${devicertl_base_directory}/../include
-    )
-    install(TARGETS ${ide_target_name} EXCLUDE_FROM_ALL)
-  endif()
-endfunction()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "AMDGPU" IN_LIST LLVM_TARGETS_TO_BUILD)
-  compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=none)
-endif()
-
-if(NOT LLVM_TARGETS_TO_BUILD OR "NVPTX" IN_LIST LLVM_TARGETS_TO_BUILD)
-  compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63)
-endif()
diff --git a/offload/cmake/caches/Offload.cmake b/offload/cmake/caches/Offload.cmake
index 5533a6508f5d5..3747a1d3eb299 100644
--- a/offload/cmake/caches/Offload.cmake
+++ b/offload/cmake/caches/Offload.cmake
@@ -5,5 +5,5 @@ set(LLVM_ENABLE_PER_TARGET_RUNTIME_DIR ON CACHE BOOL "")
 set(LLVM_RUNTIME_TARGETS default;amdgcn-amd-amdhsa;nvptx64-nvidia-cuda CACHE STRING "") 
 set(RUNTIMES_nvptx64-nvidia-cuda_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/NVPTX.cmake" CACHE STRING "")
 set(RUNTIMES_amdgcn-amd-amdhsa_CACHE_FILES "${CMAKE_SOURCE_DIR}/../libcxx/cmake/caches/AMDGPU.cmake" CACHE STRING "")
-set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
-set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_nvptx64-nvidia-cuda_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
+set(RUNTIMES_amdgcn-amd-amdhsa_LLVM_ENABLE_RUNTIMES "compiler-rt;libc;openmp;libcxx;libcxxabi" CACHE STRING "")
diff --git a/openmp/CMakeLists.txt b/openmp/CMakeLists.txt
index c206386fa6b61..c1c533d00f8bb 100644
--- a/openmp/CMakeLists.txt
+++ b/openmp/CMakeLists.txt
@@ -88,6 +88,14 @@ else()
   set(CMAKE_CXX_EXTENSIONS NO)
 endif()
 
+# Targeting the GPU directly requires a few flags to make CMake happy.
+if("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(CMAKE_REQUIRED_FLAGS "${CMAKE_REQUIRED_FLAGS} -nogpulib")
+elseif("${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(CMAKE_REQUIRED_FLAGS
+      "${CMAKE_REQUIRED_FLAGS} -flto -c -Wno-unused-command-line-argument")
+endif()
+
 # Check and set up common compiler flags.
 include(config-ix)
 include(HandleOpenMPOptions)
@@ -122,35 +130,41 @@ else()
   get_clang_resource_dir(LIBOMP_HEADERS_INSTALL_PATH SUBDIR include)
 endif()
 
-# Build host runtime library, after LIBOMPTARGET variables are set since they are needed
-# to enable time profiling support in the OpenMP runtime.
-add_subdirectory(runtime)
-
-set(ENABLE_OMPT_TOOLS ON)
-# Currently tools are not tested well on Windows or MacOS X.
-if (APPLE OR WIN32)
-  set(ENABLE_OMPT_TOOLS OFF)
-endif()
-
-option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
-       ${ENABLE_OMPT_TOOLS})
-if (OPENMP_ENABLE_OMPT_TOOLS)
-  add_subdirectory(tools)
-endif()
-
-# Propagate OMPT support to offload
-if(NOT ${OPENMP_STANDALONE_BUILD})
-  set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
-  set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+# Use the current compiler target to determine the appropriate runtime to build.
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn|^nvptx" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn|^nvptx")
+  add_subdirectory(device)
+else()
+  # Build host runtime library, after LIBOMPTARGET variables are set since they
+  # are needed to enable time profiling support in the OpenMP runtime.
+  add_subdirectory(runtime)
+  
+  set(ENABLE_OMPT_TOOLS ON)
+  # Currently tools are not tested well on Windows or MacOS X.
+  if (APPLE OR WIN32)
+    set(ENABLE_OMPT_TOOLS OFF)
+  endif()
+  
+  option(OPENMP_ENABLE_OMPT_TOOLS "Enable building ompt based tools for OpenMP."
+         ${ENABLE_OMPT_TOOLS})
+  if (OPENMP_ENABLE_OMPT_TOOLS)
+    add_subdirectory(tools)
+  endif()
+  
+  # Propagate OMPT support to offload
+  if(NOT ${OPENMP_STANDALONE_BUILD})
+    set(LIBOMP_HAVE_OMPT_SUPPORT ${LIBOMP_HAVE_OMPT_SUPPORT} PARENT_SCOPE)
+    set(LIBOMP_OMP_TOOLS_INCLUDE_DIR ${LIBOMP_OMP_TOOLS_INCLUDE_DIR} PARENT_SCOPE)
+  endif()
+  
+  option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
+  
+  # Build libompd.so
+  add_subdirectory(libompd)
+  
+  # Build documentation
+  add_subdirectory(docs)
+  
+  # Now that we have seen all testsuites, create the check-openmp target.
+  construct_check_openmp_target()
 endif()
-
-option(OPENMP_MSVC_NAME_SCHEME "Build dll with MSVC naming scheme." OFF)
-
-# Build libompd.so
-add_subdirectory(libompd)
-
-# Build documentation
-add_subdirectory(docs)
-
-# Now that we have seen all testsuites, create the check-openmp target.
-construct_check_openmp_target()
diff --git a/openmp/device/CMakeLists.txt b/openmp/device/CMakeLists.txt
new file mode 100644
index 0000000000000..9211186f4012a
--- /dev/null
+++ b/openmp/device/CMakeLists.txt
@@ -0,0 +1,99 @@
+# Ensure the compiler is a valid clang when building the GPU target.
+set(req_ver "${LLVM_VERSION_MAJOR}.${LLVM_VERSION_MINOR}.${LLVM_VERSION_PATCH}")
+if(LLVM_VERSION_MAJOR AND NOT (CMAKE_CXX_COMPILER_ID MATCHES "[Cc]lang" AND
+   ${CMAKE_CXX_COMPILER_VERSION} VERSION_EQUAL "${req_ver}"))
+  message(FATAL_ERROR "Cannot build GPU device runtime. CMake compiler "
+                      "'${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}' "
+                      " is not 'Clang ${req_ver}'.")
+endif()
+
+set(src_files
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Allocator.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Configuration.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Debug.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Kernel.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/LibC.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Mapping.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Misc.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Parallelism.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Profiling.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Reduction.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/State.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Synchronization.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Tasking.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/DeviceUtils.cpp
+  ${CMAKE_CURRENT_SOURCE_DIR}/src/Workshare.cpp
+)
+
+list(APPEND compile_options -flto)
+list(APPEND compile_options -fvisibility=hidden)
+list(APPEND compile_options -nogpulib)
+list(APPEND compile_options -nostdlibinc)
+list(APPEND compile_options -fno-rtti)
+list(APPEND compile_options -fno-exceptions)
+list(APPEND compile_options -fconvergent-functions)
+list(APPEND compile_options -Wno-unknown-cuda-version)
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  list(APPEND compile_options --target=${LLVM_DEFAULT_TARGET_TRIPLE})
+endif()
+
+# We disable the slp vectorizer during the runtime optimization to avoid
+# vectorized accesses to the shared state. Generally, those are "good" but
+# the optimizer pipeline (esp. Attributor) does not fully support vectorized
+# instructions yet and we end up missing out on way more important constant
+# propagation. That said, we will run the vectorizer again after the runtime
+# has been linked into the user program.
+list(APPEND compile_flags "SHELL: -mllvm -vectorize-slp=false")
+if("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^amdgcn" OR
+   "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^amdgcn")
+  set(target_name "amdgpu")
+  list(APPEND compile_flags "SHELL:-Xclang -mcode-object-version=none")
+elseif("${LLVM_DEFAULT_TARGET_TRIPLE}" MATCHES "^nvptx" OR
+       "${CMAKE_CXX_COMPILER_TARGET}" MATCHES "^nvptx")
+  set(target_name "nvptx")
+  list(APPEND compile_flags --cuda-feature=+ptx63)
+endif()
+
+# Trick to combine these into a bitcode file via the linker's LTO pass.
+add_executable(libompdevice ${src_files})
+set_target_properties(libompdevice PROPERTIES
+  RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
+  LINKER_LANGUAGE CXX
+  BUILD_RPATH ""
+  INSTALL_RPATH ""
+  RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
+
+# If the user built with the GPU C library enabled we will use that instead.
+if(LIBOMPTARGET_GPU_LIBC_SUPPORT)
+  target_compile_definitions(libompdevice PRIVATE OMPTARGET_HAS_LIBC)
+endif()
+target_compile_definitions(libompdevice PRIVATE SHARED_SCRATCHPAD_SIZE=512)
+
+target_include_directories(libompdevice PRIVATE 
+                           ${CMAKE_CURRENT_SOURCE_DIR}/include
+                           ${CMAKE_CURRENT_SOURCE_DIR}/../../libc
+                           ${CMAKE_CURRENT_SOURCE_DIR}/../../offload/include)
+target_compile_options(libompdevice PRIVATE ${compile_options})
+target_link_options(libompdevice PRIVATE
+                    "-flto" "-r" "-nostdlib" "-Wl,--lto-emit-llvm")
+if(LLVM_DEFAULT_TARGET_TRIPLE)
+  target_link_options(libompdevice PRIVATE "--target=${LLVM_DEFAULT_TARGET_TRIPLE}")
+endif()
+install(TARGETS libompdevice
+        PERMISSIONS OWNER_WRITE OWNER_READ GROUP_READ WORLD_READ
+        DESTINATION ${OPENMP_INSTALL_LIBDIR})
+
+add_library(ompdevice.all_objs OBJECT IMPORTED)
+set_property(TARGET ompdevice.all_objs APPEND PROPERTY IMPORTED_OBJECTS
+             ${CMAKE_CURRENT_BINARY_DIR}/libomptarget-${target_name}.bc)
+
+# Archive all the object files generated above into a static library
+add_library(ompdevice STATIC)
+add_dependencies(ompdevice libompdevice)
+set_target_properties(ompdevice PROPERTIES
+  ARCHIVE_OUTPUT_DIRECTORY "${OPENMP_INSTALL_LIBDIR}"
+  ARCHIVE_OUTPUT_NAME ompdevice
+  LINKER_LANGUAGE CXX
+)
+target_link_libraries(ompdevice PRIVATE ompdevice.all_objs)
+install(TARGETS ompdevice ARCHIVE DESTINATION "${OPENMP_INSTALL_LIBDIR}")
diff --git a/offload/DeviceRTL/include/Allocator.h b/openmp/device/include/Allocator.h
similarity index 100%
rename from offload/DeviceRTL/include/Allocator.h
rename to openmp/device/include/Allocator.h
diff --git a/offload/DeviceRTL/include/Configuration.h b/openmp/device/include/Configuration.h
similarity index 100%
rename from offload/DeviceRTL/include/Configuration.h
rename to openmp/device/include/Configuration.h
diff --git a/offload/DeviceRTL/include/Debug.h b/openmp/device/include/Debug.h
similarity index 100%
rename from offload/DeviceRTL/include/Debug.h
rename to openmp/device/include/Debug.h
diff --git a/offload/DeviceRTL/include/DeviceTypes.h b/openmp/device/include/DeviceTypes.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceTypes.h
rename to openmp/device/include/DeviceTypes.h
diff --git a/offload/DeviceRTL/include/DeviceUtils.h b/openmp/device/include/DeviceUtils.h
similarity index 100%
rename from offload/DeviceRTL/include/DeviceUtils.h
rename to openmp/device/include/DeviceUtils.h
diff --git a/offload/DeviceRTL/include/Interface.h b/openmp/device/include/Interface.h
similarity index 100%
rename from offload/DeviceRTL/include/Interface.h
rename to openmp/device/include/Interface.h
diff --git a/offload/DeviceRTL/include/LibC.h b/openmp/device/include/LibC.h
similarity index 100%
rename from offload/DeviceRTL/include/LibC.h
rename to openmp/device/include/LibC.h
diff --git a/offload/DeviceRTL/include/Mapping.h b/openmp/device/include/Mapping.h
similarity index 100%
rename from offload/DeviceRTL/include/Mapping.h
rename to openmp/device/include/Mapping.h
diff --git a/offload/DeviceRTL/include/Profiling.h b/openmp/device/include/Profiling.h
similarity index 100%
rename from offload/DeviceRTL/include/Profiling.h
rename to openmp/device/include/Profiling.h
diff --git a/offload/DeviceRTL/include/State.h b/openmp/device/include/State.h
similarity index 100%
rename from offload/DeviceRTL/include/State.h
rename to openmp/device/include/State.h
diff --git a/offload/DeviceRTL/include/Synchronization.h b/openmp/device/include/Synchronization.h
similarity index 100%
rename from offload/DeviceRTL/include/Synchronization.h
rename to openmp/device/include/Synchronization.h
diff --git a/offload/DeviceRTL/include/Workshare.h b/openmp/device/include/Workshare.h
similarity index 100%
rename from offload/DeviceRTL/include/Workshare.h
rename to openmp/device/include/Workshare.h
diff --git a/offload/DeviceRTL/include/generated_microtask_cases.gen b/openmp/device/include/generated_microtask_cases.gen
similarity index 100%
rename from offload/DeviceRTL/include/generated_microtask_cases.gen
rename to openmp/device/include/generated_microtask_cases.gen
diff --git a/offload/DeviceRTL/src/Allocator.cpp b/openmp/device/src/Allocator.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Allocator.cpp
rename to openmp/device/src/Allocator.cpp
diff --git a/offload/DeviceRTL/src/Configuration.cpp b/openmp/device/src/Configuration.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Configuration.cpp
rename to openmp/device/src/Configuration.cpp
diff --git a/offload/DeviceRTL/src/Debug.cpp b/openmp/device/src/Debug.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Debug.cpp
rename to openmp/device/src/Debug.cpp
diff --git a/offload/DeviceRTL/src/DeviceUtils.cpp b/openmp/device/src/DeviceUtils.cpp
similarity index 100%
rename from offload/DeviceRTL/src/DeviceUtils.cpp
rename to openmp/device/src/DeviceUtils.cpp
diff --git a/offload/DeviceRTL/src/Kernel.cpp b/openmp/device/src/Kernel.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Kernel.cpp
rename to openmp/device/src/Kernel.cpp
diff --git a/offload/DeviceRTL/src/LibC.cpp b/openmp/device/src/LibC.cpp
similarity index 100%
rename from offload/DeviceRTL/src/LibC.cpp
rename to openmp/device/src/LibC.cpp
diff --git a/offload/DeviceRTL/src/Mapping.cpp b/openmp/device/src/Mapping.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Mapping.cpp
rename to openmp/device/src/Mapping.cpp
diff --git a/offload/DeviceRTL/src/Misc.cpp b/openmp/device/src/Misc.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Misc.cpp
rename to openmp/device/src/Misc.cpp
diff --git a/offload/DeviceRTL/src/Parallelism.cpp b/openmp/device/src/Parallelism.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Parallelism.cpp
rename to openmp/device/src/Parallelism.cpp
diff --git a/offload/DeviceRTL/src/Profiling.cpp b/openmp/device/src/Profiling.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Profiling.cpp
rename to openmp/device/src/Profiling.cpp
diff --git a/offload/DeviceRTL/src/Reduction.cpp b/openmp/device/src/Reduction.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Reduction.cpp
rename to openmp/device/src/Reduction.cpp
diff --git a/offload/DeviceRTL/src/State.cpp b/openmp/device/src/State.cpp
similarity index 100%
rename from offload/DeviceRTL/src/State.cpp
rename to openmp/device/src/State.cpp
diff --git a/offload/DeviceRTL/src/Stub.cpp b/openmp/device/src/Stub.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Stub.cpp
rename to openmp/device/src/Stub.cpp
diff --git a/offload/DeviceRTL/src/Synchronization.cpp b/openmp/device/src/Synchronization.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Synchronization.cpp
rename to openmp/device/src/Synchronization.cpp
diff --git a/offload/DeviceRTL/src/Tasking.cpp b/openmp/device/src/Tasking.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Tasking.cpp
rename to openmp/device/src/Tasking.cpp
diff --git a/offload/DeviceRTL/src/Workshare.cpp b/openmp/device/src/Workshare.cpp
similarity index 100%
rename from offload/DeviceRTL/src/Workshare.cpp
rename to openmp/device/src/Workshare.cpp
diff --git a/openmp/docs/SupportAndFAQ.rst b/openmp/docs/SupportAndFAQ.rst
index b645723dcfd5e..9e8473ce384d1 100644
--- a/openmp/docs/SupportAndFAQ.rst
+++ b/openmp/docs/SupportAndFAQ.rst
@@ -78,6 +78,13 @@ Clang will be built with all backends enabled. When building with
 ``LLVM_ENABLE_RUNTIMES="openmp"`` OpenMP should not be enabled in 
 ``LLVM_ENABLE_PROJECTS`` because it is enabled by default.
 
+Support for the device library comes from a separate build of the OpenMP library
+that targets the GPU architecture. Building it requires enabling the runtime
+targets, or setting the target manually when doing a standalone build. This is
+done with the ``LLVM_RUNTIME_TARGETS`` option and then enabling the OpenMP
+runtime for the GPU target. ``RUNTIMES_<triple>_LLVM_ENABLE_RUNTIMES``. Refer to
+the cache file for the specific invocation.
+
 For Nvidia offload, please see :ref:`build_nvidia_offload_capable_compiler`.
 For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
 



More information about the llvm-commits mailing list