[clang] [llvm] [OpenMP] Remove 'libomptarget.devicertl.a' fatbinary and use static library (PR #126143)

Joseph Huber via cfe-commits cfe-commits at lists.llvm.org
Thu Feb 6 14:27:32 PST 2025

https://github.com/jhuber6 created https://github.com/llvm/llvm-project/pull/126143

Currently, we build a single `libomptarget.devicertl.a`, which is a
fatbinary: a static archive of host object files that embed the device
bitcode for both the NVIDIA and AMDGPU targets. This was done primarily
as a convenience to avoid naming conflicts. Now that the clang driver for
the GPU targets can link appropriately via the per-target runtime
directory, we can just build two separate static libraries and remove
the indirection.
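To make the difference between the two layouts concrete, here is a small Python model. It is a sketch, not the real offload binary format: the `EmbeddedImage` fields simply mirror the `--image=file=...,triple=...,arch=generic,kind=openmp` arguments that the removed CMake code passed to `clang-offload-packager`, while the `/opt/llvm` prefix and the `libomp.a` file name are assumptions for illustration.

```python
from dataclasses import dataclass


# Simplified stand-in for one image packaged into the old fatbinary. The fields
# mirror the clang-offload-packager "--image=..." key/value pairs; this is NOT
# the real offload binary format.
@dataclass(frozen=True)
class EmbeddedImage:
    file: str
    triple: str
    arch: str
    kind: str


# Old layout: one host-side archive carrying device images for every target.
FATBINARY = [
    EmbeddedImage("libomptarget-amdgpu.bc", "amdgcn-amd-amdhsa", "generic", "openmp"),
    EmbeddedImage("libomptarget-nvptx.bc", "nvptx64-nvidia-cuda", "generic", "openmp"),
]


def images_for_target(fatbinary, triple):
    """Old model: unpack the host archive and keep the OpenMP images for `triple`."""
    return [img.file for img in fatbinary if img.triple == triple and img.kind == "openmp"]


def per_target_archive(prefix, triple, libdir_suffix=""):
    """New model: a plain static library per target, under lib<suffix>/<triple>.

    The directory layout follows the install() rule in this patch; the prefix
    and the libomp.a file name are assumptions for illustration.
    """
    return f"{prefix}/lib{libdir_suffix}/{triple}/libomp.a"


if __name__ == "__main__":
    print(images_for_target(FATBINARY, "nvptx64-nvidia-cuda"))
    print(per_target_archive("/opt/llvm", "amdgcn-amd-amdhsa"))
```

In the old model every link had to go through the host archive and filter by triple; in the new one the triple only selects a directory, and the archive inside it is an ordinary static library.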

This patch creates two new static libraries that get installed into the
per-target library directories (`lib${LLVM_LIBDIR_SUFFIX}/<target-triple>`)
for AMDGPU and NVPTX respectively. The link job created by the linker
wrapper now only needs to pass `-lomp`, and the linker will search those
directories and link the matching static library. This requires far less
special handling.
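The driver side of this is small. The following Python sketch restates the new `LinkerWrapper` check (append `-lomp` for OpenMP device links unless `-nogpulib` is given) together with the usual `-l<name>` search over library directories. It is a simplification under stated assumptions: the function names and the `"openmp"` string standing in for `Action::OFK_OpenMP` are hypothetical, and real linkers such as ld.lld may also consider shared libraries when resolving `-l<name>`.

```python
import os


def device_linker_args(offload_kind, driver_args):
    """Restates the new driver check: OpenMP device links get -lomp unless -nogpulib."""
    args = []
    if offload_kind == "openmp" and "-nogpulib" not in driver_args:
        args.append("-lomp")
    return args


def resolve_static_lib(flag, search_dirs, exists=os.path.isfile):
    """Resolve a -l<name> flag to the first lib<name>.a found in search_dirs.

    Directories are scanned in order; `exists` is injectable so the lookup can
    be exercised without touching the filesystem.
    """
    if not flag.startswith("-l") or len(flag) == 2:
        raise ValueError(f"expected -l<name>, got {flag!r}")
    filename = f"lib{flag[2:]}.a"
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical install prefix; only the NVPTX directory "contains" libomp.a.
    present = {"/opt/llvm/lib/nvptx64-nvidia-cuda/libomp.a"}
    dirs = ["/opt/llvm/lib", "/opt/llvm/lib/nvptx64-nvidia-cuda"]
    for flag in device_linker_args("openmp", ["-O2"]):
        print(flag, "->", resolve_static_lib(flag, dirs, exists=present.__contains__))
```

Because the per-target directory is already on the device link's search path, the driver no longer needs to know the archive's location or name beyond `-lomp`.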

This patch is a precursor to changing the build system entirely to a
runtimes-based one. Soon this target will be a standard `add_library`
and will be built through the GPU runtime targets.

NOTE: this does remove an optimization step. Previously we merged all
of the files into a single bitcode object and forcibly internalized some
definitions. This patch instead treats them like a normal static
library. This may affect performance in some cases, but I think it's
better overall to use static library semantics because it allows us to
have an 'include-what-you-use' relationship with the library.
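The trade-off described above can be illustrated with a toy Python model. `merged_blob` stands in for the removed step, where every file was linked into one bitcode module and definitions outside the export list were internalized; `link_archive` models ordinary static-archive semantics, where a member is extracted only if it defines a symbol that is still undefined. The member names and symbols are hypothetical, and the loop runs to a fixed point rather than modelling any particular linker's scanning order.

```python
# Hypothetical archive contents: member -> (defined symbols, referenced symbols).
MEMBERS = {
    "parallel.o": ({"parallel_entry"}, {"helper_fn"}),
    "helper.o": ({"helper_fn"}, set()),
    "reduction.o": ({"reduce_entry"}, set()),
}


def merged_blob(members, exports):
    """Removed model: everything is linked into one module, then every
    definition not named in `exports` is given internal linkage."""
    linkage = {}
    for defs, _refs in members.values():
        for sym in defs:
            linkage[sym] = "external" if sym in exports else "internal"
    return linkage


def link_archive(undefined, members):
    """Static-library model: extract a member only when it defines a symbol
    that is currently undefined, repeating until nothing new is extracted.
    Returns the extracted member names in extraction order."""
    undefined = set(undefined)
    defined = set()
    selected = []
    changed = True
    while changed:
        changed = False
        for name, (defs, refs) in members.items():
            if name in selected or not (defs & undefined):
                continue
            selected.append(name)
            defined |= defs
            undefined -= defs
            undefined |= refs - defined
            changed = True
    return selected


if __name__ == "__main__":
    # A program that only uses parallel_entry never pulls in reduction.o.
    print(link_archive({"parallel_entry"}, MEMBERS))
    print(merged_blob(MEMBERS, exports={"parallel_entry", "reduce_entry"}))
```

Under archive semantics, a program referencing only `parallel_entry` links just `parallel.o` and `helper.o`. The merged model always carries every definition, but presents the whole runtime to the optimizer as one module with internal linkage, which is presumably where any performance difference would come from.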

Performance testing will be required. If we really need the merged blob
then we can simply pack that into a new static library.

From 1a7559c6ac7cc4a6e02cb7e635eff6fcdbc06093 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Thu, 6 Feb 2025 15:54:19 -0600
Subject: [PATCH] [OpenMP] Remove 'libomptarget.devicertl.a' fatbinary and use
 static library

 clang/lib/Driver/ToolChains/Clang.cpp      |   4 +
 clang/lib/Driver/ToolChains/CommonArgs.cpp |   3 -
 offload/DeviceRTL/CMakeLists.txt           | 133 +++++----------------
 offload/test/lit.cfg                       |   8 +-
 4 files changed, 38 insertions(+), 110 deletions(-)

diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index c0891d46b0a62cd..fd690ab11c1c2c3 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -9209,6 +9209,10 @@ void LinkerWrapper::ConstructJob(Compilation &C, const JobAction &JA,
           A->render(Args, LinkerArgs);
+      // If this is OpenMP the device linker will need `-lomp`.
+      if (Kind == Action::OFK_OpenMP && !Args.hasArg(OPT_nogpulib))
+        LinkerArgs.emplace_back("-lomp");
       // Forward all of these to the appropriate toolchain.
       for (StringRef Arg : CompilerArgs)
diff --git a/clang/lib/Driver/ToolChains/CommonArgs.cpp b/clang/lib/Driver/ToolChains/CommonArgs.cpp
index 699aadec86dcba9..93031d2f5302386 100644
--- a/clang/lib/Driver/ToolChains/CommonArgs.cpp
+++ b/clang/lib/Driver/ToolChains/CommonArgs.cpp
@@ -1289,9 +1289,6 @@ bool tools::addOpenMPRuntime(const Compilation &C, ArgStringList &CmdArgs,
   if (IsOffloadingHost)
-  if (IsOffloadingHost && !Args.hasArg(options::OPT_nogpulib))
-    CmdArgs.push_back("-lomptarget.devicertl");
   addArchSpecificRPath(TC, Args, CmdArgs);
   addOpenMPRuntimeLibraryPath(TC, Args, CmdArgs);
diff --git a/offload/DeviceRTL/CMakeLists.txt b/offload/DeviceRTL/CMakeLists.txt
index 8f2a1fd01fabcc8..b3dd4a1997d80d0 100644
--- a/offload/DeviceRTL/CMakeLists.txt
+++ b/offload/DeviceRTL/CMakeLists.txt
@@ -107,15 +107,15 @@ set(bc_flags -c -flto -std=c++17 -fvisibility=hidden
 # first create an object target
-add_library(omptarget.devicertl.all_objs OBJECT IMPORTED)
 function(compileDeviceRTLLibrary target_name target_triple)
   set(target_bc_flags ${ARGN})
   set(bc_files "")
+  add_library(omp.${target_name}.all_objs OBJECT IMPORTED)
   foreach(src ${src_files})
     get_filename_component(infile ${src} ABSOLUTE)
     get_filename_component(outfile ${src} NAME)
-    set(outfile "${outfile}-${target_name}.bc")
+    set(outfile "${outfile}-${target_name}.o")
     set(depfile "${outfile}.d")
     # Passing an empty CPU to -march= suppressed target specific metadata.
@@ -142,99 +142,36 @@ function(compileDeviceRTLLibrary target_name target_triple)
-    list(APPEND bc_files ${outfile})
+    list(APPEND obj_files ${CMAKE_CURRENT_BINARY_DIR}/${outfile})
-  set(bclib_name "libomptarget-${target_name}.bc")
-  # Link to a bitcode library.
-  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name}
-        -o ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name} ${bc_files}
-      DEPENDS ${bc_files}
-      COMMENT "Linking LLVM bitcode ${bclib_name}"
-  )
-  if(TARGET llvm-link)
-    add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name}
-      DEPENDS llvm-link
-      APPEND)
-  endif()
-  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/internalized_${bclib_name}
-      COMMAND ${OPT_TOOL} ${link_export_flag} ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name}
-                      -o ${CMAKE_CURRENT_BINARY_DIR}/internalized_${bclib_name}
-      DEPENDS ${source_directory}/exports ${CMAKE_CURRENT_BINARY_DIR}/linked_${bclib_name}
-      COMMENT "Internalizing LLVM bitcode ${bclib_name}"
-  )
-  if(TARGET opt)
-    add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/internalized_${bclib_name}
-      DEPENDS opt
-      APPEND)
-  endif()
-  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
-      COMMAND ${OPT_TOOL} ${link_opt_flags} ${CMAKE_CURRENT_BINARY_DIR}/internalized_${bclib_name}
-                      -o ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
-      DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/internalized_${bclib_name}
-      COMMENT "Optimizing LLVM bitcode ${bclib_name}"
-  )
-  if(TARGET opt)
-    add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
-      DEPENDS opt
-      APPEND)
-  endif()
-  set(bclib_target_name "omptarget-${target_name}-bc")
-  add_custom_target(${bclib_target_name} DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name})
-  # Copy library to destination.
-  add_custom_command(TARGET ${bclib_target_name} POST_BUILD
-                    COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
-                    ${LIBOMPTARGET_LIBRARY_DIR})
-  add_dependencies(omptarget.devicertl.${target_name} ${bclib_target_name})
-  # Install bitcode library under the lib destination folder.
-  set(target_feature "")
-  if("${target_triple}" STREQUAL "nvptx64-nvidia-cuda")
-    set(target_feature "feature=+ptx63")
-  endif()
-  # Package the bitcode in the bitcode and embed it in an ELF for the static library
-  add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name}
-      COMMAND ${PACKAGER_TOOL} -o ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name}
-        "--image=file=${CMAKE_CURRENT_BINARY_DIR}/${bclib_name},${target_feature},triple=${target_triple},arch=generic,kind=openmp"
-      DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/${bclib_name}
-      COMMENT "Packaging LLVM offloading binary ${bclib_name}.out"
+  set_property(TARGET omp.${target_name}.all_objs
+               APPEND PROPERTY IMPORTED_OBJECTS ${obj_files})
+  # Archive all the object files generated above into a static library
+  add_library(omp.${target_name} STATIC)
+  set_target_properties(omp.${target_name} PROPERTIES
-  if(TARGET clang-offload-packager)
-    add_custom_command(OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name}
-      DEPENDS clang-offload-packager
-      APPEND)
-  endif()
-  set(output_name "${CMAKE_CURRENT_BINARY_DIR}/devicertl-${target_name}.o")
-  add_custom_command(OUTPUT ${output_name}
-    COMMAND ${CLANG_TOOL} --std=c++17 -c -nostdlib
-            -Xclang -fembed-offload-object=${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name}
-            -o ${output_name}
-            ${source_directory}/Stub.cpp
-    DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/packaged_${bclib_name} ${source_directory}/Stub.cpp
-    COMMENT "Embedding LLVM offloading binary in devicertl-${target_name}.o"
-  )
-  if(TARGET clang)
-    add_custom_command(OUTPUT ${output_name}
-      DEPENDS clang
-      APPEND)
-  endif()
-  set_property(TARGET omptarget.devicertl.all_objs APPEND PROPERTY IMPORTED_OBJECTS ${output_name})
+  target_link_libraries(omp.${target_name} PRIVATE omp.${target_name}.all_objs)
+  install(TARGETS omp.${target_name}
+          ARCHIVE DESTINATION "lib${LLVM_LIBDIR_SUFFIX}/${target_triple}")
+  # Trick to combine these into a bitcode file via the linker's LTO pass. This
+  # is used to provide the legacy `libomptarget-<name>.bc` files.
+  add_executable(libomptarget-${target_name} ${obj_files})
+  set_target_properties(libomptarget-${target_name} PROPERTIES
+    RUNTIME_OUTPUT_NAME libomptarget-${target_name}.bc)
+  target_compile_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}")
+  target_link_options(libomptarget-${target_name} PRIVATE "--target=${target_triple}"
+                      "-r" "-nostdlib" "-flto" "-Wl,--lto-emit-llvm")
+  install(TARGETS libomptarget-${target_name}
     set(ide_target_name omptarget-ide-${target_name})
@@ -259,13 +196,3 @@ compileDeviceRTLLibrary(amdgpu amdgcn-amd-amdhsa -Xclang -mcode-object-version=n
 compileDeviceRTLLibrary(nvptx nvptx64-nvidia-cuda --cuda-feature=+ptx63)
-# Archive all the object files generated above into a static library
-add_library(omptarget.devicertl STATIC)
-set_target_properties(omptarget.devicertl PROPERTIES
-target_link_libraries(omptarget.devicertl PRIVATE omptarget.devicertl.all_objs)
diff --git a/offload/test/lit.cfg b/offload/test/lit.cfg
index 658ae5f9653ba90..565edc3e7faeb9d 100644
--- a/offload/test/lit.cfg
+++ b/offload/test/lit.cfg
@@ -183,11 +183,11 @@ def remove_suffix_if_present(name):
 def add_libraries(source):
     if config.libomptarget_has_libc:
-        return source + " -Xoffload-linker " + "-lc " + \
-               "-Xoffload-linker " + "-lm " + \
-               config.llvm_library_intdir + "/libomptarget.devicertl.a"
+        return source + " -Xoffload-linker -lc " + \
+               "-Xoffload-linker -lm " + \
+               "-Xoffload-linker -lomp "
-        return source + " " + config.llvm_library_intdir + "/libomptarget.devicertl.a"
+        return source + " " + "-Xoffload-linker -lomp"
 # Add platform targets
 host_targets = [
