[flang-commits] [flang] [llvm] [flang-rt] Remove experiemental OpenMP offloading support (PR #183653)

Fri Feb 27 10:00:02 PST 2026

https://github.com/jhuber6 updated https://github.com/llvm/llvm-project/pull/183653

>From ddc02252e9f632332fec99a63b2a88a6d98e206d Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Thu, 26 Feb 2026 18:42:28 -0600
Subject: [PATCH 1/2] [flang-rt] Remove experiemental OpenMP offloading support

Summary:
This, as far as I am aware, has mostly been superceded by the runtimes
build that's built on top of libc. This build links 30% faster, supports
more functionality, and uses 95% less disk space, so it seems to be the
direction we want to go.

CUDA support remains, this is not needed urgently.
---
 flang-rt/CMakeLists.txt                       | 27 +-----------
 flang-rt/README.md                            | 19 +++------
 .../cmake/modules/AddFlangRTOffload.cmake     | 42 -------------------
 .../include/flang-rt/runtime/environment.h    |  6 +--
 flang-rt/lib/runtime/CMakeLists.txt           |  2 -
 flang-rt/lib/runtime/external-unit.cpp        |  3 --
 flang-rt/lib/runtime/pseudo-unit.cpp          |  3 --
 flang-rt/lib/runtime/work-queue.cpp           |  9 ++--
 flang-rt/unittests/CMakeLists.txt             | 12 ------
 9 files changed, 13 insertions(+), 110 deletions(-)

diff --git a/flang-rt/CMakeLists.txt b/flang-rt/CMakeLists.txt
index 899d03c671869..f319a1367db3c 100644
--- a/flang-rt/CMakeLists.txt
+++ b/flang-rt/CMakeLists.txt
@@ -179,11 +179,10 @@ if (NOT FLANG_RT_ENABLE_STATIC AND NOT FLANG_RT_ENABLE_SHARED)
 endif ()
 
 
-set(FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT "" CACHE STRING "Compile Flang-RT with GPU support (CUDA or OpenMP)")
+set(FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT "" CACHE STRING "Compile Flang-RT with GPU support (CUDA)")
 set_property(CACHE FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT PROPERTY STRINGS
     ""
     CUDA
-    OpenMP
   )
 if (NOT FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT)
   # Support for GPUs disabled
@@ -191,30 +190,8 @@ elseif (FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT STREQUAL "CUDA")
   # Support for CUDA
   set(FLANG_RT_LIBCUDACXX_PATH "" CACHE PATH "Path to libcu++ package installation")
   option(FLANG_RT_CUDA_RUNTIME_PTX_WITHOUT_GLOBAL_VARS "Do not compile global variables' definitions when producing PTX library" OFF)
-elseif (FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT STREQUAL "OpenMP")
-  # Support for OpenMP offloading
-  set(FLANG_RT_DEVICE_ARCHITECTURES "all" CACHE STRING
-      "List of OpenMP device architectures to be used to compile the Fortran runtime (e.g. 'gfx1103;sm_90')"
-    )
-
-  if (FLANG_RT_DEVICE_ARCHITECTURES STREQUAL "all")
-    # TODO: support auto detection on the build system.
-    set(all_amdgpu_architectures
-      "gfx700;gfx701;gfx801;gfx803;gfx900;gfx902;gfx906"
-      "gfx908;gfx90a;gfx90c;gfx940;gfx1010;gfx1030"
-      "gfx1031;gfx1032;gfx1033;gfx1034;gfx1035;gfx1036"
-      "gfx1100;gfx1101;gfx1102;gfx1103;gfx1150;gfx1151"
-      "gfx1152;gfx1153;gfx1170")
-    set(all_nvptx_architectures
-      "sm_35;sm_37;sm_50;sm_52;sm_53;sm_60;sm_61;sm_62"
-      "sm_70;sm_72;sm_75;sm_80;sm_86;sm_89;sm_90")
-    set(all_gpu_architectures
-      "${all_amdgpu_architectures};${all_nvptx_architectures}")
-      set(FLANG_RT_DEVICE_ARCHITECTURES ${all_gpu_architectures})
-  endif()
-  list(REMOVE_DUPLICATES FLANG_RT_DEVICE_ARCHITECTURES)
 else ()
-  message(FATAL_ERROR "Invalid value '${FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT}' for FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT; must be empty, 'CUDA', or 'OpenMP'")
+  message(FATAL_ERROR "Invalid value '${FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT}' for FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT; must be empty or 'CUDA'")
 endif ()
 
 
diff --git a/flang-rt/README.md b/flang-rt/README.md
index eecb7b8cbfdfd..a7dde887b31ef 100644
--- a/flang-rt/README.md
+++ b/flang-rt/README.md
@@ -146,16 +146,12 @@ CMake itself provide.
    the compiler for `__float128` or 128-bit `long double` support.
    [More details](docs/Real16MathSupport.md).
 
- * `FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT` (values: `"CUDA"`,`"OpenMP"`, `""` default: `""`)
+ * `FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT` (values: `"CUDA"`, `""` default: `""`)
 
    When set to `CUDA`, builds Flang-RT with experimental support for GPU
    accelerators using CUDA. `CMAKE_CUDA_COMPILER` must be set if not
    automatically detected by CMake. `nvcc` as well as `clang` are supported.
 
-   When set to `OpenMP`, builds Flang-RT with experimental support for
-   GPU accelerators using OpenMP offloading. Only Clang is supported for
-   `CMAKE_C_COMPILER` and `CMAKE_CXX_COMPILER`.
-
  * `FLANG_RT_INCLUDE_CUF` (bool, default: `OFF`)
 
    Compiles the `libflang_rt.cuda_<CUDA-version>.a/.so` library. This is
@@ -181,13 +177,10 @@ additional configuration options become available.
    default.
 
 
-### Experimental OpenMP Offload Support
-
-With `-DFLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT=OpenMP`, the following
-additional configuration options become available.
 
- * `FLANG_RT_DEVICE_ARCHITECTURES` (default: `"all"`)
+### GPU Offloading Support
 
-   A list of device architectures that Flang-RT is going to support.
-   If `"all"` uses a pre-defined list of architectures. Same purpose as
-   `LIBOMPTARGET_DEVICE_ARCHITECTURES` from liboffload.
+Flang-RT can be built for GPU targets (AMDGPU, NVPTX) using the LLVM
+runtimes build infrastructure. The easiest way to configure a build for
+GPU offloading is via the CMake cache file at
+`offload/cmake/caches/FlangOffload.cmake`.
diff --git a/flang-rt/cmake/modules/AddFlangRTOffload.cmake b/flang-rt/cmake/modules/AddFlangRTOffload.cmake
index cbc69f3a9656a..c06ed1c68bcf0 100644
--- a/flang-rt/cmake/modules/AddFlangRTOffload.cmake
+++ b/flang-rt/cmake/modules/AddFlangRTOffload.cmake
@@ -71,45 +71,3 @@ macro(enable_cuda_compilation name files)
   endif()
 endmacro()
 
-macro(enable_omp_offload_compilation name files)
-  if (FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT STREQUAL "OpenMP")
-    # OpenMP offload build only works with Clang compiler currently.
-
-    if (FLANG_RT_ENABLE_SHARED)
-      message(FATAL_ERROR
-        "FLANG_RT_ENABLE_SHARED is not supported for OpenMP offload build of Flang-RT"
-        )
-    endif()
-
-    if ("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang" AND
-        "${CMAKE_C_COMPILER_ID}" MATCHES "Clang")
-
-      string(REPLACE ";" "," compile_for_architectures
-        "${FLANG_RT_DEVICE_ARCHITECTURES}"
-        )
-
-      set(OMP_COMPILE_OPTIONS
-        -fopenmp
-        -fvisibility=hidden
-        -fopenmp-cuda-mode
-        --offload-arch=${compile_for_architectures}
-        # Force LTO for the device part.
-        -foffload-lto
-        )
-      set_source_files_properties(${files} PROPERTIES COMPILE_OPTIONS
-        "${OMP_COMPILE_OPTIONS}"
-        )
-      target_link_options(${name}.static PUBLIC ${OMP_COMPILE_OPTIONS})
-
-      # Enable "declare target" in the source code.
-      set_source_files_properties(${files}
-        PROPERTIES COMPILE_DEFINITIONS OMP_OFFLOAD_BUILD
-        )
-    else()
-      message(FATAL_ERROR
-        "Flang-rt build with OpenMP offload is not supported for these compilers:\n"
-        "CMAKE_CXX_COMPILER_ID: ${CMAKE_CXX_COMPILER_ID}\n"
-        "CMAKE_C_COMPILER_ID: ${CMAKE_C_COMPILER_ID}")
-    endif()
-  endif()
-endmacro()
diff --git a/flang-rt/include/flang-rt/runtime/environment.h b/flang-rt/include/flang-rt/runtime/environment.h
index 351aef9f23f09..71814aa1d5c1f 100644
--- a/flang-rt/include/flang-rt/runtime/environment.h
+++ b/flang-rt/include/flang-rt/runtime/environment.h
@@ -40,11 +40,7 @@ struct ExecutionEnvironment {
   typedef void (*ConfigEnvCallbackPtr)(
       int, const char *[], const char *[], const EnvironmentDefaultList *);
 
-#if !defined(_OPENMP)
-  // FIXME: https://github.com/llvm/llvm-project/issues/84942
-  constexpr
-#endif
-      ExecutionEnvironment(){};
+  constexpr ExecutionEnvironment() {};
   void Configure(int argc, const char *argv[], const char *envp[],
       const EnvironmentDefaultList *envDefaults);
 
diff --git a/flang-rt/lib/runtime/CMakeLists.txt b/flang-rt/lib/runtime/CMakeLists.txt
index 9fa8376e9b99c..9b601b1c72e94 100644
--- a/flang-rt/lib/runtime/CMakeLists.txt
+++ b/flang-rt/lib/runtime/CMakeLists.txt
@@ -197,7 +197,6 @@ if (NOT WIN32)
   )
 
   enable_cuda_compilation(flang_rt.runtime "${supported_sources}")
-  enable_omp_offload_compilation(flang_rt.runtime "${supported_sources}")
 
   # Select a default runtime, which is used for unit and regression tests.
   get_target_property(default_target flang_rt.runtime.default ALIASED_TARGET)
@@ -231,7 +230,6 @@ else()
       )
 
     enable_cuda_compilation(${name} "${supported_sources}")
-    enable_omp_offload_compilation(${name} "${supported_sources}")
     add_dependencies(flang_rt.runtime ${name})
   endfunction ()
 
diff --git a/flang-rt/lib/runtime/external-unit.cpp b/flang-rt/lib/runtime/external-unit.cpp
index 0c08691673823..25e6981334f75 100644
--- a/flang-rt/lib/runtime/external-unit.cpp
+++ b/flang-rt/lib/runtime/external-unit.cpp
@@ -16,9 +16,6 @@
 #include "flang-rt/runtime/lock.h"
 #include "flang-rt/runtime/tools.h"
 
-// NOTE: the header files above may define OpenMP declare target
-// variables, so they have to be included unconditionally
-// so that the offload entries are consistent between host and device.
 #if !defined(RT_USE_PSEUDO_FILE_UNIT)
 
 #include <cstdio>
diff --git a/flang-rt/lib/runtime/pseudo-unit.cpp b/flang-rt/lib/runtime/pseudo-unit.cpp
index 4242f685134ed..7d6ddd9d8e2a4 100644
--- a/flang-rt/lib/runtime/pseudo-unit.cpp
+++ b/flang-rt/lib/runtime/pseudo-unit.cpp
@@ -15,9 +15,6 @@
 #include "flang-rt/runtime/io-error.h"
 #include "flang-rt/runtime/tools.h"
 
-// NOTE: the header files above may define OpenMP declare target
-// variables, so they have to be included unconditionally
-// so that the offload entries are consistent between host and device.
 #if defined(RT_USE_PSEUDO_FILE_UNIT)
 #include <cstdio>
 
diff --git a/flang-rt/lib/runtime/work-queue.cpp b/flang-rt/lib/runtime/work-queue.cpp
index 9ae751ae3367a..f54ffccd29ef3 100644
--- a/flang-rt/lib/runtime/work-queue.cpp
+++ b/flang-rt/lib/runtime/work-queue.cpp
@@ -14,8 +14,7 @@
 
 namespace Fortran::runtime {
 
-#if !defined(RT_DEVICE_COMPILATION) && !defined(OMP_OFFLOAD_BUILD)
-// FLANG_RT_DEBUG code is disabled when false.
+#if !defined(RT_DEVICE_COMPILATION)
 static constexpr bool enableDebugOutput{false};
 #endif
 
@@ -79,7 +78,7 @@ RT_API_ATTRS Ticket &WorkQueue::StartTicket() {
     last_ = newTicket;
   }
   newTicket->ticket.begun = false;
-#if !defined(RT_DEVICE_COMPILATION) && !defined(OMP_OFFLOAD_BUILD)
+#if !defined(RT_DEVICE_COMPILATION)
   if (enableDebugOutput &&
       (executionEnvironment.internalDebugging &
           ExecutionEnvironment::WorkQueue)) {
@@ -93,7 +92,7 @@ RT_API_ATTRS int WorkQueue::Run() {
   while (last_) {
     TicketList *at{last_};
     insertAfter_ = last_;
-#if !defined(RT_DEVICE_COMPILATION) && !defined(OMP_OFFLOAD_BUILD)
+#if !defined(RT_DEVICE_COMPILATION)
     if (enableDebugOutput &&
         (executionEnvironment.internalDebugging &
             ExecutionEnvironment::WorkQueue)) {
@@ -102,7 +101,7 @@ RT_API_ATTRS int WorkQueue::Run() {
     }
 #endif
     int stat{at->ticket.Continue(*this)};
-#if !defined(RT_DEVICE_COMPILATION) && !defined(OMP_OFFLOAD_BUILD)
+#if !defined(RT_DEVICE_COMPILATION)
     if (enableDebugOutput &&
         (executionEnvironment.internalDebugging &
             ExecutionEnvironment::WorkQueue)) {
diff --git a/flang-rt/unittests/CMakeLists.txt b/flang-rt/unittests/CMakeLists.txt
index e1ab73d7d9301..374b75b11c709 100644
--- a/flang-rt/unittests/CMakeLists.txt
+++ b/flang-rt/unittests/CMakeLists.txt
@@ -42,18 +42,6 @@ function(add_flangrt_unittest_offload_properties target)
       PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON
       )
   endif()
-  # Enable OpenMP offload during linking. We may need to replace
-  # LINK_OPTIONS with COMPILE_OPTIONS when there are OpenMP offload
-  # unittests.
-  #
-  # FIXME: replace 'native' in --offload-arch option with the list
-  #        of targets that Fortran Runtime was built for.
-  if (FLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT STREQUAL "OpenMP")
-    set_target_properties(${target}
-      PROPERTIES LINK_OPTIONS
-      "-fopenmp;--offload-arch=native"
-      )
-  endif()
 endfunction()
 
 # flang-rt on Windows requires compiler-rt for some symbols. For binaries built

>From 2dfcd83c34679d45c126418e86948646fadae5a4 Mon Sep 17 00:00:00 2001
From: Joseph Huber <huberjn at outlook.com>
Date: Fri, 27 Feb 2026 11:59:51 -0600
Subject: [PATCH 2/2] docs

---
 flang/docs/GettingStarted.md | 42 +++++++++++-------------------------
 1 file changed, 12 insertions(+), 30 deletions(-)

diff --git a/flang/docs/GettingStarted.md b/flang/docs/GettingStarted.md
index 1079f82cd69ad..62920bbb73ce4 100644
--- a/flang/docs/GettingStarted.md
+++ b/flang/docs/GettingStarted.md
@@ -204,9 +204,18 @@ ninja install
 
 
 ### Building Flang-RT for accelerators
-Flang runtime can be built for accelerators in experimental mode, i.e.
-complete enabling is WIP.  CUDA and OpenMP target offload builds
-are currently supported.
+Flang runtime can be built for GPU targets (AMDGPU, NVPTX) using the LLVM
+runtimes build infrastructure. The recommended way to configure a build for GPU
+offloading is via the CMake cache file provided by `offload`.
+
+```bash
+cmake ../llvm -G Ninja                              \
+    -C ../offload/cmake/caches/FlangOffload.cmake   \
+    -DCMAKE_BUILD_TYPE=Release                      \
+    -DCMAKE_INSTALL_PREFIX=<PATH>
+```
+
+An experimental CUDA build of the runtime is also available.
 
 #### Building out-of-tree
 
@@ -299,33 +308,6 @@ number sufficiently low for all build jobs to fit into the available RAM. Using
 the number of harware threads (`nprocs`) is likely too much for most
 commodity machines.
 
-##### OpenMP target offload build
-Only Clang compiler is currently supported.
-
-```bash
-cd llvm-project
-rm -rf build_flang_runtime
-mkdir build_flang_runtime
-cd build_flang_runtime
-
-cmake \
-  -DLLVM_ENABLE_RUNTIMES=flang-rt \
-  -DFLANG_RT_EXPERIMENTAL_OFFLOAD_SUPPORT="OpenMP" \
-  -DCMAKE_C_COMPILER=clang \
-  -DCMAKE_CXX_COMPILER=clang++ \
-  -DFLANG_RT_DEVICE_ARCHITECTURES=all \
-  ../runtimes/
-
-make flang-rt
-```
-
-The result of the build is a "device-only" library, i.e. the host
-part of the library is just a container for the device code.
-The resulting library may be linked to user programs using
-Clang-like device linking pipeline.
-
-The same set of CMake variables works for Flang in-tree build.
-
 ### Build options
 
 One may provide optional CMake variables to customize the build. Available options: