[Openmp-commits] [openmp] [OpenMP] Update out of date documentation (PR #142411)

Tue Jun 3 10:51:37 PDT 2025

================
@@ -92,104 +92,38 @@ For AMDGPU offload, please see :ref:`build_amdgpu_offload_capable_compiler`.
 
 Q: How to build an OpenMP Nvidia offload capable compiler?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-The Cuda SDK is required on the machine that will execute the openmp application.
-
-If your build machine is not the target machine or automatic detection of the
-available GPUs failed, you should also set:
-
-- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_<xy>;...'`` where ``<xy>`` is the numeric
-  compute capability of your GPU. For instance, set 
-  ``LIBOMPTARGET_DEVICE_ARCHITECTURES='sm_70;sm_80'`` to target the Nvidia Volta
-  and Ampere architectures. 
-
+The CUDA SDK is required on the machine that will build and execute the
+offloading application. The CUDA driver API is normally needed only at
+runtime, when it is opened dynamically. If dynamic loading is disabled with
+``LIBOMPTARGET_DLOPEN_PLUGINS``, the driver is linked directly at LLVM build time.
 
 .. _build_amdgpu_offload_capable_compiler:
 
 Q: How to build an OpenMP AMDGPU offload capable compiler?
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-A subset of the `ROCm <https://github.com/radeonopencompute>`_ toolchain is
-required to build the LLVM toolchain and to execute the openmp application.
-Either install ROCm somewhere that cmake's find_package can locate it, or
-build the required subcomponents ROCt and ROCr from source.
-
-The two components used are ROCT-Thunk-Interface, roct, and ROCR-Runtime, rocr.
-Roct is the userspace part of the linux driver. It calls into the driver which
-ships with the linux kernel. It is an implementation detail of Rocr from
-OpenMP's perspective. Rocr is an implementation of `HSA
-<http://www.hsafoundation.com>`_.
-
-.. code-block:: text
-
-  SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
-  BUILD_DIR=somewhere
-  INSTALL_PREFIX=same-as-llvm-install
-
-  cd $SOURCE_DIR
-  git clone git@github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.2.x \
-    --single-branch
-  git clone git@github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.2.x \
-    --single-branch
-
-  cd $BUILD_DIR && mkdir roct && cd roct
-  cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX \
-    -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
-  make && make install
-
-  cd $BUILD_DIR && mkdir rocr && cd rocr
-  cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF \
-    -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release \
-    -DBUILD_SHARED_LIBS=ON
-  make && make install
-
-``IMAGE_SUPPORT`` requires building rocr with clang and is not used by openmp.
-
-Provided cmake's find_package can find the ROCR-Runtime package, LLVM will
-build a tool ``bin/amdgpu-arch`` which will print a string like ``gfx906`` when
-run if it recognises a GPU on the local system. LLVM will also build a shared
-library, libomptarget.rtl.amdgpu.so, which is linked against rocr.
-
-With those libraries installed, then LLVM build and installed, try:
-
-.. code-block:: shell
-
-    clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example
 
-If your build machine is not the target machine or automatic detection of the
-available GPUs failed, you should also set:
-
-- ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx<xyz>;...'`` where ``<xyz>`` is the
-  shader core instruction set architecture. For instance, set 
-  ``LIBOMPTARGET_DEVICE_ARCHITECTURES='gfx906;gfx90a'`` to target AMD GCN5
-  and CDNA2 devices. 
+The ROCm toolchain is required to build and execute the offloading application.
+Specifically, the plugin relies on ROCR, the HSA runtime, which is normally
+opened dynamically at runtime unless this is disabled with
+``LIBOMPTARGET_DLOPEN_PLUGINS``. Users may also build ROCR manually if preferred.
----------------
jdoerfert wrote:

We might want to have a single section for `LIBOMPTARGET_DLOPEN_PLUGINS` and then refer to it from the NVIDIA and AMD sections.

https://github.com/llvm/llvm-project/pull/142411