[Openmp-commits] [openmp] 25fe17d - [libomptarget] Initial documentation on amdgpu offload
Jon Chesterfield via Openmp-commits
openmp-commits at lists.llvm.org
Wed May 5 11:59:01 PDT 2021
Author: Jon Chesterfield
Date: 2021-05-05T19:58:52+01:00
New Revision: 25fe17d3c1041de7e2dc5df865d7f65fd074f9a6
URL: https://github.com/llvm/llvm-project/commit/25fe17d3c1041de7e2dc5df865d7f65fd074f9a6
DIFF: https://github.com/llvm/llvm-project/commit/25fe17d3c1041de7e2dc5df865d7f65fd074f9a6.diff
LOG: [libomptarget] Initial documentation on amdgpu offload
[libomptarget] Initial documentation on amdgpu offload
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D101927
Added:
Modified:
openmp/docs/SupportAndFAQ.rst
Removed:
################################################################################
diff --git a/openmp/docs/SupportAndFAQ.rst b/openmp/docs/SupportAndFAQ.rst
index 5e938ecaac762..a469b5e894cc9 100644
--- a/openmp/docs/SupportAndFAQ.rst
+++ b/openmp/docs/SupportAndFAQ.rst
@@ -49,20 +49,16 @@ All patches go through the regular `LLVM review process
.. _build_offload_capable_compiler:
-Q: How to build an OpenMP offload capable compiler?
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
+Q: How to build an OpenMP GPU offload capable compiler?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To build an *effective* OpenMP offload capable compiler, only one extra CMake
option, `LLVM_ENABLE_RUNTIMES="openmp"`, is needed when building LLVM (Generic
information about building LLVM is available `here <https://llvm.org/docs/GettingStarted.html>`__.).
Make sure all backends that are targeted by OpenMP to be enabled. By default,
Clang will be built with all backends enabled.
-If your build machine is not the target machine or automatic detection of the
-available GPUs failed, you should also set:
-
-- `CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX` where `XX` is the architecture of your GPU, e.g, 80.
-- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
+For Nvidia offload, please see :ref:`_build_nvidia_offload_capable_compiler`.
+For AMDGPU offload, please see :ref:`_build_amdgpu_offload_capable_compiler`.
.. note::
The compiler that generates the offload code should be the same (version) as
@@ -71,7 +67,75 @@ available GPUs failed, you should also set:
.. _advanced_builds: https://llvm.org//docs/AdvancedBuilds.html
+.. _build_nvidia_offload_capable_compiler:
+
+Q: How to build an OpenMP NVidia offload capable compiler?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The Cuda SDK is required on the machine that will execute the openmp application.
+
+If your build machine is not the target machine or automatic detection of the
+available GPUs failed, you should also set:
+
+- `CLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_XX` where `XX` is the architecture of your GPU, e.g, 80.
+- `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=YY` where `YY` is the numeric compute capacity of your GPU, e.g., 75.
+
+
+.. _build_amdgpu_offload_capable_compiler:
+
+Q: How to build an OpenMP AMDGPU offload capable compiler?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+A subset of the `ROCm <https://github.com/radeonopencompute>` toolchain is
+required to build the LLVM toolchain and to execute the openmp application.
+Either install ROCm somewhere that cmake's find_package can locate it, or
+build the required subcomponents ROCt and ROCr from source.
+
+The two components used are ROCT-Thunk-Interface, roct, and ROCR-Runtime,
+rocr. Roct is the userspace part of the linux driver. It calls into the
+driver which ships with the linux kernel. It is an implementation detail of
+Rocr from OpenMP's perspective. Rocr is an implementation of `HSA <http://www.hsafoundation.com>`.
+
+ SOURCE_DIR=same-as-llvm-source # e.g. the checkout of llvm-project, next to openmp
+ BUILD_DIR=somewhere
+ INSTALL_PREFIX=same-as-llvm-install
+
+ cd $SOURCE_DIR
+ git clone git at github.com:RadeonOpenCompute/ROCT-Thunk-Interface.git -b roc-4.1.x --single-branch
+ git clone git at github.com:RadeonOpenCompute/ROCR-Runtime.git -b rocm-4.1.x --single-branch
+
+ cd $BUILD_DIR && mkdir roct && cd roct
+ cmake $SOURCE_DIR/ROCT-Thunk-Interface/ -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF
+ make && make install
+
+ cd $BUILD_DIR && mkdir rocr && cd rocr
+ cmake $SOURCE_DIR/ROCR-Runtime/src -DIMAGE_SUPPORT=OFF -DCMAKE_INSTALL_PREFIX=$INSTALL_PREFIX -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=ON
+ make && make install
+
+IMAGE_SUPPORT requires building rocr with clang and is not used by openmp.
+
+Provided cmake's find_package can find the ROCR-Runtime package, LLVM will
+build a tool `bin/amdgpu-arch` which will print a string like 'gfx906' when
+run if it recognises a GPU on the local system. LLVM will also build a shared
+library, libomptarget.rtl.amdgpu.so, which is linked against rocr.
+
+With those libraries installed, then LLVM build and installed, try:
+
+ clang -O2 -fopenmp -fopenmp-targets=amdgcn-amd-amdhsa example.c -o example && ./example
+
+Q: What are the known limitations of OpenMP AMDGPU offload?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+LD_LIBRARY_PATH is presently required to find the openmp libraries.
+
+There is no libc. That is, malloc and printf do not exist. Also no libm, so
+functions like cos(double) will not work from target regions.
+
+Cards from the gfx10 line, 'navi', that use wave32 are not yet implemented.
+
+Some versions of the driver for the radeon vii (gfx906) will error unless the
+environment variable 'export HSA_IGNORE_SRAMECC_MISREPORT=1' is set.
+It is a recent addition to LLVM and the implementation
diff ers from that which
+has been shipping in ROCm and AOMP for some time. Early adopters will encounter
+bugs.
Q: Does OpenMP offloading support work in pre-packaged LLVM releases?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
More information about the Openmp-commits
mailing list