[libc-commits] [libc] [libc] Update GPU documentation pages (PR #84076)

Tue Mar 5 16:11:02 PST 2024

================
@@ -78,11 +121,120 @@ inspect the resulting code. This can be done using the following utilities:
 
 .. code-block:: sh
 
-   $> llvm-ar x libcgpu.a strcmp.cpp.o
-   $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
-   $> opt -S out.bc
-   ...
+  $> llvm-ar x libcgpu.a strcmp.cpp.o
+  $> clang-offload-packager strcmp.cpp.o --image=arch=generic,file=gfx90a.bc
+  $> opt -S gfx90a.bc
+  ...
 
 Please note that this fat binary format is provided for compatibility with
 existing offloading toolchains. The implementation in ``libc`` does not depend
 on any existing offloading languages and is completely freestanding.
+
+Direct compilation
+------------------
+
+Instead of using standard offloading languages, we can also target the GPU
+directly using C and C++ to create a GPU executable, similar to OpenCL. This is
+done by compiling for the GPU architecture with `clang's cross compilation
+support <https://clang.llvm.org/docs/CrossCompilation.html>`_. This is the method
+that the GPU C library uses both to provide its definitions and to run tests.
+
+This allows us to easily define GPU specific libraries and programs that fit
+well into existing tools. In order to target the GPU effectively we rely heavily
+on the compiler's intrinsic and built-in functions. For example, the following
+function gets the thread identifier in the 'x' dimension on both AMDGPU and NVPTX.
+
+.. code-block:: c++
+
+  #include <stdint.h>
+
+  uint32_t get_thread_id_x() {
+  #if defined(__AMDGPU__)
+    return __builtin_amdgcn_workitem_id_x();
+  #elif defined(__NVPTX__)
+    return __nvvm_read_ptx_sreg_tid_x();
+  #else
+  #error "Unsupported platform"
+  #endif
+  }
+
+We can then compile this for both NVPTX and AMDGPU into LLVM-IR using the
+following commands.
+
+.. code-block:: sh
+
+  $> clang id.c --target=amdgcn-amd-amdhsa -mcpu=native -nogpulib -flto -c
+  $> clang id.c --target=nvptx64-nvidia-cuda -march=native -nogpulib -flto -c
+
+This support allows us to treat the GPU as a hosted environment by providing a C
+library and startup object just like a standard C library running on the host
+machine. Then, in order to execute these images we provide a loader utility to
+launch the executable on the GPU similar to a cross-compiling emulator.
+
+Building for AMDGPU targets
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The AMDGPU target supports several features natively by virtue of using ``lld``
+as its linker. The installation will include the ``include/amdgcn-amd-amdhsa``
+and ``lib/amdgcn-amd-amdhsa`` directories that contain the necessary code to use
+the library. We can directly link against ``libc.a`` and use LTO to generate the
+final executable. The ``crt1.o`` object contains the kernels necessary to launch
+the ``main`` function.
+
+.. code-block:: c++
+
+  #include <stdio.h>
+
+  int main() { fputs("Hello from AMDGPU!\n", stdout); }
+
+This program can be compiled with the C library support. Note that ``-flto``
+and ``-mcpu=`` should be defined. This is because the GPU subarchitectures do
+not have strict backwards compatibility. Use ``-mcpu=help`` for accepted
+arguments or ``-mcpu=native`` to target the system's installed GPUs if present.
+Once built, we use the ``amdhsa-loader`` utility to launch execution on the GPU.
+This will be built if the ``hsa_runtime64`` library was found during build time.
+
+.. code-block:: sh
+
+  $> clang hello.c --target=amdgcn-amd-amdhsa -mcpu=native -flto -lc <install>/lib/amdgcn-amd-amdhsa/crt1.o
----------------
jhuber6 wrote:

Yes, the text above explains why:
> Note that ``-flto`` and ``-mcpu=`` should be defined. This is because the GPU subarchitectures do
> not have strict backwards compatibility. Use ``-mcpu=help`` for accepted
> arguments or ``-mcpu=native`` to target the system's installed GPUs if present.
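
On the related point of spelling out the target: the compiler only defines `__AMDGPU__` or `__NVPTX__` when compiling for those targets, so the `get_thread_id_x` snippet in the patch hits its `#error` branch on a host build. The sketch below is only an illustration and not part of the patch. It swaps the `#error` for a host fallback that returns 0, which is an assumption made here so the dispatch can be checked with an ordinary host compiler.

```c
#include <stdint.h>

/* Thread identifier in the 'x' dimension.
 * The two GPU branches mirror the example in the patch. The final branch is
 * a hypothetical host fallback added for illustration: a host build has no
 * GPU thread grid, so it reports thread 0. */
uint32_t get_thread_id_x(void) {
#if defined(__AMDGPU__)
  return __builtin_amdgcn_workitem_id_x();
#elif defined(__NVPTX__)
  return __nvvm_read_ptx_sreg_tid_x();
#else
  return 0; /* assumption: treat the host as a single thread */
#endif
}
```

Built for the host without any `--target=` flag, this takes the fallback branch. Built with the `--target=` flags shown in the patch, it selects the matching builtin instead.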

https://github.com/llvm/llvm-project/pull/84076