[libc-commits] [libc] 807f058 - [libc][Docs] Begin improving documentation for the GPU libc

Wed Apr 26 08:31:03 PDT 2023

Author: Joseph Huber
Date: 2023-04-26T10:30:54-05:00
New Revision: 807f0584874d61b0eec5a3ed988402387560534c

URL: https://github.com/llvm/llvm-project/commit/807f0584874d61b0eec5a3ed988402387560534c
DIFF: https://github.com/llvm/llvm-project/commit/807f0584874d61b0eec5a3ed988402387560534c.diff

LOG: [libc][Docs] Begin improving documentation for the GPU libc

This patch updates some of the documentation for the GPU libc project.
There is a lot of work still to be done, but this sets the general
outline.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D149194

Added: 
    libc/docs/gpu/index.rst
    libc/docs/gpu/rpc.rst
    libc/docs/gpu/support.rst
    libc/docs/gpu/testing.rst
    libc/docs/gpu/using.rst

Modified: 
    libc/docs/index.rst

Removed: 
    libc/docs/gpu_mode.rst


################################################################################
diff  --git a/libc/docs/gpu/index.rst b/libc/docs/gpu/index.rst
new file mode 100644
index 0000000000000..0ea54a7235459

--- /dev/null
+++ b/libc/docs/gpu/index.rst
@@ -0,0 +1,18 @@
+.. _libc_gpu:
+
+=============
+libc for GPUs
+=============
+
+.. note:: This feature is very experimental and may change in the future.
+
+The *GPU* support for LLVM's libc project aims to make a subset of the standard
+C library available on GPU based accelerators. Navigate using the links below to
+learn more about this project.
+
+.. toctree::
+
+   using
+   support
+   testing
+   rpc

diff  --git a/libc/docs/gpu/rpc.rst b/libc/docs/gpu/rpc.rst
new file mode 100644
index 0000000000000..bdc2c4ac312cf
--- /dev/null
+++ b/libc/docs/gpu/rpc.rst
@@ -0,0 +1,17 @@
+.. _libc_gpu_rpc:
+
+======================
+Remote Procedure Calls
+======================
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+Remote Procedure Call Implementation
+====================================
+
+Certain features from the standard C library, such as allocation or printing,
+require support from the operating system. We instead implement a remote
+procedure call (RPC) interface to allow submitting work from the GPU to a host
+server that forwards it to the host system.

diff  --git a/libc/docs/gpu/support.rst b/libc/docs/gpu/support.rst
new file mode 100644
index 0000000000000..59fdb61966838
--- /dev/null
+++ b/libc/docs/gpu/support.rst
@@ -0,0 +1,88 @@
+.. _libc_gpu_support:
+
+===================
+Supported Functions
+===================
+
+.. include:: ../check.rst
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+The following functions and headers are supported at least partially on the
+device. Some functions are implemented fully on the GPU, while others require a
+:ref:`remote procedure call <libc_gpu_rpc>`.
+
+ctype.h
+-------
+
+=============  =========  ============
+Function Name  Available  RPC Required
+=============  =========  ============
+isalnum        |check|
+isalpha        |check|
+isascii        |check|
+isblank        |check|
+iscntrl        |check|
+isdigit        |check|
+isgraph        |check|
+islower        |check|
+isprint        |check|
+ispunct        |check|
+isspace        |check|
+isupper        |check|
+isxdigit       |check|
+toascii        |check|
+tolower        |check|
+toupper        |check|
+=============  =========  ============
+
+string.h
+--------
+
+=============  =========  ============
+Function Name  Available  RPC Required
+=============  =========  ============
+bcmp           |check|
+bzero          |check|
+memccpy        |check|
+memchr         |check|
+memcmp         |check|
+memcpy         |check|
+memmove        |check|
+mempcpy        |check|
+memrchr        |check|
+memset         |check|
+stpcpy         |check|
+stpncpy        |check|
+strcat         |check|
+strchr         |check|
+strcmp         |check|
+strcpy         |check|
+strcspn        |check|
+strlcat        |check|
+strlcpy        |check|
+strlen         |check|
+strncat        |check|
+strncmp        |check|
+strncpy        |check|
+strnlen        |check|
+strpbrk        |check|
+strrchr        |check|
+strspn         |check|
+strstr         |check|
+strtok         |check|
+strtok_r       |check|
+strdup
+strndup
+=============  =========  ============
+
+stdlib.h
+--------
+
+=============  =========  ============
+Function Name  Available  RPC Required
+=============  =========  ============
+atoi           |check|
+=============  =========  ============

diff  --git a/libc/docs/gpu/testing.rst b/libc/docs/gpu/testing.rst
new file mode 100644
index 0000000000000..09e875aea1366
--- /dev/null
+++ b/libc/docs/gpu/testing.rst
@@ -0,0 +1,32 @@
+.. _libc_gpu_testing:
+
+
+============================
+Testing the GPU libc library
+============================
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+Testing Infrastructure
+======================
+
+The testing support in LLVM's libc implementation for GPUs is designed to mimic
+the standard unit tests as much as possible. We use the :ref:`remote procedure
+call <libc_gpu_rpc>` support to provide the necessary utilities like printing
+from the GPU. Execution is performed by emitting a ``_start`` kernel from the
+GPU that is then called by an external loader utility. This is an example of
+how this can be done manually:
+
+.. code-block:: sh
+
+   $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto
+   $> ./amdhsa_loader --threads 1 --blocks 1 a.out
+   Test Passed!
+
+Unlike the exported ``libcgpu.a``, the testing infrastructure can only support a
+single architecture at a time. This is either detected automatically, or set
+manually by the user using ``LIBC_GPU_TEST_ARCHITECTURE``. The latter is useful
+in cases where the user does not build LLVM's libc on a machine with the GPU to
+use for testing.

diff  --git a/libc/docs/gpu/using.rst b/libc/docs/gpu/using.rst
new file mode 100644
index 0000000000000..6808f05ad13b6
--- /dev/null
+++ b/libc/docs/gpu/using.rst
@@ -0,0 +1,87 @@
+.. _libc_gpu_usage:
+
+
+===================
+Using libc for GPUs
+===================
+
+.. contents:: Table of Contents
+  :depth: 4
+  :local:
+
+Building the GPU library
+========================
+
+LLVM's libc GPU support *must* be built with an up-to-date ``clang`` compiler
+due to heavy reliance on ``clang``'s GPU support. This can be done automatically
+using the ``LLVM_ENABLE_RUNTIMES=libc`` option. To enable libc for the GPU,
+enable the ``LIBC_GPU_BUILD`` option. By default, ``libcgpu.a`` will be built
+using every supported GPU architecture. To restrict the number of architectures
+built, either set ``LLVM_LIBC_GPU_ARCHITECTURES`` to the list of desired
+architectures manually or use ``native`` to detect the GPUs on your system. A
+typical ``cmake`` configuration will look like this:
+
+.. code-block:: sh
+
+  $> cd llvm-project  # The llvm-project checkout
+  $> mkdir build
+  $> cd build
+  $> cmake ../llvm -G Ninja                                \
+     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt"        \
+     -DLLVM_ENABLE_RUNTIMES="libc;openmp"                  \
+     -DCMAKE_BUILD_TYPE=<Debug|Release>   \ # Select build type
+     -DLIBC_GPU_BUILD=ON                  \ # Build in GPU mode
+     -DLLVM_LIBC_GPU_ARCHITECTURES=all    \ # Build all supported architectures
+     -DCMAKE_INSTALL_PREFIX=<PATH>        \ # Where 'libcgpu.a' will live
+  $> ninja install
+
+Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
+toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
+using a compatible compiler and to support ``openmp`` offloading, we list them
+in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
+newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
+directory in which to install the ``libcgpu.a`` library and headers along with
+LLVM. The generated headers will be placed in ``include/gpu-none-llvm``.
+
+Usage
+=====
+
+Once the ``libcgpu.a`` static archive has been built it can be linked directly
+with offloading applications as a standard library. This process is described in
+the `clang documentation <https://clang.llvm.org/docs/OffloadingDesign.html>`_.
+This linking mode is used by the OpenMP toolchain, but is currently opt-in for
+the CUDA and HIP toolchains through the ``--offload-new-driver`` and
+``-fgpu-rdc`` flags. A typical usage will look like this:
+
+.. code-block:: sh
+
+  $> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
+
+The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
+supported target device. The supported architectures can be seen using LLVM's
+``llvm-objdump`` with the ``--offloading`` flag:
+
+.. code-block:: sh
+
+  $> llvm-objdump --offloading libcgpu.a
+  libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64
+
+  OFFLOADING IMAGE [0]:
+  kind            llvm ir
+  arch            gfx90a
+  triple          amdgcn-amd-amdhsa
+  producer        none
+
+Because the device code is stored inside a fat binary, it can be difficult to
+inspect the resulting code. This can be done using the following utilities:
+
+.. code-block:: sh
+
+   $> llvm-ar x libcgpu.a strcmp.cpp.o
+   $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
+   $> opt -S gfx90a.bc
+   ...
+
+Please note that this fat binary format is provided for compatibility with
+existing offloading toolchains. The implementation in ``libc`` does not depend
+on any existing offloading languages and is completely freestanding.

diff  --git a/libc/docs/gpu_mode.rst b/libc/docs/gpu_mode.rst
deleted file mode 100644
index b71b6eec5daee..0000000000000
--- a/libc/docs/gpu_mode.rst
+++ /dev/null
@@ -1,169 +0,0 @@
-.. _GPU_mode:
-
-==============
-GPU Mode
-==============
-
-.. include:: check.rst
-
-.. contents:: Table of Contents
-  :depth: 4
-  :local:
-
-.. note:: This feature is very experimental and may change in the future.
-
-The *GPU* mode of LLVM's libc is an experimental mode used to support calling
-libc routines during GPU execution. The goal of this project is to provide
-access to the standard C library on systems running accelerators. To begin using
-this library, build and install the ``libcgpu.a`` static archive following the
-instructions in :ref:`building_gpu_mode` and link with your offloading
-application.
-
-.. _building_gpu_mode:
-
-Building the GPU library
-========================
-
-LLVM's libc GPU support *must* be built using the same compiler as the final
-application to ensure relative LLVM bitcode compatibility. This can be done
-automatically using the ``LLVM_ENABLE_RUNTIMES=libc`` option. Furthermore,
-building for the GPU is only supported in :ref:`fullbuild_mode`. To enable the
-GPU build, set the target OS to ``gpu`` via ``LLVM_LIBC_TARGET_OS=gpu``. By
-default, ``libcgpu.a`` will be built using every supported GPU architecture. To
-restrict the number of architectures build, set ``LLVM_LIBC_GPU_ARCHITECTURES``
-to the list of desired architectures or use ``all``. A typical ``cmake``
-configuration will look like this:
-
-.. code-block:: sh
-
-  $> cd llvm-project  # The llvm-project checkout
-  $> mkdir build
-  $> cd build
-  $> cmake ../llvm -G Ninja                                \
-     -DLLVM_ENABLE_PROJECTS="clang;lld;compiler-rt"        \
-     -DLLVM_ENABLE_RUNTIMES="libc;openmp"                  \
-     -DCMAKE_BUILD_TYPE=<Debug|Release>  \ # Select build type
-     -DLLVM_LIBC_FULL_BUILD=ON           \ # We need the full libc
-     -DLIBC_GPU_BUILD=ON                 \ # Build in GPU mode
-     -DLLVM_LIBC_GPU_ARCHITECTURES=all   \ # Build all supported architectures
-     -DCMAKE_INSTALL_PREFIX=<PATH>       \ # Where 'libcgpu.a' will live
-  $> ninja install
-
-Since we want to include ``clang``, ``lld`` and ``compiler-rt`` in our
-toolchain, we list them in ``LLVM_ENABLE_PROJECTS``. To ensure ``libc`` is built
-using a compatible compiler and to support ``openmp`` offloading, we list them
-in ``LLVM_ENABLE_RUNTIMES`` to build them after the enabled projects using the
-newly built compiler. ``CMAKE_INSTALL_PREFIX`` specifies the installation
-directory in which to install the ``libcgpu.a`` library along with LLVM.
-
-Usage
-=====
-
-Once the ``libcgpu.a`` static archive has been built in
-:ref:`building_gpu_mode`, it can be linked directly with offloading applications
-as a standard library. This process is described in the `clang documentation
-<https://clang.llvm.org/docs/OffloadingDesign.html>_`. This linking mode is used
-by the OpenMP toolchain, but is currently opt-in for the CUDA and HIP toolchains
-using the ``--offload-new-driver``` and ``-fgpu-rdc`` flags. A typical usage
-will look this this:
-
-.. code-block:: sh
-
-  $> clang foo.c -fopenmp --offload-arch=gfx90a -lcgpu
-
-The ``libcgpu.a`` static archive is a fat-binary containing LLVM-IR for each
-supported target device. The supported architectures can be seen using LLVM's
-objdump with the ``--offloading`` flag:
-
-.. code-block:: sh
-
-  $> llvm-objdump --offloading libcgpu.a
-  libcgpu.a(strcmp.cpp.o):    file format elf64-x86-64
-
-  OFFLOADING IMAGE [0]:
-  kind            llvm ir
-  arch            gfx90a
-  triple          amdgcn-amd-amdhsa
-  producer        <none>
-
-Because the device code is stored inside a fat binary, it can be difficult to
-inspect the resulting code. This can be done using the following utilities:
-
-.. code-block:: sh
-
-   $> llvm-ar x libcgpu.a strcmp.cpp.o
-   $> clang-offload-packager strcmp.cpp.o --image=arch=gfx90a,file=gfx90a.bc
-   $> opt -S out.bc
-   ...
-
-Supported Functions
-===================
-
-The following functions and headers are supported at least partially on the
-device. Currently, only basic device functions that do not require an operating
-system are supported on the device. Supporting functions like `malloc` using an
-RPC mechanism is a work-in-progress.
-
-ctype.h
--------
-
-=============  =========
-Function Name  Available
-=============  =========
-isalnum        |check|
-isalpha        |check|
-isascii        |check|
-isblank        |check|
-iscntrl        |check|
-isdigit        |check|
-isgraph        |check|
-islower        |check|
-isprint        |check|
-ispunct        |check|
-isspace        |check|
-isupper        |check|
-isxdigit       |check|
-toascii        |check|
-tolower        |check|
-toupper        |check|
-=============  =========
-
-string.h
---------
-
-=============   =========
-Function Name   Available
-=============   =========
-bcmp            |check|
-bzero           |check|
-memccpy         |check|
-memchr          |check|
-memcmp          |check|
-memcpy          |check|
-memmove         |check|
-mempcpy         |check|
-memrchr         |check|
-memset          |check|
-stpcpy          |check|
-stpncpy         |check|
-strcat          |check|
-strchr          |check|
-strcmp          |check|
-strcpy          |check|
-strcspn         |check|
-strlcat         |check|
-strlcpy         |check|
-strlen          |check|
-strncat         |check|
-strncmp         |check|
-strncpy         |check|
-strnlen         |check|
-strpbrk         |check|
-strrchr         |check|
-strspn          |check|
-strstr          |check|
-strtok          |check|
-strtok_r        |check|
-strdup
-strndup
-=============   =========

diff  --git a/libc/docs/index.rst b/libc/docs/index.rst
index 90422617403e6..5e9a602b5a96a 100644
--- a/libc/docs/index.rst
+++ b/libc/docs/index.rst
@@ -52,7 +52,7 @@ stages there is no ABI stability in any form.
    usage_modes
    overlay_mode
    fullbuild_mode
-   gpu_mode
+   gpu/index.rst
 
 .. toctree::
    :hidden: