[libc-commits] [libc] [libc] Update GPU testing documentation (PR #85459)

Fri Mar 15 15:33:15 PDT 2024

================
@@ -14,24 +14,132 @@ Testing the GPU libc library
   :depth: 4
   :local:
 
-Testing Infrastructure
+Testing infrastructure
 ======================
 
-The testing support in LLVM's libc implementation for GPUs is designed to mimic
-the standard unit tests as much as possible. We use the :ref:`libc_gpu_rpc`
-support to provide the necessary utilities like printing from the GPU. Execution
-is performed by emitting a ``_start`` kernel from the GPU
-that is then called by an external loader utility. This is an example of how
-this can be done manually:
+The LLVM C library supports different kinds of :ref:`tests <build_and_test>`
+depending on the build configuration. The GPU target is considered a full build
+and therefore provides all of its own utilities to build and run the generated
+tests. Currently the GPU supports two kinds of tests.
+
+#. **Hermetic tests** - These are unit tests built with a test suite similar to
+   Google's ``gtest`` infrastructure. These use the same infrastructure as unit
+   tests except that the entire environment is self-hosted. This allows us to
+   run them on the GPU using our custom utilities. These are used to test the
+   majority of functional implementations.
+
+#. **Integration tests** - These are lightweight tests that simply call a
+   ``main`` function and check whether it returns non-zero. These are primarily
+   used to test interfaces that are sensitive to threading.
+
+The GPU uses the same testing infrastructure as the other supported ``libc``
+targets. We do this by treating the GPU as a standard hosted environment capable
+of launching a ``main`` function. Effectively, this means building our own
+startup libraries and loader.
+
+Testing utilities
+=================
+
+We provide two utilities to execute arbitrary programs on the GPU: the
+``loader`` and the ``start`` object.
+
+Startup object
+--------------
+
+This object mimics the standard object used by existing C library
+implementations. Its job is to perform the necessary setup prior to calling the
+``main`` function. In the GPU case, this means exporting GPU kernels that will
+perform the necessary operations. Here we use ``_begin`` and ``_end`` to handle
+calling global constructors and destructors while ``_start`` begins the standard
+execution. The following code block shows the implementation for AMDGPU
+architectures.
+
+.. code-block:: c++
+
+  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+  _begin(int argc, char **argv, char **env) {
+    LIBC_NAMESPACE::atexit(&LIBC_NAMESPACE::call_fini_array_callbacks);
+    LIBC_NAMESPACE::call_init_array_callbacks(argc, argv, env);
+  }
+
+  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+  _start(int argc, char **argv, char **envp, int *ret) {
+    __atomic_fetch_or(ret, main(argc, argv, envp), __ATOMIC_RELAXED);
+  }
+
+  extern "C" [[gnu::visibility("protected"), clang::amdgpu_kernel]] void
+  _end(int retval) {
+    LIBC_NAMESPACE::exit(retval);
+  }
+
+Loader runtime
+--------------
+
+The startup object provides a GPU executable with callable kernels for the
+respective runtime. We can then define a minimal runtime that will launch these
+kernels on the given device. Currently we provide the ``amdhsa-loader`` and
+``nvptx-loader`` targeting the AMD HSA runtime and CUDA driver runtime
+respectively. By default these will launch with a single thread on the GPU.
 
 .. code-block:: sh
 
-   $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=gfx90a -flto
-   $> ./amdhsa_loader --threads 1 --blocks 1 a.out
+   $> clang++ crt1.o test.cpp --target=amdgcn-amd-amdhsa -mcpu=native -flto
+   $> amdhsa-loader --threads 1 --blocks 1 ./a.out
    Test Passed!
 
-Unlike the exported ``libcgpu.a``, the testing architecture can only support a
-single architecture at a time. This is either detected automatically, or set
-manually by the user using ``LIBC_GPU_TEST_ARCHITECTURE``. The latter is useful
-in cases where the user does not build LLVM's libc on machine with the GPU to
-use for testing.
+The loader utility forwards any arguments passed after the executable image
+to the program on the GPU, along with the current environment variables. The
+number of threads and blocks can be controlled with ``--threads`` and
+``--blocks``. These also accept additional ``x``, ``y``, ``z`` variants for
+multidimensional grids.
+
+Running tests
+=============
+
+Tests will only be built and run if a GPU target architecture is set and the
+corresponding loader utility was built. These can be overridden with the
+``LIBC_GPU_TEST_ARCHITECTURE`` and ``LIBC_GPU_LOADER_EXECUTABLE`` :ref:`CMake
+options <gpu_cmake_options>`. Once built, they can be run like any other tests.
+The CMake target depends on how the library was built.
+
+#. **Cross build** - If the C library was built using ``LLVM_ENABLE_PROJECTS``
+   or a runtimes cross build, then the standard targets will be present in the
+   base CMake build directory.
+
+   #. All tests - You can run all supported tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja check-libc
+
+   #. Hermetic tests - You can run hermetic tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja libc-hermetic-tests
+
+   #. Integration tests - You can run integration tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja libc-integration-tests
+
+#. **Runtimes build** - If the library was built using ``LLVM_ENABLE_RUNTIMES``
+   then the actual ``libc`` build will be in a separate directory.
+
+   #. All tests - You can run all supported tests with the command:
+
+      .. code-block:: sh
+
+        $> ninja check-libc-amdgcn-amd-amdhsa
+        $> ninja check-libc-nvptx64-nvidia-cuda
+
+   #. Specific tests - You can use the same targets as above by entering the
+      runtimes build directory.
+
+      .. code-block:: sh
+
+        $> ninja -C runtimes/runtimes-amdgcn-amd-amdhsa-bins check-libc
----------------
jhuber6 wrote:

I figured it was implied by
> You can use the same targets as above by entering the runtimes build directory.

But I could be more explicit and show both cases.
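
For illustration, showing both cases explicitly might look roughly like the sketch below. It assumes the NVPTX runtimes directory follows the same `runtimes-<triple>-bins` layout as the AMDGPU one in the diff, and `runtimes_ninja_cmd` is a hypothetical helper that only prints each command rather than running it.

```sh
#!/bin/sh
# Hypothetical helper (not part of the PR): print the ninja command that runs
# a given test target from inside a target's runtimes build directory.
# Assumption: every GPU triple uses the runtimes/runtimes-<triple>-bins layout
# shown for amdgcn-amd-amdhsa in the diff.
runtimes_ninja_cmd() {
  triple="$1"
  target="$2"
  printf 'ninja -C runtimes/runtimes-%s-bins %s\n' "$triple" "$target"
}

# Both GPU targets, each with the three test targets named in the doc.
for triple in amdgcn-amd-amdhsa nvptx64-nvidia-cuda; do
  for target in check-libc libc-hermetic-tests libc-integration-tests; do
    runtimes_ninja_cmd "$triple" "$target"
  done
done
```

Running the script only lists the six invocations; each printed line can then be run from the base CMake build directory.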

https://github.com/llvm/llvm-project/pull/85459