[all-commits] [llvm/llvm-project] 507edb: [libc] Enable multiple threads to use RPC on the GPU

Thu May 4 17:31:58 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 507edb52f9a9a5c1ab2a92ec2e291a7b63c3fbff
      https://github.com/llvm/llvm-project/commit/507edb52f9a9a5c1ab2a92ec2e291a7b63c3fbff
  Author: Joseph Huber <jhuber6 at vols.utk.edu>
  Date:   2023-05-04 (Thu, 04 May 2023)

  Changed paths:
    M libc/src/__support/RPC/CMakeLists.txt
    M libc/src/__support/RPC/rpc.h
    M libc/src/__support/RPC/rpc_util.h
    M libc/startup/gpu/amdgpu/start.cpp
    M libc/startup/gpu/nvptx/start.cpp
    M libc/test/integration/startup/gpu/CMakeLists.txt
    M libc/test/integration/startup/gpu/rpc_test.cpp
    M libc/utils/gpu/loader/Loader.h
    M libc/utils/gpu/loader/Server.h
    M libc/utils/gpu/loader/amdgpu/Loader.cpp
    M libc/utils/gpu/loader/nvptx/Loader.cpp

  Log Message:
  -----------
  [libc] Enable multiple threads to use RPC on the GPU

The execution model of the GPU expects that groups of threads will
execute in lock-step in SIMD fashion. It is important for both
performance and correctness that we treat this as the smallest possible
granularity for an RPC operation. Thus, we map multiple threads to a
single larger buffer and ship that across the wire.
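
The mapping described above can be sketched on the host as follows. This
is a minimal illustration, not the code from rpc.h: the names `kLanes`,
`Slot`, `Packet`, `client_fill`, and `server_use` are hypothetical, and
the lane count of 32 is an assumption. Each active lane writes its own
slot of one shared packet, and the CPU side, which has no lanes, loops
over the slots to mimic the GPU model.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed lane count for the sketch; hypothetical, not taken from the patch.
constexpr std::size_t kLanes = 32;

// Per-lane payload (hypothetical layout).
struct Slot {
  std::uint64_t value = 0;
};

// One packet is shared by a whole group of lanes and sent as a unit.
struct Packet {
  std::uint64_t mask = 0;           // bit i set => lane i participates
  std::array<Slot, kLanes> slots{}; // one slot per lane
};

// "Client" side: every active lane fills only its own slot. The loop
// stands in for lanes that would run in lock-step on a GPU.
template <typename Fill>
void client_fill(Packet &packet, std::uint64_t active_mask, Fill fill) {
  assert((active_mask >> kLanes) == 0 && "mask names a lane that does not exist");
  packet.mask = active_mask;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (active_mask & (std::uint64_t{1} << lane))
      fill(lane, packet.slots[lane]);
}

// "Server" side: the CPU walks the slots of the participating lanes.
template <typename Use>
void server_use(const Packet &packet, Use use) {
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (packet.mask & (std::uint64_t{1} << lane))
      use(lane, packet.slots[lane]);
}
```

In this sketch the mask travels with the packet, so the server touches
only the slots that were actually written.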

This patch makes the necessary changes to support executing the RPC on
the GPU with multiple threads. This requires some workarounds to mimic
the model when handling the protocol from the CPU. I'm not completely
happy with some of the workarounds required, but I think it should work.

Uses some of the implementation details from D148191.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D148943

  Commit: 901266dad313c114e12c181651249e30e5902e26
      https://github.com/llvm/llvm-project/commit/901266dad313c114e12c181651249e30e5902e26
  Author: Joseph Huber <jhuber6 at vols.utk.edu>
  Date:   2023-05-04 (Thu, 04 May 2023)

  Changed paths:
    M libc/startup/gpu/amdgpu/start.cpp
    M libc/startup/gpu/nvptx/start.cpp
    M libc/utils/gpu/loader/Loader.h
    M libc/utils/gpu/loader/amdgpu/Loader.cpp
    M libc/utils/gpu/loader/nvptx/Loader.cpp

  Log Message:
  -----------
  [libc] Change GPU startup and loader to use multiple kernels

The GPU has a different execution model to standard `_start`
implementations. On the GPU, all threads are active at the start of a
kernel. In order to correctly initialize and call the constructors we
want single-threaded semantics. Previously, this was done using a
makeshift global barrier with atomics. However, it should be easier to
simply put the portions of the code that must be single-threaded in
separate kernels and then call those with only one thread. Generally,
mixing global state between kernel launches makes optimizations more
difficult, similarly to calling a function outside of the TU, but for
testing it is better to be correct.
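
The launch sequence described above might look roughly like this from
the loader's point of view. This is a host-side simulation under stated
assumptions, not the loader code: `launch`, `Trace`, and `run_program`
are hypothetical names, and the sequential loop only stands in for a
real kernel launch where threads run concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for a kernel launch: runs `kernel` once per
// thread id. A real GPU would run these threads concurrently.
void launch(std::size_t num_threads,
            const std::function<void(std::size_t)> &kernel) {
  for (std::size_t tid = 0; tid < num_threads; ++tid)
    kernel(tid);
}

// Counts how many threads executed each phase.
struct Trace {
  std::size_t init_calls = 0;
  std::size_t main_calls = 0;
  std::size_t fini_calls = 0;
};

// Three separate launches replace one kernel with a global barrier:
// single-threaded setup, the multi-threaded body, single-threaded teardown.
Trace run_program(std::size_t num_threads) {
  assert(num_threads > 0);
  Trace trace;
  launch(1, [&](std::size_t) { ++trace.init_calls; });           // constructors
  launch(num_threads, [&](std::size_t) { ++trace.main_calls; }); // program body
  launch(1, [&](std::size_t) { ++trace.fini_calls; });           // destructors
  return trace;
}
```

Because each phase is its own launch, the single-threaded guarantee
comes from the launch dimensions and needs no barrier inside the kernel.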

Depends on D149527 D148943

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D149581

Compare: https://github.com/llvm/llvm-project/compare/fe9f557578a5...901266dad313