[all-commits] [llvm/llvm-project] 507edb: [libc] Enable multiple threads to use RPC on the GPU
Joseph Huber via All-commits
all-commits at lists.llvm.org
Thu May 4 17:31:58 PDT 2023
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 507edb52f9a9a5c1ab2a92ec2e291a7b63c3fbff
https://github.com/llvm/llvm-project/commit/507edb52f9a9a5c1ab2a92ec2e291a7b63c3fbff
Author: Joseph Huber <jhuber6 at vols.utk.edu>
Date: 2023-05-04 (Thu, 04 May 2023)
Changed paths:
M libc/src/__support/RPC/CMakeLists.txt
M libc/src/__support/RPC/rpc.h
M libc/src/__support/RPC/rpc_util.h
M libc/startup/gpu/amdgpu/start.cpp
M libc/startup/gpu/nvptx/start.cpp
M libc/test/integration/startup/gpu/CMakeLists.txt
M libc/test/integration/startup/gpu/rpc_test.cpp
M libc/utils/gpu/loader/Loader.h
M libc/utils/gpu/loader/Server.h
M libc/utils/gpu/loader/amdgpu/Loader.cpp
M libc/utils/gpu/loader/nvptx/Loader.cpp
Log Message:
-----------
[libc] Enable multiple threads to use RPC on the GPU
The execution model of the GPU expects that groups of threads will
execute in lock-step in SIMD fashion. It is important for both
performance and correctness that we treat such a group as the smallest
possible granularity for an RPC operation. Thus, we map multiple threads
to a single larger buffer and ship that across the wire.
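As an illustration of mapping a lock-step group of threads to one larger buffer, here is a minimal host-side sketch. It is not the code from this patch: `kLanes`, `Slot`, `Packet`, `pack`, `serve`, and `round_trip_sum` are hypothetical names, the group width of 32 is an assumption, and the GPU lanes are simulated with a plain serial loop on the CPU.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumed width of one lock-step thread group (hypothetical constant).
constexpr std::size_t kLanes = 32;

// One fixed-size payload slot per lane (hypothetical layout).
struct Slot {
  std::uint64_t data[8];
};

// The "single larger buffer": every lane's slot plus a mask of active lanes.
struct Packet {
  std::uint32_t lane_mask = 0;
  std::array<Slot, kLanes> slots{};
};

inline bool is_active(std::uint32_t mask, std::size_t lane) {
  return (mask >> lane) & 1u;
}

// Client side: each active lane fills only its own slot. On a real GPU the
// lanes would do this in lock-step; here a serial loop stands in for them.
template <typename Fill>
void pack(Packet &packet, std::uint32_t active_mask, Fill fill) {
  packet.lane_mask = active_mask;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (is_active(active_mask, lane))
      fill(packet.slots[lane], lane);
}

// Server side: the CPU receives the whole packet as one unit and walks the
// active lanes one by one. Returns how many lanes were handled.
template <typename Handle>
std::size_t serve(Packet &packet, Handle handle) {
  std::size_t handled = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane) {
    if (is_active(packet.lane_mask, lane)) {
      handle(packet.slots[lane], lane);
      ++handled;
    }
  }
  return handled;
}

// Round trip: lanes send their lane id, the server adds one to each value,
// and the sum of the replies over the active lanes is returned.
std::uint64_t round_trip_sum(std::uint32_t active_mask) {
  Packet packet;
  pack(packet, active_mask,
       [](Slot &slot, std::size_t lane) { slot.data[0] = lane; });
  serve(packet, [](Slot &slot, std::size_t) { slot.data[0] += 1; });

  std::uint64_t sum = 0;
  for (std::size_t lane = 0; lane < kLanes; ++lane)
    if (is_active(active_mask, lane))
      sum += packet.slots[lane].data[0];
  return sum;
}
```

In this sketch the whole group shares one `Packet`, so a single hand-off covers every active lane, and the lane mask lets the CPU skip lanes that did not take part in the call.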
This patch makes the necessary changes to support executing the RPC on
the GPU with multiple threads. This requires some workarounds to mimic
the model when handling the protocol from the CPU. I'm not completely
happy with some of the workarounds required, but I think it should work.
Uses some of the implementation details from D148191.
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D148943
Commit: 901266dad313c114e12c181651249e30e5902e26
https://github.com/llvm/llvm-project/commit/901266dad313c114e12c181651249e30e5902e26
Author: Joseph Huber <jhuber6 at vols.utk.edu>
Date: 2023-05-04 (Thu, 04 May 2023)
Changed paths:
M libc/startup/gpu/amdgpu/start.cpp
M libc/startup/gpu/nvptx/start.cpp
M libc/utils/gpu/loader/Loader.h
M libc/utils/gpu/loader/amdgpu/Loader.cpp
M libc/utils/gpu/loader/nvptx/Loader.cpp
Log Message:
-----------
[libc] Change GPU startup and loader to use multiple kernels
The GPU has a different execution model to standard `_start`
implementations. On the GPU, all threads are active at the start of a
kernel. In order to correctly initialize and call the constructors, we
want single-threaded semantics. Previously, this was done using a
makeshift global barrier with atomics. However, it should be easier to
simply put the portions of the code that must be single threaded in
separate kernels and then call those with only one thread. Generally,
sharing global state across kernel launches makes optimizations more
difficult, much like calling a function outside of the translation unit
(TU), but for testing it is better to be correct.
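The split into separate kernels can be sketched on the host as follows. This is a simplified simulation under stated assumptions, not the loader code from this patch: `State`, `launch`, `begin_kernel`, `start_kernel`, `end_kernel`, and `run_program` are hypothetical names, and a "kernel launch" is modeled as a serial loop over thread ids.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Global state shared across the simulated kernel launches (hypothetical).
struct State {
  int ctor_runs = 0;        // how many times the constructors ran
  int dtor_runs = 0;        // how many times the destructors ran
  std::vector<int> results; // one entry per thread of the main kernel
};

using Kernel = std::function<void(State &, std::size_t)>;

// Stand-in for a kernel launch: run the kernel once per thread id.
void launch(const Kernel &kernel, State &state, std::size_t num_threads) {
  for (std::size_t tid = 0; tid < num_threads; ++tid)
    kernel(state, tid);
}

// Single-threaded setup, e.g. running global constructors.
void begin_kernel(State &state, std::size_t) { ++state.ctor_runs; }

// The actual program, run by every thread.
void start_kernel(State &state, std::size_t tid) {
  state.results.push_back(static_cast<int>(tid));
}

// Single-threaded teardown, e.g. running global destructors.
void end_kernel(State &state, std::size_t) { ++state.dtor_runs; }

// The loader launches three kernels in order. Only the middle one uses the
// requested thread count, so no barrier is needed inside any kernel.
State run_program(std::size_t num_threads) {
  State state;
  launch(begin_kernel, state, 1);
  launch(start_kernel, state, num_threads);
  launch(end_kernel, state, 1);
  return state;
}
```

Because setup and teardown are each launched with exactly one thread, they run once no matter how many threads the main kernel uses, which is the property the atomic barrier previously had to enforce.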
Depends on D149527 D148943
Reviewed By: JonChesterfield
Differential Revision: https://reviews.llvm.org/D149581
Compare: https://github.com/llvm/llvm-project/compare/fe9f557578a5...901266dad313