[libc-commits] [PATCH] D149398: [libc] Add support for global ctors / dtors for AMDGPU

Joseph Huber via Phabricator via libc-commits libc-commits at lists.llvm.org
Sat Apr 29 07:14:35 PDT 2023


jhuber6 added inline comments.


================
Comment at: libc/startup/gpu/amdgpu/start.cpp:67
+  // initialization code. This will get very, very slow for high thread counts,
+  // but for testing purposes it is unlikely to matter.
+  while (count.load(cpp::MemoryOrder::RELAXED) != get_grid_size())
----------------
jhuber6 wrote:
> sivachandra wrote:
> > Can this be avoided at all? As in, if there are globals that have to be initialized on the GPU, then all threads have to wait until they can start using those globals?
> Generally I just assume it's unsafe to have any GPU threads calling `main` before we've run all the global constructors. We could reduce this to a regular sync if we placed every global object in shared memory, though; then this would be a simple `gpu::sync_threads`. But that would require modifying the source to put `[[clang::address_space(3)]]` on everything, and shared memory is a pretty scarce resource.
Another solution here is to have a separate kernel that does the initialization, which we launch before the kernel that calls `main`. Sharing global state between kernel launches generally thwarts a few optimizations, but I don't think we care about that here. I'll make a patch to do that sometime in the future, rather than keeping a weird global barrier that the hardware doesn't support. That should allow us to run tests on a fully saturated GPU.
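
To make the trade-off concrete, here is a rough host-side sketch of the two strategies. It is only a simulation: `std::thread` stands in for GPU threads, and `global_value`, `run_global_ctors`, `launch_with_spin_barrier`, and `launch_with_separate_init` are hypothetical names that do not appear in the patch. The second wait on `initialized` is also an assumption about how a single-launch scheme would publish the result, not a description of the code in `start.cpp`.

```cpp
// Host-side simulation only; not the code from the patch.
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical global that a constructor must set up before `main` runs.
static int global_value = 0;
static void run_global_ctors() { global_value = 42; }

// Strategy 1: a single launch with a hand-rolled grid-wide spin barrier.
// Returns how many threads saw the initialized global in their "main".
int launch_with_spin_barrier(unsigned grid_size) {
  global_value = 0;
  std::atomic<unsigned> arrived{0};
  std::atomic<bool> initialized{false};
  std::atomic<int> ok{0};
  std::vector<std::thread> threads;
  for (unsigned id = 0; id < grid_size; ++id) {
    threads.emplace_back([&, id] {
      arrived.fetch_add(1, std::memory_order_relaxed);
      // Every thread spins until the whole grid has started. On a GPU this
      // only terminates if all threads can be resident at the same time.
      while (arrived.load(std::memory_order_relaxed) != grid_size)
        std::this_thread::yield();
      if (id == 0) {
        run_global_ctors();
        initialized.store(true, std::memory_order_release);
      }
      // Nobody proceeds until the constructors have finished.
      while (!initialized.load(std::memory_order_acquire))
        std::this_thread::yield();
      // Stand-in for the user's `main`.
      if (global_value == 42)
        ok.fetch_add(1, std::memory_order_relaxed);
    });
  }
  for (auto &t : threads)
    t.join();
  return ok.load();
}

// Strategy 2: two launches. The boundary between them acts as the barrier,
// so the second launch needs no spinning at all.
int launch_with_separate_init(unsigned grid_size) {
  global_value = 0;
  std::thread init(run_global_ctors); // "kernel" 1: constructors only
  init.join();                        // wait for the first launch to finish
  std::atomic<int> ok{0};
  std::vector<std::thread> threads;
  for (unsigned id = 0; id < grid_size; ++id) {
    threads.emplace_back([&] { // "kernel" 2: the user's `main`
      if (global_value == 42)
        ok.fetch_add(1, std::memory_order_relaxed);
    });
  }
  for (auto &t : threads)
    t.join();
  return ok.load();
}
```

Calling either function with a grid size of 8 should report that all 8 threads observed the initialized global. The difference is that the second version never relies on every thread being scheduled at once, which is what would let tests run on a fully saturated GPU.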


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149398/new/

https://reviews.llvm.org/D149398


