[libc-commits] [PATCH] D145913: [libc] Add initial support for an RPC mechanism for the GPU

Guillaume Chatelet via Phabricator via libc-commits libc-commits at lists.llvm.org
Sat Apr 1 05:04:34 PDT 2023


gchatelet added inline comments.


================
Comment at: libc/utils/gpu/loader/amdgpu/Loader.cpp:314
+  void *buffer;
+  if (hsa_status_t err = hsa_amd_memory_pool_allocate(
+          finegrained_pool, sizeof(__llvm_libc::cpp::Atomic<int>),
----------------
jhuber6 wrote:
> gchatelet wrote:
> > Are there any guarantees that this piece of memory has the same alignment as `__llvm_libc::cpp::Atomic<int>`?
> > Same for the CUDA loader.
> Fine-grained memory in this context is always going to be aligned to a page as far as I know. So that'll usually be aligned on a 4096 byte boundary.
Does that mean that you're reserving one page for just a few bytes? Maybe it would make more sense to reserve a larger chunk of memory and place the objects ourselves (not sure what this implies for the GPU).

Regarding the RPC mechanism, should the two atomics share the same cache line, or would it make sense to put them in separate cache lines to prevent false sharing? It's probably OK to have them on the same cache line if there is only one client and one server.

Performance-wise, it seems important that the atomics don't cross cache line boundaries, though. And since placement is important (`alignof(cpp::Atomic<T>)` should be honored for proper codegen), maybe the server should just allocate a sufficiently large chunk of memory and let a common function do the actual placement. This would prevent duplicating the logic in each server (loader).
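
To make the suggestion concrete, here is a rough sketch of what such a common placement function could look like. Everything in it is an assumption for illustration: `RpcAtomics`, `rpc_buffer_size`, `place_rpc_atomics`, and the 64-byte `kCacheLine` are hypothetical names and values, and `std::atomic<int>` stands in for `cpp::Atomic<int>` so the snippet compiles on its own. Whether placement-new into fine-grained memory is appropriate on the GPU side is a separate question I can't answer.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Assumed cache line size; the real value depends on the target.
constexpr std::size_t kCacheLine = 64;

// Hypothetical handle to the two RPC atomics, one per cache line.
struct RpcAtomics {
  std::atomic<int> *client;
  std::atomic<int> *server;
};

// Rounds `addr` up to the next multiple of `align` (must be a power of two).
constexpr std::uintptr_t align_up(std::uintptr_t addr, std::size_t align) {
  return (addr + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
}

// Bytes a loader would request: two cache lines plus slack for alignment.
constexpr std::size_t rpc_buffer_size() { return 3 * kCacheLine; }

// Places both atomics in `buffer`, each at the start of its own cache line.
// Returns null pointers if the buffer is too small.
inline RpcAtomics place_rpc_atomics(void *buffer, std::size_t size) {
  static_assert(alignof(std::atomic<int>) <= kCacheLine,
                "cache-line alignment must satisfy the atomic's alignment");
  const auto base = reinterpret_cast<std::uintptr_t>(buffer);
  const auto first = align_up(base, kCacheLine);
  const auto second = first + kCacheLine;
  if (second + sizeof(std::atomic<int>) > base + size)
    return {nullptr, nullptr};
  return {new (reinterpret_cast<void *>(first)) std::atomic<int>(0),
          new (reinterpret_cast<void *>(second)) std::atomic<int>(0)};
}
```

Each loader would then only allocate `rpc_buffer_size()` bytes from its own pool and call `place_rpc_atomics` on the result, so the alignment and false-sharing decisions live in one place.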


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D145913/new/

https://reviews.llvm.org/D145913


