[libc-commits] [PATCH] D139839: [libc] Add a loader utility for AMDHSA architectures for testing

Mon Dec 12 06:38:47 PST 2022

JonChesterfield added a comment.

The file called start is traditionally called crt for c runtime.  Making to main across linked elf files probably works but it's a bit sketchy, would be inclined to compile it to IR instead.

The HSA boilerplate looks ok to me. A lot is common to the openmp plugin, if there's somewhere we can put a HSA.hpp that is reachable from libc and openmp I think that would be a win. Doesn't need associated object code, just the template wrappers.

================
Comment at: libc/utils/DeviceLoader/Main.cpp:26
+  // TODO: We should perform some validation on the file.
+  FILE *file = fopen(argv[1], "r");
+
----------------
If mmap is available it's probably a clearer than fopen here, but using only C is also appealing.

================
Comment at: libc/utils/DeviceLoader/amdgpu/CMakeLists.txt:37
+  -nostdinc
+  -mcpu=${AMDGPU_ARCH}
+)
----------------
I think this needs to be emit-llvm for amdgpu

================
Comment at: libc/utils/DeviceLoader/amdgpu/Loader.cpp:58
+
+/// Generic interface for iterating using the HSA callbacks.
+template <typename elem_ty, typename func_ty, typename func_arg_ty,
----------------
This is all pretty familiar. Do we have anywhere reasonable to drop a HSA.hpp header that puts these interfaces over the C?

================
Comment at: libc/utils/DeviceLoader/amdgpu/Loader.cpp:239
+  for (auto &[info, value] : symbol_infos)
+    if (hsa_status_t err = hsa_executable_symbol_get_info(symbol, info, value))
+      handle_error(err);
----------------
This doesn't work for some of the HSA calls. Group segment size or kernarg segment size maybe? One of them unconditionally returns zero (the openmp plugin reads it via msgpack because of this). Or at least, it used to return zero, I haven't checked recently.

================
Comment at: libc/utils/DeviceLoader/amdgpu/Loader.cpp:291
+
+  // Allocate memory for the pointer array on the device and copy it from the
+  // host to the device.
----------------
Could probably leave argv in fine grain memory instead of setting it up on the GPU, would make this somewhat simpler.

================
Comment at: libc/utils/DeviceLoader/amdgpu/Start.cpp:13
+_start(int argc, char **argv, int *ret) {
+  __atomic_fetch_or(ret, main(argc, argv), __ATOMIC_RELAXED);
+}
----------------
Oh, ret has to be on the GPU if you want to fetch_or into it

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D139839/new/

https://reviews.llvm.org/D139839