[libc-commits] [PATCH] D154446: [libc] Support timing information in libc tests

Wed Jul 5 11:35:07 PDT 2023

jhuber6 added inline comments.

================
Comment at: libc/startup/gpu/amdgpu/start.cpp:21
+// only be obtained via the driver. The loader will set this so we can use it.
+extern "C" [[gnu::visibility("protected")]] uint64_t __llvm_libc_clock_freq = 0;
+
----------------
JonChesterfield wrote:
> This will be in .bss. Is that a good thing? Means we have no option to initialise it before loading the image, i.e. we're stuck with some variant of crossing the bus and waiting to write four bytes
I should probably make this in constant memory.

================
Comment at: libc/test/UnitTest/LibcTest.cpp:22
+#if LIBC_TARGET_ARCH_IS_NVPTX
+#define CLOCKS_PER_SEC 1000000000
+#else
----------------
JonChesterfield wrote:
> Looks sketchy that this is a different type on the various platforms. Safer would be a uint64_t, or at least annotate the type so it isn't a signed integer.
> 
> Really not keen on it being a macro pretending to be part of time.h when it isn't in time.h and this is all tied up in testing the implementation of libc.
Honestly the whole `CLOCKS_PER_SEC` thing confuses me in general. POSIX says it's always 1,000,000, other platforms say it's something else. Other's scale the clock so that it matches 1,000,000... We could probably avoid it here if necessary and make it a constant or something.

================
Comment at: libc/test/UnitTest/LibcTest.cpp:154
       } else {
         const auto duration = end_time - start_time;
         const uint64_t duration_ms = duration * 1000 / CLOCKS_PER_SEC;
----------------
JonChesterfield wrote:
> Please parenthesize to indicate what order of operations you want on the multiplication and division here, and if you don't want to give CLOCKS_PER_TYPE an actual type, static cast it to uint64_t at the use site so signed/unsigned integer nonsense can't bite us.
> 
> When this is a macro expanding to a non-const global, are we stuck with a separate load for each instance? I _think_ clang will constant fold it anyway because it isn't volatile or atomic but haven't checked.
I could probably make it const since it should be set by the loader / whoever's standing up the runtime.

================
Comment at: libc/utils/gpu/loader/amdgpu/Loader.cpp:272
+                               uint64_t size) {
+  // Create a memory signal to copy information between the host and device.
+  hsa_signal_t memory_signal;
----------------
JonChesterfield wrote:
> Boilerplate looks ok here, though I'd probably have gone with the synchronous copy instead (no need to mess around with signals then, though I have a vague recollection of there being other hazards with the non-async one)
Yeah, didn't see any simpler ways to do this in the HSA interface.

================
Comment at: libc/utils/gpu/loader/amdgpu/Loader.cpp:442
+    handle_error(err);
+
   // Obtain a queue with the minimum (power of two) size, used to send commands
----------------
JonChesterfield wrote:
> All this stuff? Pretty much why I prefer writing to the global before copying it to the GPU
I didn't want to pass it via the kernel arguments because then it would require changing the NVPTX plugin as well. Plus in the future we'll probably want to abstract this into some general environment constant.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154446/new/

https://reviews.llvm.org/D154446