[libc-commits] [PATCH] D154446: [libc] Support timing information in libc tests

Wed Jul 5 11:24:01 PDT 2023

JonChesterfield added a comment.

Let's take an educated guess that this won't be the one true unique global we end up doing this with and make it a struct up front

================
Comment at: libc/startup/gpu/amdgpu/start.cpp:21
+// only be obtained via the driver. The loader will set this so we can use it.
+extern "C" [[gnu::visibility("protected")]] uint64_t __llvm_libc_clock_freq = 0;
+
----------------
This will be in .bss. Is that a good thing? Means we have no option to initialise it before loading the image, i.e. we're stuck with some variant of crossing the bus and waiting to write four bytes

================
Comment at: libc/test/UnitTest/LibcTest.cpp:22
+#if LIBC_TARGET_ARCH_IS_NVPTX
+#define CLOCKS_PER_SEC 1000000000
+#else
----------------
Looks sketchy that this is a different type on the various platforms. Safer would be a uint64_t, or at least annotate the type so it isn't a signed integer.

Really not keen on it being a macro pretending to be part of time.h when it isn't in time.h and this is all tied up in testing the implementation of libc.

================
Comment at: libc/test/UnitTest/LibcTest.cpp:154
       } else {
         const auto duration = end_time - start_time;
         const uint64_t duration_ms = duration * 1000 / CLOCKS_PER_SEC;
----------------
Please parenthesize to indicate what order of operations you want on the multiplication and division here, and if you don't want to give CLOCKS_PER_TYPE an actual type, static cast it to uint64_t at the use site so signed/unsigned integer nonsense can't bite us.

When this is a macro expanding to a non-const global, are we stuck with a separate load for each instance? I _think_ clang will constant fold it anyway because it isn't volatile or atomic but haven't checked.

================
Comment at: libc/utils/gpu/loader/amdgpu/Loader.cpp:272
+                               uint64_t size) {
+  // Create a memory signal to copy information between the host and device.
+  hsa_signal_t memory_signal;
----------------
Boilerplate looks ok here, though I'd probably have gone with the synchronous copy instead (no need to mess around with signals then, though I have a vague recollection of there being other hazards with the non-async one)

================
Comment at: libc/utils/gpu/loader/amdgpu/Loader.cpp:442
+    handle_error(err);
+
   // Obtain a queue with the minimum (power of two) size, used to send commands
----------------
All this stuff? Pretty much why I prefer writing to the global before copying it to the GPU

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154446/new/

https://reviews.llvm.org/D154446