[libc-commits] [PATCH] D154446: [libc] Support timing information in libc tests
Jon Chesterfield via Phabricator via libc-commits
libc-commits at lists.llvm.org
Wed Jul 5 11:24:01 PDT 2023
JonChesterfield added a comment.
Let's take an educated guess that this won't be the one true unique global we end up doing this with and make it a struct up front
================
Comment at: libc/startup/gpu/amdgpu/start.cpp:21
+// only be obtained via the driver. The loader will set this so we can use it.
+extern "C" [[gnu::visibility("protected")]] uint64_t __llvm_libc_clock_freq = 0;
+
----------------
This will be in .bss. Is that a good thing? Means we have no option to initialise it before loading the image, i.e. we're stuck with some variant of crossing the bus and waiting to write four bytes
================
Comment at: libc/test/UnitTest/LibcTest.cpp:22
+#if LIBC_TARGET_ARCH_IS_NVPTX
+#define CLOCKS_PER_SEC 1000000000
+#else
----------------
Looks sketchy that this is a different type on the various platforms. Safer would be a uint64_t, or at least annotate the type so it isn't a signed integer.
Really not keen on it being a macro pretending to be part of time.h when it isn't in time.h and this is all tied up in testing the implementation of libc.
================
Comment at: libc/test/UnitTest/LibcTest.cpp:154
} else {
const auto duration = end_time - start_time;
const uint64_t duration_ms = duration * 1000 / CLOCKS_PER_SEC;
----------------
Please parenthesize to indicate what order of operations you want on the multiplication and division here, and if you don't want to give CLOCKS_PER_TYPE an actual type, static cast it to uint64_t at the use site so signed/unsigned integer nonsense can't bite us.
When this is a macro expanding to a non-const global, are we stuck with a separate load for each instance? I _think_ clang will constant fold it anyway because it isn't volatile or atomic but haven't checked.
================
Comment at: libc/utils/gpu/loader/amdgpu/Loader.cpp:272
+ uint64_t size) {
+ // Create a memory signal to copy information between the host and device.
+ hsa_signal_t memory_signal;
----------------
Boilerplate looks ok here, though I'd probably have gone with the synchronous copy instead (no need to mess around with signals then, though I have a vague recollection of there being other hazards with the non-async one)
================
Comment at: libc/utils/gpu/loader/amdgpu/Loader.cpp:442
+ handle_error(err);
+
// Obtain a queue with the minimum (power of two) size, used to send commands
----------------
All this stuff? Pretty much why I prefer writing to the global before copying it to the GPU
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D154446/new/
https://reviews.llvm.org/D154446
More information about the libc-commits
mailing list