[libc-commits] [PATCH] D158320: [libc] Initial support for microbenchmarking GPU code
Artem Belevich via Phabricator via libc-commits
libc-commits at lists.llvm.org
Wed Sep 6 10:37:54 PDT 2023
tra added a comment.
LGTM for NVPTX side.
================
Comment at: libc/utils/gpu/timing/nvptx/timing.h:55-56
+ // Get the current timestamp from the clock.
+ gpu::sync_threads();
+ uint64_t start = gpu::processor_clock();
+
----------------
This arrangement still seems a bit fragile.
The sync_threads() calls confine the clock reads to the region between them, but within that region the reads may still be moved around by LLVM or ptxas.
`asm volatile` will probably restrict that at the IR level, but it does nothing at the ptxas level. We can hope that ptxas does not move sreg reads around much, but I don't think that's guaranteed.
This example happens to work, but I would not be surprised if we run into issues when benchmarking more complicated code.
I'd wrap each clock read between sync_threads() calls to make sure that ptxas can't move those reads.
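
Roughly the ordering I have in mind, as a minimal host-side sketch rather than the patch's actual code. The `gpu::sync_threads()` and `gpu::processor_clock()` definitions below are hypothetical stand-ins that only log call order so the sequence can be checked on a CPU; `timed_region`, `trace`, and `fake_clock` are names invented for illustration. On NVPTX the real helpers would be a block-level barrier and a clock register read.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical host stand-ins for the GPU helpers named in the patch.
// They only record the order of calls; they are not the libc implementations.
namespace gpu {
static std::vector<std::string> trace;
static uint64_t fake_clock = 0;

static void sync_threads() { trace.push_back("sync"); }

static uint64_t processor_clock() {
  trace.push_back("clock");
  fake_clock += 10; // pretend 10 cycles pass between reads
  return fake_clock;
}
} // namespace gpu

// Time a callable with a barrier on both sides of each clock read, so each
// read sits in its own fenced slot instead of sharing one with the work.
template <typename F> uint64_t timed_region(F &&work) {
  gpu::sync_threads();
  uint64_t start = gpu::processor_clock();
  gpu::sync_threads();

  work();

  gpu::sync_threads();
  uint64_t stop = gpu::processor_clock();
  gpu::sync_threads();

  assert(stop >= start); // holds for the stub clock; real clocks may wrap
  return stop - start;
}
```

The cost is two extra barriers per measurement, which adds a fixed overhead that would need to be measured and subtracted just like the existing one.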
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D158320/new/
https://reviews.llvm.org/D158320