[libc-commits] [PATCH] D158320: [libc] Initial support for microbenchmarking GPU code

Artem Belevich via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed Sep 6 10:37:54 PDT 2023


tra added a comment.

LGTM for NVPTX side.



================
Comment at: libc/utils/gpu/timing/nvptx/timing.h:55-56
+  // Get the current timestamp from the clock.
+  gpu::sync_threads();
+  uint64_t start = gpu::processor_clock();
+
----------------
This arrangement still seems to be a bit fragile.
sync_threads will confine the clock reads to happen between the barriers, but within that range they may still be moved around by LLVM or ptxas.
`asm volatile` will probably restrict that at the IR level, but it would not do anything at the ptxas level. We can hope that ptxas would not move sreg reads around much, but I don't think it's guaranteed.

This example happens to work, but I would not be surprised if we ran into issues when trying to bench more complicated code.

I'd wrap each clock read between a pair of sync_threads() calls to make sure that ptxas can't move those reads.
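
A minimal sketch of that arrangement, to make the suggestion concrete. It assumes the `gpu::sync_threads()` and `gpu::processor_clock()` names from the quoted hunk. The bodies given for them here are host-side stand-ins so the snippet compiles without a GPU toolchain, and `fenced_clock` and `measure` are hypothetical helper names that are not part of the patch.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Host-side stand-ins (assumptions) for the GPU utilities used in the patch.
// On NVPTX these would be a block-wide barrier and a clock register read.
namespace gpu {
inline void sync_threads() {
  // Stand-in only: a full fence in place of the real block-wide barrier.
  std::atomic_thread_fence(std::memory_order_seq_cst);
}
inline uint64_t processor_clock() {
  // Stand-in only: a monotonic host clock in place of the GPU clock register.
  return static_cast<uint64_t>(
      std::chrono::steady_clock::now().time_since_epoch().count());
}
} // namespace gpu

// Hypothetical helper: one clock read with a barrier on each side, so the
// read has a barrier both before and after it.
inline uint64_t fenced_clock() {
  gpu::sync_threads();
  uint64_t t = gpu::processor_clock();
  gpu::sync_threads();
  return t;
}

// Hypothetical helper: time a callable using fenced clock reads on both ends.
template <typename F> uint64_t measure(F f) {
  uint64_t start = fenced_clock();
  f();
  uint64_t stop = fenced_clock();
  // Sanity check for the host stand-in, which uses a monotonic clock.
  assert(stop >= start && "clock must not run backwards");
  return stop - start;
}
```

With this shape the code under test sits between the trailing barrier of the first read and the leading barrier of the second. Whether that is enough to stop ptxas from moving the reads would still need to be checked against the generated SASS.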



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158320/new/

https://reviews.llvm.org/D158320


