[libc-commits] [libc] [libc] Add Timing Utils for AMDGPU (PR #96828)
Joseph Huber via libc-commits
libc-commits at lists.llvm.org
Wed Jun 26 20:38:41 PDT 2024
================
@@ -0,0 +1,73 @@
+//===------------- AMDGPU implementation of timing utils --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU
+#define LLVM_LIBC_UTILS_GPU_TIMING_AMDGPU
+
+#include "src/__support/GPU/utils.h"
+#include "src/__support/common.h"
+#include "src/__support/macros/attributes.h"
+#include "src/__support/macros/config.h"
+
+#include <stdint.h>
+
+namespace LIBC_NAMESPACE {
+
+// Returns the overhead associated with calling the profiling region. This
+// allows us to subtract the constant-time overhead from the latency to
+// obtain a true result. This can vary with system load.
+[[gnu::noinline]] static LIBC_INLINE uint64_t overhead() {
+ gpu::memory_fence();
+ uint64_t start = gpu::processor_clock();
+ uint32_t result = 0;
+ asm("v_or_b32 %[v_reg], 0, %[v_reg]\n" ::[v_reg] "v"(result) :);
+ asm("" ::"s"(start));
+ uint64_t stop = gpu::processor_clock();
+ return stop - start;
+}
+
+// Profile a simple function and obtain its latency in clock cycles on the
+// system. This function cannot be inlined or else it will disturb the very
+// delicate balance of hard-coded dependencies.
+template <typename F, typename T>
+[[gnu::noinline]] static LIBC_INLINE uint64_t latency(F f, T t) {
+ // We need to store the input somewhere to guarantee that the compiler will
+ // not constant propagate it and remove the profiling region.
+ volatile uint32_t storage = t;
+ float arg = storage;
----------------
jhuber6 wrote:
For reference, AMDGCN has separate types of registers, while PTX just lists them all as the same. The main ones are `sgpr` and `vgpr`. There's also `agpr`, but those are special registers mostly used on the high-end cards, so I wouldn't worry about them. SGPRs are scalar: they hold values that are uniform throughout the wavefront. VGPRs hold a separate value for each lane in the wavefront. If you're familiar with SIMD on your x64 CPU, it's the same concept.
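
To make the scalar/vector split concrete, here is a rough host-side C++ model of it. This is only a sketch and not how the hardware or the PR implements anything: `kLanes`, `ScalarReg`, `VectorReg`, `lane_ids`, `vector_or`, and `read_lane` are all hypothetical names made up for illustration, and the wavefront width of 64 is an assumption (real targets use 32 or 64 lanes).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed wavefront width for this sketch; real hardware uses 32 or 64.
constexpr std::size_t kLanes = 64;

// Models an SGPR: a single 32-bit value that is uniform across the wavefront.
struct ScalarReg {
  uint32_t value;
};

// Models a VGPR: one 32-bit value per lane in the wavefront.
struct VectorReg {
  std::array<uint32_t, kLanes> lanes;
};

// Fills a vector register with each lane's own index (0, 1, 2, ...).
VectorReg lane_ids() {
  VectorReg v{};
  for (std::size_t i = 0; i < kLanes; ++i)
    v.lanes[i] = static_cast<uint32_t>(i);
  return v;
}

// Lane-wise OR of a uniform scalar operand with a per-lane vector operand.
// Loosely analogous to a vector OR that takes one scalar source.
VectorReg vector_or(ScalarReg s, const VectorReg &v) {
  VectorReg out{};
  for (std::size_t i = 0; i < kLanes; ++i)
    out.lanes[i] = s.value | v.lanes[i];
  return out;
}

// Reads the value held by a single lane of a vector register.
uint32_t read_lane(const VectorReg &v, std::size_t lane) {
  assert(lane < kLanes);
  return v.lanes[lane];
}
```

In this model, `vector_or(ScalarReg{0x100}, lane_ids())` gives every lane the same scalar bits OR'd with its own lane index, which is the sense in which the `"s"` and `"v"` constraints in the inline assembly above pick different register classes.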
https://github.com/llvm/llvm-project/pull/96828