[libc-commits] [libc] [libc] Polish GPU benchmarking (PR #153900)

Fri Aug 15 16:43:24 PDT 2025

================
@@ -66,7 +64,7 @@ template <typename F, typename T>
   uint64_t stop = gpu::processor_clock();
   cpp::atomic_thread_fence(cpp::MemoryOrder::ACQ_REL);
   asm("" ::"r"(stop));
-  volatile T output = result;
+  volatile auto output = result;
----------------
leandrolcampos wrote:

The reason I had to touch the NVPTX version (and not the AMDGPU one) is simply that only the NVPTX `latency()` ended with this statement:

```h
volatile T output = result;
```

In the ctype benches, we instantiate `latency<int (*)(int), char>`. The function returns `int`, but the template parameter `T` (the input type) is `char`. That line therefore initializes a `volatile char` from an `int`, which triggers `-Wimplicit-int-conversion`:

```bash
[4/15] Building CXX object libc/benchmarks/gpu/src/ctype/CMakeFiles/libc.benchmarks.gpu.src.ctype.isalnum_benchmark.__build__.dir/isalnum_benchmark.cpp.o
In file included from /home/leandro/llvm-project/libc/benchmarks/gpu/src/ctype/isalnum_benchmark.cpp:1:
In file included from /home/leandro/llvm-project/libc/benchmarks/gpu/LibcGpuBenchmark.h:4:
In file included from /home/leandro/llvm-project/libc/benchmarks/gpu/timing/timing.h:17:
/home/leandro/llvm-project/libc/benchmarks/gpu/timing/nvptx/timing.h:67:23: warning: implicit conversion loses integer precision: 'int' to 'volatile char' [-Wimplicit-int-conversion]
   67 |   volatile T output = result;
      |              ~~~~~~   ^~~~~~
/home/leandro/llvm-project/libc/benchmarks/gpu/src/ctype/isalnum_benchmark.cpp:7:26: note: in instantiation of function template specialization '__llvm_libc_22_0_0_git::latency<int (*)(int), char>' requested here
    7 |   return LIBC_NAMESPACE::latency(LIBC_NAMESPACE::isalnum, x);
      |                          ^
1 warning generated.
[6/15] Building CXX object libc/benchmarks/gpu/src/ctype/CMakeFiles/libc.benchmarks.gpu.src.ctype.isalpha_benchmark.__build__.dir/isalpha_benchmark.cpp.o
In file included from /home/leandro/llvm-project/libc/benchmarks/gpu/src/ctype/isalpha_benchmark.cpp:1:
In file included from /home/leandro/llvm-project/libc/benchmarks/gpu/LibcGpuBenchmark.h:4:
In file included from /home/leandro/llvm-project/libc/benchmarks/gpu/timing/timing.h:17:
/home/leandro/llvm-project/libc/benchmarks/gpu/timing/nvptx/timing.h:67:23: warning: implicit conversion loses integer precision: 'int' to 'volatile char' [-Wimplicit-int-conversion]
   67 |   volatile T output = result;
      |              ~~~~~~   ^~~~~~
/home/leandro/llvm-project/libc/benchmarks/gpu/src/ctype/isalpha_benchmark.cpp:7:26: note: in instantiation of function template specialization '__llvm_libc_22_0_0_git::latency<int (*)(int), char>' requested here
    7 |   return LIBC_NAMESPACE::latency(LIBC_NAMESPACE::isalpha, x);
      |                          ^
1 warning generated.
```
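To make the mismatch concrete, here is a minimal standalone sketch, not the actual `timing.h` code. The names `returns_wide`, `sink_with_T`, and `sink_with_auto` are hypothetical stand-ins, and the explicit `static_cast` only replaces the implicit conversion so that the sketch compiles without the warning:

```cpp
#include <type_traits>

// Hypothetical stand-in for a ctype-style function: takes int, returns int.
inline int returns_wide(int) { return 0x1234; }

// Mirrors the old pattern: the sink is declared with T, the *input* type.
// With T = char, an int result is narrowed to char before being stored.
template <typename F, typename T>
int sink_with_T(F f, T t) {
  auto result = f(t);
  // The original code converted implicitly here; the cast is explicit in this
  // sketch only so that it builds cleanly under -Wimplicit-int-conversion.
  volatile T output = static_cast<T>(result);
  return output;
}

// Mirrors the fix: the sink type is deduced from the result, not from T.
template <typename F, typename T>
int sink_with_auto(F f, T t) {
  auto result = f(t);
  volatile auto output = result;
  // Apart from the volatile qualifier, the sink has the result's own type.
  static_assert(
      std::is_same_v<std::remove_volatile_t<decltype(output)>, decltype(result)>,
      "sink type should follow the callable's return type");
  return output;
}
```

Calling `sink_with_T(returns_wide, 'a')` keeps only the low byte of the result, while `sink_with_auto(returns_wide, 'a')` preserves the full `int`, because the sink no longer depends on `T`.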

On AMDGPU, `latency()` has no such statement (it seems to rely on the `v_or_b32` asm to consume the value), and it also special-cases small types (char/bool) for the asm operand, so there is no narrowing conversion there.

I didn't dig deeper here because `latency()` isn't used by the math benchmarks; I only wanted to make the ctype benches warning-free.

https://github.com/llvm/llvm-project/pull/153900