[libc-commits] [libc] [libc] [gpu] Add Generic and NvSin Throughput Benchmark (PR #101917)
Joseph Huber via libc-commits
libc-commits at lists.llvm.org
Sun Aug 4 19:05:16 PDT 2024
https://github.com/jhuber6 commented:
I'm wondering if we shouldn't have separate functions for throughput and latency. We likely want to keep the old assembly constraints for the latency checks, but can use something different if we put it in an array.
Also, @lntue, is it even necessary to use a runtime loop? If we want strict throughput, couldn't we just do something like this:
```
#pragma unroll
for (int i = 0; i < DEPTH; ++i) {
  auto x = fn(input);
  // Probably need to trick the compiler into thinking this changed;
  // "+r" marks input as both read and written by the empty asm.
  asm("" : "+r"(input));
}
```
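For reference, here is a self-contained host-side sketch of the same idea. It is only an illustration under stated assumptions: `fn` is a hypothetical stand-in for the function being benchmarked, `issue_independent_calls` is a made-up helper name, and neither is taken from the PR. It assumes a compiler with GNU-style inline asm (`#pragma unroll` is a Clang pragma; other compilers may ignore it), and it leaves out the actual timing, which a real benchmark would wrap around the loop.

```cpp
#include <cstdint>

// Hypothetical stand-in for the function under test (not from the PR).
static inline uint32_t fn(uint32_t x) { return x * 2654435761u + 1u; }

// Hypothetical helper: issues DEPTH calls to fn that do not depend on each
// other's results, so they are free to overlap in the pipeline.
template <int DEPTH>
uint32_t issue_independent_calls(uint32_t input) {
  uint32_t last = 0;
#pragma unroll
  for (int i = 0; i < DEPTH; ++i) {
    // The empty asm claims to read and write input ("+r"), so the compiler
    // must assume the value changed and cannot merge the calls into one.
    asm volatile("" : "+r"(input));
    uint32_t x = fn(input);
    // Consume the result so the call is not removed as dead code.
    asm volatile("" : : "r"(x));
    last = x;
  }
  return last;
}
```

Because the empty asm never really modifies `input`, every iteration computes the same value, so `issue_independent_calls<8>(3u)` returns the same result as `fn(3u)`. The barriers only stop the optimizer from proving that.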
https://github.com/llvm/llvm-project/pull/101917