[libcxx-commits] [lldb] [libc] [flang] [openmp] [libcxx] [compiler-rt] [llvm] [clang-tools-extra] [clang] [lld] [PGO][OpenMP] Instrumentation for GPU devices (PR #76587)
Johannes Doerfert via libcxx-commits
libcxx-commits at lists.llvm.org
Fri Jan 5 10:45:06 PST 2024
jdoerfert wrote:
> > ongoing effort to extend PGO instrumentation to GPU device code
>
> Is there a high level description for this effort and its goal? Traditional compiler PGO is mostly for profiling control-flow, but we don't usually have a lot of control flow for GPU kernels.
I am unsure where this assumption comes from, but it is not true in general.
HPC most certainly has lots of control flow in kernels. We also have lots of calls, indirect calls, loops of every size, and basically everything else you have in CPU code.
Thus, at a high level, we want PGO for offloaded code for the same reasons we want PGO for non-offloaded code.
https://github.com/llvm/llvm-project/pull/76587