[PATCH] D85276: [PGO][CUDA][HIP] Skip generating profile on the device stub and wrong-side functions.

Artem Belevich via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Aug 6 10:53:46 PDT 2020


tra added inline comments.


================
Comment at: clang/lib/CodeGen/CodeGenPGO.cpp:839-840
 
+  // Skip host-only functions in the CUDA device compilation and device-only
+  // functions in the host compilation.
+  if (CGM.getLangOpts().CUDA &&
----------------
hliao wrote:
> tra wrote:
> > hliao wrote:
> > > tra wrote:
> > > > We will still have around some functions that may never be used on the host side (HD functions referenced from device code only).  I'm not sure if that's a problem for profiling, though. I wonder if we can somehow tie `skipRegionMappingForDecl` to whether we've actually codegen'ed the function. 
> > > Skipping wrong-side functions here just makes the report not confusing as these functions are not emitted at all and are supposed never running on the host/device side. If we still create the mapping for them, e.g., we may report they have 0 runs instead of reporting nothing (just like comments between function.) That looks a little bit confusing.
> > > It seems the current PGO adds everything for coverage mapping and late prune them based on checks here. Just try to follow that logic to skip wrong-side functions. If we need to revise the original logic and generate coverage mapping for emitted functions only, the change here is unnecessary.
> > I'd add a comment here that this 'filter' is just a rough best-effort approximation that still allows some effectively device-only Decls through.
> > The output should still be correct, even though the functions will never be used. Maybe add a TODO to deal with it if/when we know if the Decl was codegen'ed.
> > 
> Add that comment. But, I tend to not deal that "effectively" host-only/device-only ones as that should be developers' responsibility to handle them. The additional zero coverage mapping may be useful as well. If a function is really device-only but is attributed with HD, the 0 coverage may help developers correcting them.
It will be rather noisy in practice. A lot of code has either has been written for NVCC or has to compile with it. NVCC does not have target overloads, so sticking HD everywhere  is pretty much the only practical way to do it in complicated enough C++ code.  Anything that uses Eigen or Thrust will have tons of HD functions that are actually used only on one side. 



Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D85276/new/

https://reviews.llvm.org/D85276



More information about the cfe-commits mailing list