[PATCH] D146448: [CUDA] Update cached kernel handle when the function instance changes.

Mon Mar 20 14:12:23 PDT 2023

tra added inline comments.

================
Comment at: clang/lib/CodeGen/CGCUDANV.cpp:1201
+      return Loc->second;
+    // non-HIP compilation may end up with a different F and need to have
+    // handles and stubs updated.
----------------
dcastagna wrote:
> The different F is because sometimes at first we use an F with incomplete types as a key and later we use a different F (even if it's coming from the same function) with the instantiated types (usually after we encounter a statement where the size of the template parameter is necessary, like a function call).
> Another option could be to always instantiate the complete type for F when we call GetOrCreateLLVMFunction (IIRC) if the function has a __global__ attribute.
> 
> If you prefer this solution, the patch LGTM.
> 
> 
Instantiating a complete function when we normally would not sounds a bit too invasive and may buy us unintended side effects. E.g. I can see it potentially causing code explosion in some cases, where we previously could get by with just a function decl. 

Fixing the cache locally maintains the global status quo which seems to be about the right balance between improving the situation, the effort involved, and the potential risk of the fix.

The way we track kernels could potentially be improved (I think it would be better to track them explicitly vs relying on observing the functions flying by), but that should be a separate patch. Right now I just want to fix the regression w/o affecting anything else.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D146448/new/

https://reviews.llvm.org/D146448