[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage

Thu Apr 21 15:41:01 PDT 2022

tra added inline comments.

================
Comment at: clang/lib/AST/ASTContext.cpp:12300
+         (D->hasAttr<CUDAGlobalAttr>() &&
+          GetGVALinkageForFunction(cast<FunctionDecl>(D),
+                                   /*IgnoreCUDAGlobalAttr=*/true) ==
----------------
yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > Perhaps we don't need to change the public AST API and plumb `IgnoreCUDAGlobalAttr` through.
> > > > We cold create CUDA-aware static version of `GetGVALinkageForCudaKernel` instead, which would call `adjustGVALinkageForExternalDefinitionKind(..., adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true))`.
> > > We could have a static function but it would be GetGVALinkageForCUDAKernelWithoutGlobalAttr since we need to know the linkage of the kernel assuming it has no `__global__` attribute.
> > > 
> > > If you think it is OK I can make the change.
> > No point making public what's of no use to anybody other than this particular instance.
> > 
> > To think of it, we don't even need a function and could just do 
> > ```
> >   if (D->hasAttr<CUDAGlobalAttr>() ) {
> >     bool OriginalKernelLinkage = adjustGVALinkageForExternalDefinitionKind(..., adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true));
> >     return OriginalKernelLinkage == GVA_Internal;
> >   }
> >   return (IsStaticVar &&....)
> > ```
> > 
> > 
> One disadvantage of this approach is that it duplicates the code in GetGVALinkageForFunction. In the future, if GetGVALinkageForFunction changes, the same change needs to be applied to the duplicated code, which is error-prone.
Good point.  Looking at the code closer, it appears that what we're interested in is whether the kernel was internal and *became* externally visible due to it being a kernel. 

Right now we're checking if the function would normally be `GVA_Internal` (shouldn't we have considered GVA_DiscardableODR, too, BTW?)
This is a somewhat indirect way of figuring out what we really need.

The code that determines what we want is essentially this code in adjustGVALinkageForAttributes that we're trying to enable/disable with `ConsiderCudaGlobalAttr`. 

It can be easily extracted into a static function, which could then be used from both `adjustGVALinkageForAttributes`, (which would no longer need `ConsiderCudaGlobalAttr`) and from here. 

```
bool isInternalKernel(ASTContext *Context, Decl *D) {
  L=basicGVALinkageForFunction(Context, D);
  return (D->hasAttr<CUDAGlobalAttr>() &&
          (L == GVA_DiscardableODR || L == GVA_Internal));
}
```

This would both avoid logic duplication and would better match our intent.

Does it make sense? Or did I miss something else?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124189/new/

https://reviews.llvm.org/D124189