[PATCH] D124189: [CUDA][HIP] Externalize kernels with internal linkage
Artem Belevich via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Apr 22 10:56:55 PDT 2022
tra accepted this revision.
tra added a comment.
This revision is now accepted and ready to land.
LGTM overall, with a few test nits.
================
Comment at: clang/lib/AST/ASTContext.cpp:12300
+ (D->hasAttr<CUDAGlobalAttr>() &&
+ GetGVALinkageForFunction(cast<FunctionDecl>(D),
+ /*IgnoreCUDAGlobalAttr=*/true) ==
----------------
yaxunl wrote:
> tra wrote:
> > yaxunl wrote:
> > > tra wrote:
> > > > yaxunl wrote:
> > > > > tra wrote:
> > > > > > Perhaps we don't need to change the public AST API and plumb `IgnoreCUDAGlobalAttr` through.
> > > > > > We could create a CUDA-aware static version, `GetGVALinkageForCudaKernel`, instead, which would call `adjustGVALinkageForExternalDefinitionKind(..., adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true))`.
> > > > > We could have a static function but it would be GetGVALinkageForCUDAKernelWithoutGlobalAttr since we need to know the linkage of the kernel assuming it has no `__global__` attribute.
> > > > >
> > > > > If you think it is OK I can make the change.
> > > > No point making public what's of no use to anybody other than this particular instance.
> > > >
> > > > Come to think of it, we don't even need a function and could just do
> > > > ```
> > > > if (D->hasAttr<CUDAGlobalAttr>()) {
> > > >   GVALinkage OriginalKernelLinkage = adjustGVALinkageForExternalDefinitionKind(..., adjustGVALinkageForAttributes(IgnoreCUDAGlobalAttr=true));
> > > >   return OriginalKernelLinkage == GVA_Internal;
> > > > }
> > > > return (IsStaticVar && ....)
> > > > ```
> > > >
> > > >
> > > One disadvantage of this approach is that it duplicates the code in GetGVALinkageForFunction. In the future, if GetGVALinkageForFunction changes, the same change needs to be applied to the duplicated code, which is error-prone.
> > Good point. Looking at the code closer, it appears that what we're interested in is whether the kernel was internal and *became* externally visible due to it being a kernel.
> >
> > Right now we're checking if the function would normally be `GVA_Internal` (shouldn't we have considered GVA_DiscardableODR, too, BTW?)
> > This is a somewhat indirect way of figuring out what we really need.
> >
> > The code that determines what we want is essentially this code in adjustGVALinkageForAttributes that we're trying to enable/disable with `ConsiderCudaGlobalAttr`.
> >
> > It can be easily extracted into a static function, which could then be used both from `adjustGVALinkageForAttributes` (which would no longer need `ConsiderCudaGlobalAttr`) and from here.
> >
> > ```
> > bool isInternalKernel(ASTContext *Context, Decl *D) {
> >   GVALinkage L = basicGVALinkageForFunction(Context, D);
> >   return (D->hasAttr<CUDAGlobalAttr>() &&
> >           (L == GVA_DiscardableODR || L == GVA_Internal));
> > }
> > ```
> >
> > This would both avoid logic duplication and would better match our intent.
> >
> > Does it make sense? Or did I miss something else?
> GVA_DiscardableODR usually maps to linkonce_odr linkage in LLVM IR. Kernels with that linkage follow the ODR, so we should not make them unique.
>
> If we use isInternalKernel in adjustGVALinkageForAttributes, there will be two calls to basicGVALinkageForFunction whenever GetGVALinkageForFunction is called, which seems inefficient. I think we can keep GetGVALinkageForFunction as it was and use basicGVALinkageForFunction directly in mayExternalize.
SGTM.
================
Comment at: clang/test/CodeGenCUDA/device-var-linkage.cu:1-2
// RUN: %clang_cc1 -no-opaque-pointers -triple nvptx -fcuda-is-device \
// RUN: -emit-llvm -o - -x hip %s \
// RUN: | FileCheck -check-prefixes=DEV,NORDC %s
----------------
This is odd -- the tests use `-x hip` and `-triple nvptx`.
I think we need to change them into HIP+amdgpu and CUDA+nvptx variants, as we now have language-dependent behavior here and are interested in the language/triple combinations that we actually use in practice.
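For illustration, a split into the two realistic variants might look roughly like the following. This is a sketch only: the exact cc1 flags and the check prefixes (`DEV,NORDC` for HIP, `DEV,NORDC-CUDA` for CUDA) are assumptions, not the actual test content.

```
// RUN: %clang_cc1 -no-opaque-pointers -triple amdgcn-amd-amdhsa -fcuda-is-device \
// RUN:   -emit-llvm -o - -x hip %s \
// RUN:   | FileCheck -check-prefixes=DEV,NORDC %s
// RUN: %clang_cc1 -no-opaque-pointers -triple nvptx -fcuda-is-device \
// RUN:   -emit-llvm -o - %s \
// RUN:   | FileCheck -check-prefixes=DEV,NORDC-CUDA %s
```

Separate prefixes let the shared `DEV` checks stay common while the language-dependent externalization behavior is checked per variant.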
================
Comment at: clang/test/CodeGenCUDA/kernel-in-anon-ns.cu:3
// RUN: -aux-triple x86_64-unknown-linux-gnu -std=c++11 -fgpu-rdc \
// RUN: -emit-llvm -o - -x hip %s > %t.dev
----------------
We should have CUDA test variants here, too.
================
Comment at: clang/test/CodeGenCUDA/managed-var.cu:1
// REQUIRES: x86-registered-target, amdgpu-registered-target
----------------
The tests above do not have a REQUIRES line. Is it needed here?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D124189/new/
https://reviews.llvm.org/D124189