[clang] [Clang][OpenCL][AMDGPU] Allow a kernel to call another kernel (PR #115821)
Aniket Lal via cfe-commits
cfe-commits at lists.llvm.org
Wed Mar 12 02:01:56 PDT 2025
================
@@ -1582,6 +1582,26 @@ void CodeGenFunction::GenerateCode(GlobalDecl GD, llvm::Function *Fn,
// Implicit copy-assignment gets the same special treatment as implicit
// copy-constructors.
emitImplicitAssignmentOperatorBody(Args);
+ } else if (FD->hasAttr<OpenCLKernelAttr>() &&
+ GD.getKernelReferenceKind() == KernelReferenceKind::Kernel) {
+ CallArgList CallArgs;
+ for (unsigned i = 0; i < Args.size(); ++i) {
+ Address ArgAddr = GetAddrOfLocalVar(Args[i]);
+ QualType ArgQualType = Args[i]->getType();
+ RValue ArgRValue = convertTempToRValue(ArgAddr, ArgQualType, Loc);
----------------
lalaniket8 wrote:
> My only concern is struct type argument passed by value, currently, they seem to be handled specially
>
> https://godbolt.org/z/bo9aveqn3
>
> I am not sure your current approach will work for that, although it may. I think you may want to add a lit test for that.
I checked a few cases where struct arguments are passed by value, and the call to the device stub is emitted as expected (similar to the godbolt example).
The lit test [addr-space-struct-arg.cl](https://github.com/llvm/llvm-project/blob/d1e9e94a00b39de55899e6d768d35a00a5612929/clang/test/CodeGenOpenCL/addr-space-struct-arg.cl#L1266) covers struct arguments passed by value, although the callsite is not visible in that test because it gets inlined away.
With inlining disabled, the callsite looks something like this:
```
define dso_local amdgpu_kernel void @KernelTwoMember(%struct.StructTwoMember %u.coerce){
entry:
%u = alloca %struct.StructTwoMember, align 8, addrspace(5)
%u1 = addrspacecast ptr addrspace(5) %u to ptr
%0 = getelementptr inbounds nuw %struct.StructTwoMember, ptr %u1, i32 0, i32 0
%1 = extractvalue %struct.StructTwoMember %u.coerce, 0
store <2 x i32> %1, ptr %0, align 8
%2 = getelementptr inbounds nuw %struct.StructTwoMember, ptr %u1, i32 0, i32 1
%3 = extractvalue %struct.StructTwoMember %u.coerce, 1
store <2 x i32> %3, ptr %2, align 8
%4 = getelementptr inbounds nuw %struct.StructTwoMember, ptr %u1, i32 0, i32 0
%5 = load <2 x i32>, ptr %4, align 8
%6 = getelementptr inbounds nuw %struct.StructTwoMember, ptr %u1, i32 0, i32 1
%7 = load <2 x i32>, ptr %6, align 8
call void @__clang_ocl_kern_imp_KernelTwoMember(<2 x i32> %5, <2 x i32> %7) #4
ret void
}
```
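For readers less used to the IR, the same shape can be sketched as a plain C++ analogy. This is only an illustration, not Clang or OpenCL code: `Int2`, `kernelImpl`, and `kernelEntry` are hypothetical names, the two-`Int2` struct layout is assumed from the IR above, and the functions return `int` (the real kernel returns `void`) purely so the forwarding can be observed.

```cpp
#include <array>

// Hypothetical stand-in for the <2 x i32> vector type in the IR.
using Int2 = std::array<int, 2>;

// Assumed layout, inferred from the two <2 x i32> members in the IR.
struct StructTwoMember {
  Int2 x;
  Int2 y;
};

// Plays the role of @__clang_ocl_kern_imp_KernelTwoMember: the kernel body
// emitted as an ordinary function that takes the struct members separately.
// The body here is arbitrary and exists only for illustration.
int kernelImpl(Int2 x, Int2 y) {
  return x[0] + x[1] + y[0] + y[1];
}

// Plays the role of the amdgpu_kernel entry point: it receives the struct by
// value, copies it into a local (the alloca plus stores), reloads each member
// (the two loads), and forwards them to the implementation function.
int kernelEntry(StructTwoMember coerced) {
  StructTwoMember u = coerced;  // spill the incoming value to a local
  Int2 x = u.x;                 // reload member 0
  Int2 y = u.y;                 // reload member 1
  return kernelImpl(x, y);      // the callsite that inlining normally hides
}
```

In this sketch, calling `kernelEntry` with `x = {1, 2}` and `y = {3, 4}` returns 10, which is the value `kernelImpl` computes from the forwarded members.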
> Did your PR pass internal CI?
Yes, we pass all tests in internal CI. [Link](https://github.com/AMD-Lightning-Internal/llvm-project/pull/580)
https://github.com/llvm/llvm-project/pull/115821