[PATCH] D125970: [amdgpu] Add amdgpu_kernel calling conv attribute to clang

Mon May 23 10:40:34 PDT 2022

aaron.ballman added a comment.

In D125970#3531673 <https://reviews.llvm.org/D125970#3531673>, @JonChesterfield wrote:

> In D125970#3531645 <https://reviews.llvm.org/D125970#3531645>, @aaron.ballman wrote:
>
>> In D125970#3527685 <https://reviews.llvm.org/D125970#3527685>, @JonChesterfield wrote:
>>
>>> If it was adding a calling convention, sure - caution warranted. There's no llvm change here though, an existing CC is exposed to C++. No change to the type system either.
>>
>> This is adding a user-facing calling convention to Clang and it changes the type system as a result. For example, lambda function pointer conversion operators sometimes are generated for each calling convention so that you can form a function pointer of the correct type (this might not be impacted by your change here); there's a specific number of bits for representing the enumeration of calling conventions and this uses one of those bits, etc.
>
> It slightly changes the type system of C++ code in that the calling convention was previously only available in opencl / openmp etc. I was under the impression that the compiler data representation cost of calling conventions was in LLVM and thus pre-paid for the calling convention this gives access to. There's the `enum CallingConv ` which has gained a field, I didn't realise that was input into something of limited bitwidth.

Calling conventions are weird in that they have a fair amount of frontend AND backend work involved with them (though maybe this one is more backend than frontend as it doesn't seem to be doing much different in codegen). As for the bit-width thing, it mostly comes into play here: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/AST/Type.h#L3678 -- we try to pack as much information into as few bits as possible because `Type` overhead causes big problems (for example, it limits template instantiation depth due to memory overhead). We have *some* wiggle room left in that bit-field, and we have some ideas on how to improve the situation, but... nobody's done the refactoring work yet and each new calling convention we add brings us that much closer to the answer being "sorry, can't do it.".

>>> I'll propose a patch with some documentation for it if you wish, but it'll just say "For ad hoc debugging of the amdgpu backend". Undocumented seems to state that more clearly.
>>
>> I continue to question whether we want to support such a calling convention. This does not seem to be generally useful enough to warrant inclusion in Clang. The fact that you'd like to leave it undocumented as that's more clear for users is a pretty good indication that this calling convention doesn't meet the bar for an extension.
>
> Strictly speaking this lets people write a GPU kernel that can execute on AMDGPU in freestanding C++.

That sounds generally useful, which is great!

> I happen to want to do that for testing LLVM in the immediate instance but there's arguably wider applicability. However, it looks like how arguments are represented in this calling convention has some strangeness (see discussion with Sam above), particularly with regard to address spaces.

That sounds less great. :-( This suggests there may be another calling convention in the future which corrects those deficiencies.

> I can revert this patch if necessary, but it'll force me to continue trying to test our compiler through the lens of opencl, and rules out programming the hardware without the various specific language front ends. I think that would be a sad loss.

My thinking is: I don't want you to revert and have no solution, because you have a problem to solve and this (presumably) solves it for you. But at the same time, I'd like us to be able to explore options to make sure that a calling convention is the best approach and that we're confident that we won't need additional calling conventions to fix corner cases in the future. For example, can the calling convention be inferred at codegen time in Clang or by an LLVM pass so that the FE doesn't need to expose an attribute through the type system?

Have you explored alternatives that don't require a user-facing attribute?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125970/new/

https://reviews.llvm.org/D125970