[PATCH] D118415: AMDGPU: Reserve v32 if we may need to copy between AGPRs on gfx908

Thu Jan 27 18:00:29 PST 2022

arsenm added a comment.

In D118415#3277958 <https://reviews.llvm.org/D118415#3277958>, @rampitec wrote:

> In D118415#3277923 <https://reviews.llvm.org/D118415#3277923>, @arsenm wrote:
>
>> In D118415#3277872 <https://reviews.llvm.org/D118415#3277872>, @rampitec wrote:
>>
>>> I think it has to be dynamic depending on the requested occupancy. But even then it can drop the occupancy of a kernel if it uses fewer than 32 registers, which is not uncommon. I do not believe we can reserve it that high.
>>
>> Reserving in the function argument range is a problem. We also treat the requested occupancy as a hint, not something we're forced to follow.
>
> Documentation (https://clang.llvm.org/docs/AttributeReference.html#amdgpu-waves-per-eu) is a bit contradictory:
>
>> An error will be given if:
>>
>> - Specified values violate subtarget specifications;
>> - Specified values are not compatible with values provided through other attributes;
>> - The AMDGPU target backend is unable to create machine code that can meet the request.

This was never implemented as an error, and as I remember it the intent was to keep this fuzzy so that your code would not break if subtarget behavior changed.

>> This attribute may be attached to a kernel function definition and is an optimization hint.
>
> Anyway, inability to run kernels at maximum occupancy is a show stopper in itself.

Running at maximum occupancy is practically impossible if you are using MFMA instructions anyway.
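
To put rough numbers on the occupancy concern quoted above, here is a minimal sketch. It is not taken from the patch or from the backend. It assumes gfx908 has 256 VGPRs per SIMD lane, allocates them in granules of 4, and caps out at 10 waves per EU, and it assumes a reserved v32 counts toward the kernel's VGPR footprint (so the kernel is charged for at least 33 registers). The helper name and constants are hypothetical.

```python
# Assumed gfx908 parameters (not stated in this thread).
TOTAL_VGPRS = 256      # VGPRs available per SIMD lane
ALLOC_GRANULE = 4      # VGPRs are allocated in blocks of 4
MAX_WAVES_PER_EU = 10  # upper bound on waves per execution unit


def occupancy_for_vgprs(num_vgprs: int) -> int:
    """Waves per EU permitted by VGPR usage alone (hypothetical helper)."""
    if num_vgprs <= 0:
        return MAX_WAVES_PER_EU
    # Round the request up to the allocation granule.
    allocated = -(-num_vgprs // ALLOC_GRANULE) * ALLOC_GRANULE
    # At least one wave can always run; never exceed the hardware cap.
    return max(1, min(MAX_WAVES_PER_EU, TOTAL_VGPRS // allocated))


# 24 VGPRs: small kernel; 32 VGPRs: v0..v31; 33 VGPRs: v0..v32 in use.
for n in (24, 32, 33):
    print(n, occupancy_for_vgprs(n))
```

Under those assumptions a kernel that would otherwise fit in 24 VGPRs goes from 10 waves to 7 once v32 is part of its footprint, which is the drop being objected to. It only models VGPR pressure; AGPR, SGPR and LDS limits are ignored.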

https://reviews.llvm.org/D118415