[llvm-dev] Dynamically determine the CostPerUse value in the register allocator.

Chen, Shiva via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 1 20:11:52 PDT 2020


Hi Devadasan,

I think making getCostPerUse a virtual function that takes a
MachineFunction parameter could be the first step toward setting up the cost.
With the MachineFunction, it should be able to get the calling convention and
subtarget information.
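
As a rough illustration of that idea, here is a minimal, self-contained sketch.
It does not use the real LLVM headers: CallConv, FunctionContext,
RegisterInfoBase and ExampleTargetRegisterInfo are hypothetical stand-ins, and
the cost values are made up for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real LLVM types; illustrative names only.
enum class CallConv { Kernel, DeviceFunction };

struct FunctionContext {
  CallConv CC; // what a MachineFunction would let the hook look up
};

// Base class: the default ignores the function, like a static cost would.
class RegisterInfoBase {
public:
  virtual ~RegisterInfoBase() = default;
  virtual uint8_t getCostPerUse(unsigned Reg, const FunctionContext &FC) const {
    (void)Reg;
    (void)FC;
    return 0;
  }
};

// A target overrides the hook and picks the cost per function.
class ExampleTargetRegisterInfo : public RegisterInfoBase {
public:
  uint8_t getCostPerUse(unsigned Reg, const FunctionContext &FC) const override {
    assert(Reg < 256 && "register index out of range");
    if (FC.CC == CallConv::Kernel)
      return 0;               // assumption: no cost for kernels
    return Reg >= 32 ? 1 : 0; // assumption: charge registers past the first 32
  }
};
```

The allocator would only ever call the base-class interface, so targets that
do not override the hook keep their current behaviour.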

Madhur Amilkanthwar via llvm-dev <llvm-dev at lists.llvm.org> wrote on
Sat, May 30, 2020 at 8:53 PM:

> I don't know the history behind the name CostPerUse, so I may be missing the
> background associated with it. It seems to be a misnomer for what it is
> intended to do. At first sight, the name suggests that the cost is a function
> of the uses of the register: the more uses, the higher the cost. How do we
> want to define the value of CostPerUse? Should it be a function of the uses,
> or just of the target?
>
>
> On Sat, May 30, 2020, 4:53 PM Devadasan, Christudasan via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>
>>
>> Please ignore the header “AMD Official Use Only”. I forgot to remove it
>> while posting the email to llvm-dev.
>>
>>
>>
>> Regards,
>>
>> Christudasan
>>
>>
>>
>> *From:* Devadasan, Christudasan
>> *Sent:* Friday, May 29, 2020 7:46 PM
>> *To:* llvm-dev at lists.llvm.org
>> *Subject:* [llvm-dev] Dynamically determine the CostPerUse value in the
>> register allocator.
>>
>>
>>
>>
>> Hi All,
>>
>>
>>
>> For the AMDGPU architecture, during RA, we prefer to have a cost
>> associated with the registers (CostPerUse) based on a target entity (for
>> instance, the Calling Convention of the current MachineFunction).
>>
>> Presently CostPerUse is a one-time static value (either zero or a
>> positive value) generated through table-gen.
>>
>> The current implementation doesn’t allow us to control the reg-cost on
>> the fly.
>>
>>
>>
>> The AMDGPU ABI has recently been revised by introducing more caller-saved
>> VGPRs (the exact details are explained towards the end of this e-mail), and
>> we found that having a dynamic register cost is important to achieve an
>> optimal allocation.
>>
>> Precisely, it is important to limit the number of VGPRs allocated for a
>> kernel/device-function to the smallest possible value, since it has a direct
>> impact on the occupancy. Occupancy means the number of wavefronts that
>> can be launched at runtime for a kernel program.
>>
>>
>>
>> Some initial thoughts on how to fix it:
>>
>>    1. Have a target interface (a switch) to enable/discard the
>>    CostPerUse value.
>>    2. Get the register cost in the same way we define various calling
>>    conventions (*CallingConv.td).
>>    3. Compute the CostPerUse in the way the AllocationOrder for the
>>    registers is determined during RA.
>>
>>
>>
>> The first one is the easiest method, and it solves the immediate problem
>> we are currently addressing.
>>
>> However, the other two options are better if we want to associate
>> different reg-cost values with different calling conventions (I presume the
>> need will arise at some point).
>>
>> Other than these options, there can be a better way to fix it. Any
>> suggestion in this regard would be helpful.
>>
>>
>>
>> AMDGPU ABI changes and the motivation for this discussion:
>>
>>
>>
>> Before the new ABI change:
>>
>> Apart from the initial reserved 32 argument registers, all VGPRs are
>> callee-saved registers (VGPR32 - VGPR255).
>>
>> With the new ABI:
>>
>> We split VGPR32 - VGPR255 into an equal number of callee-saved and
>> caller-saved registers.
>>
>> For the same occupancy reason, these two sets are interleaved at a split
>> boundary of 8.
>>
>> VGPR32-VGPR39 (Caller-saved)
>>
>> VGPR40-VGPR47 (Callee-saved)
>>
>> VGPR48-VGPR55 (Caller-saved)
>>
>>               -
>>
>>               -
>>
>> VGPR248-VGPR255 (Callee-saved)
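
The interleaving listed above can be written as a small classifier. This is
only a sketch of the layout as described in this e-mail; SaveKind and
classifyVGPR are hypothetical names, not AMDGPU backend APIs, and treating
VGPR0-VGPR31 as a separate "argument" group is an assumption drawn from the
"reserved 32 argument registers" wording.

```cpp
#include <cassert>

enum class SaveKind { Argument, CallerSaved, CalleeSaved };

// Layout constants taken from the description in the e-mail.
const unsigned NumArgVGPRs = 32; // VGPR0-VGPR31
const unsigned BlockSize = 8;    // split boundary between the two sets
const unsigned NumVGPRs = 256;   // VGPR0-VGPR255

// Classify VGPR<Index> under the revised ABI.
inline SaveKind classifyVGPR(unsigned Index) {
  assert(Index < NumVGPRs && "VGPR index out of range");
  if (Index < NumArgVGPRs)
    return SaveKind::Argument;
  // Blocks of 8 alternate, starting with caller-saved at VGPR32.
  unsigned Block = (Index - NumArgVGPRs) / BlockSize;
  return Block % 2 == 0 ? SaveKind::CallerSaved : SaveKind::CalleeSaved;
}
```

For example, classifyVGPR(39) is caller-saved and classifyVGPR(40) is
callee-saved, matching the first two rows of the list.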
>>
>>
>>
>> With the new ABI, the allocator’s preference for callee-saved vs
>> caller-saved depends on the input program.
>>
>> RA may end up allocating more caller-saved registers than
>> callee-saved ones in certain cases. The opposite is possible too
>> (more callee-saved registers).
>>
>> In either case, unallocated registers are left behind, pushing
>> the final VGPR count up considerably. This has a bad impact on
>> the occupancy.
>>
>> To override the default allocation preferences of RA, we tried setting a
>> cost for all VGPRs such that higher indices have a higher cost.
>>
>> This eliminated the problem by allocating all lower registers before
>> picking a higher one, at the expense of some spills in certain cases,
>> which is acceptable.
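
To make the effect of an index-based cost concrete, here is a toy model. It
is not how the LLVM allocators consume CostPerUse; it only shows that always
taking the cheapest free register keeps the allocation dense at the low end.
All names (indexCost, pickCheapestFree, allocate, vgprCount) are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cost model: a register's cost equals its index, so
// higher-numbered VGPRs are always more expensive.
inline unsigned indexCost(std::size_t Index) {
  return static_cast<unsigned>(Index);
}

// Returns the index of the cheapest free register, or -1 if none is free.
inline int pickCheapestFree(const std::vector<bool> &InUse) {
  int Best = -1;
  for (std::size_t I = 0; I < InUse.size(); ++I) {
    if (InUse[I])
      continue;
    if (Best < 0 || indexCost(I) < indexCost(static_cast<std::size_t>(Best)))
      Best = static_cast<int>(I);
  }
  return Best;
}

// Marks the cheapest free register as used and returns it (-1 on failure).
inline int allocate(std::vector<bool> &InUse) {
  int Reg = pickCheapestFree(InUse);
  if (Reg >= 0)
    InUse[static_cast<std::size_t>(Reg)] = true;
  return Reg;
}

// Highest in-use index plus one: the register count that drives occupancy.
inline unsigned vgprCount(const std::vector<bool> &InUse) {
  unsigned Count = 0;
  for (std::size_t I = 0; I < InUse.size(); ++I)
    if (InUse[I])
      Count = static_cast<unsigned>(I) + 1;
  return Count;
}
```

With this model, the reported VGPR count grows by exactly one per allocation,
with no holes left behind.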
>>
>>
>>
>> But for kernels with no device-function calls, the register cost is
>> unnecessary, because there is no ABI for such kernel programs.
>>
>> The register cost caused a performance penalty for such kernels.
>>
>> That is exactly why we need a way to decide dynamically whether
>> to apply a reg-cost.
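
The decision described here could be as small as the following sketch of the
"switch" option. FunctionTraits, useRegCost and effectiveCost are hypothetical
names, and the rule encoded (drop the cost only for kernels that make no
calls) is my reading of the paragraph above rather than an existing interface.

```cpp
#include <cstdint>

// Hypothetical per-function facts the target would have to provide.
struct FunctionTraits {
  bool IsKernel; // entry point, as opposed to a device function
  bool HasCalls; // makes at least one device-function call
};

// The switch: a kernel without calls has no ABI to honour, so the cost
// is turned off for it; everything else keeps the cost.
inline bool useRegCost(const FunctionTraits &F) {
  return !(F.IsKernel && !F.HasCalls);
}

// Applies the switch to a statically generated cost value.
inline uint8_t effectiveCost(uint8_t StaticCost, const FunctionTraits &F) {
  return useRegCost(F) ? StaticCost : 0;
}
```

This keeps the table-gen generated value as the single source of the cost and
only gates whether it is seen by the allocator.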
>>
>>
>>
>> Regards,
>>
>> Christudasan
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>