[llvm-dev] Dynamically determine the CostPerUse value in the register allocator.

Madhur Amilkanthwar via llvm-dev llvm-dev at lists.llvm.org
Sat May 30 05:52:15 PDT 2020


I don't know the history behind the name CostPerUse, so I may be missing the
background associated with it. It seems to be a misnomer for what it is
intended to mean. At first sight, the name suggests that the cost is a function
of the number of uses of the register: the more uses, the higher the cost. How
do we want to define the value of CostPerUse? Should it be a function of the
uses, or just of the target?


On Sat, May 30, 2020, 4:53 PM Devadasan, Christudasan via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> [AMD Official Use Only - Internal Distribution Only]
>
>
>
> Please ignore the header “AMD Official Use Only”. I forgot to remove it
> while posting the email to llvm-dev.
>
>
>
> Regards,
>
> Christudasan
>
>
>
> *From:* Devadasan, Christudasan
> *Sent:* Friday, May 29, 2020 7:46 PM
> *To:* llvm-dev at lists.llvm.org
> *Subject:* [llvm-dev] Dynamically determine the CostPerUse value in the
> register allocator.
>
>
>
> Hi All,
>
>
>
> For the AMDGPU architecture, we would like the register allocator to use a
> per-register cost (CostPerUse) that depends on a target property (for
> instance, the calling convention of the current MachineFunction).
>
> Presently CostPerUse is a static value (either zero or a positive value)
> generated once by TableGen.
>
> The current implementation doesn’t allow us to control the reg-cost on the
> fly.
>
>
>
> The AMDGPU ABI has recently been revised to introduce more caller-saved
> VGPRs (the exact details are explained towards the end of this e-mail), and
> we found that having a dynamic register cost is important to achieve an
> optimal allocation.
>
> Specifically, it is important to keep the number of VGPRs allocated for a
> kernel/device-function as small as possible, since it has a direct
> impact on the occupancy. Occupancy is the number of wavefronts that
> can be launched at runtime for a kernel program.
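As a rough illustration of the VGPR/occupancy relationship described above,
here is a minimal Python sketch. The numbers in it (a budget of 256 VGPRs, a
cap of 10 wavefronts, an allocation granule of 4) are assumptions chosen for
illustration and are not taken from the original mail; real limits depend on
the hardware generation.

```python
def occupancy(num_vgprs, vgpr_budget=256, max_waves=10, granule=4):
    """Wavefronts that fit under a simplified, assumed occupancy model.

    All default parameters are illustrative assumptions, not figures
    from the original mail.
    """
    if num_vgprs <= 0:
        return max_waves
    # Round the request up to the next multiple of the allocation granule.
    allocated = -(-num_vgprs // granule) * granule
    # Each wavefront needs `allocated` registers out of the shared budget.
    return min(max_waves, vgpr_budget // allocated)
```

Under these assumed parameters, going from 64 to 128 VGPRs halves the number
of wavefronts, which is why a few stray high-numbered registers matter.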
>
>
>
> Some initial thoughts on how to fix it:
>
>    1. Have a target interface (a switch) to enable/discard the CostPerUse
>    value.
>    2. Get the register cost in the same way we define various calling
>    conventions (*CallingConv.td).
>    3. Compute the CostPerUse in the way the AllocationOrder for the
>    registers is determined during RA.
>
>
>
> The first one is the easiest method and solves the immediate problem
> we are currently addressing.
>
> However, the other two options are better if we want to associate
> different reg-cost values with different calling conventions (I presume that
> need will arise at some point).
>
> There may well be a better way to fix this than these options. Any
> suggestions in this regard would be helpful.
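To make the difference between options 1 and 2 concrete, here is a small
Python model. It is only a sketch and not LLVM code: the names FunctionInfo,
enable_reg_cost, cost_with_switch and cost_by_calling_conv, as well as the
register costs, are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass

# Stand-in for the static CostPerUse values that TableGen emits today.
# Register names and costs are made up for illustration.
STATIC_COST_PER_USE = {"VGPR32": 0, "VGPR40": 1, "VGPR248": 2}

# Option 2: one cost table per calling convention (hypothetical names).
COST_BY_CALLING_CONV = {
    "kernel": {},  # no cost for any register
    "device_function": {"VGPR40": 1, "VGPR248": 2},
}


@dataclass
class FunctionInfo:
    calling_conv: str
    enable_reg_cost: bool  # option 1: the target's on/off switch


def cost_with_switch(fn, reg):
    """Option 1: keep the static table, but let the target discard it."""
    if not fn.enable_reg_cost:
        return 0
    return STATIC_COST_PER_USE.get(reg, 0)


def cost_by_calling_conv(fn, reg):
    """Option 2: look the cost up in a table keyed by calling convention."""
    return COST_BY_CALLING_CONV.get(fn.calling_conv, {}).get(reg, 0)
```

With the switch, every function sees either the one static table or no cost
at all; with per-convention tables, each calling convention can carry its own
values, which is the extra flexibility options 2 and 3 are after.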
>
>
>
> AMDGPU ABI changes and the motivation for this discussion:
>
>
>
> Before the new ABI change:
>
> Apart from the initial reserved 32 argument registers, all VGPRs are
> callee-saved registers (VGPR32 - VGPR255).
>
> With the new ABI:
>
> We split VGPR32 - VGPR255 into an equal number of callee-saved and
> caller-saved registers.
>
> For the same occupancy reason, the two sets are interleaved in blocks of
> 8 registers.
>
> VGPR32-VGPR39 (Caller-saved)
>
> VGPR40-VGPR47 (Callee-saved)
>
> VGPR48-VGPR55 (Caller-saved)
>
>               -
>
>               -
>
> VGPR248-VGPR255 (Callee-saved)
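The interleaving listed above can be written as a small rule on the register
index. The following Python sketch covers only VGPR32 - VGPR255, the range the
mail describes; the function name save_class is made up for this example.

```python
FIRST_SPLIT_VGPR = 32   # first register covered by the interleaving
LAST_VGPR = 255         # last VGPR
BLOCK_SIZE = 8          # the split boundary from the mail


def save_class(index):
    """Return the save class of VGPR<index> under the new ABI layout."""
    if not FIRST_SPLIT_VGPR <= index <= LAST_VGPR:
        raise ValueError("only VGPR32 - VGPR255 are covered by this rule")
    block = (index - FIRST_SPLIT_VGPR) // BLOCK_SIZE
    # Even blocks (32-39, 48-55, ...) are caller-saved, odd blocks
    # (40-47, 56-63, ..., 248-255) are callee-saved.
    return "caller-saved" if block % 2 == 0 else "callee-saved"
```

Counting both classes over the whole range gives 112 registers each, which
matches the "equal number" statement above.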
>
>
>
> With the new ABI, the allocator’s preference for callee-saved vs
> caller-saved depends on the input program.
>
> RA may end up allocating more caller-saved registers than callee-saved
> ones in certain cases. The opposite is possible too (more
> callee-saved registers).
>
> In either case, unallocated registers are left behind between the allocated
> ones, pushing the final VGPR count up considerably. This has a bad impact on
> the occupancy.
>
> To override the default allocation preferences of RA, we tried setting a
> cost for all VGPRs such that higher indices have a higher cost.
>
> This eliminated the problem by allocating all lower registers before picking
> higher ones, at the expense of some spills in certain cases, which
> is acceptable.
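A toy Python model of the effect described above. It is not how the LLVM
allocator works: it ignores liveness, interference and spilling, and simply
picks registers in ascending cost order. It also assumes that the VGPR count
of a function is its highest used index plus one. All function names are
hypothetical.

```python
def is_callee_saved(index):
    """Interleaved layout from the mail, valid for VGPR32 - VGPR255."""
    return ((index - 32) // 8) % 2 == 1


def pick_registers(count, cost):
    """Pick `count` VGPRs from VGPR32 - VGPR255, cheapest first.

    Ties in cost are broken by the lower register index.
    """
    candidates = sorted(range(32, 256), key=lambda r: (cost(r), r))
    return candidates[:count]


def vgprs_used(regs):
    """Assumed VGPR count: highest allocated index plus one."""
    return max(regs) + 1


def prefer_callee_saved(index):
    """A preference by save class, standing in for the default behaviour."""
    return 0 if is_callee_saved(index) else 1


def cost_by_index(index):
    """The experiment from the mail: higher index, higher cost."""
    return index
```

For 20 registers, the class preference picks VGPR40-47, VGPR56-63 and
VGPR72-75, so the function is charged for 76 VGPRs with unused holes in
between. The index-based cost picks VGPR32-51 and needs only 52.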
>
>
>
> But for kernels with no device-function calls, the register cost is
> unnecessary, because no ABI applies to such kernel programs.
>
> The register cost caused a performance penalty for such kernels.
>
> That’s exactly why we need a way to decide dynamically whether or not
> to apply a reg-cost.
>
>
>
> Regards,
>
> Christudasan