[llvm-dev] [RFC] Array Register Files

Jacob Lifshay via llvm-dev llvm-dev at lists.llvm.org
Sun Oct 7 10:45:06 PDT 2018


On Sun, Oct 7, 2018, 10:39 Nicolai Hähnle via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hi all,
>
> There's a rather major piece of work that's been in the back of my mind
> for a while now. Before I actually start any work on it, I'd like to
> hear people's opinions, if any.
>
> tl;dr: I'd like to augment the CodeGen physical register / register
> class infrastructure with a mechanism to represent large regular
> register files more efficiently.
>
> The motivation is that the existing infrastructure really isn't a good
> fit for the AMDGPU backend. We have ~104 scalar registers and 256 vector
> registers. In addition to the sheer number of registers, there are some
> qualitative factors that set us apart from most (all?) other backends:
>
> 1. The order of registers matters: if we use only a subset of registers,
> and that subset is on the low end of the range, we can run more work in
> parallel on the hardware. (This is modeled with regalloc priorities today.)
>
> 2. We can indirectly index into both register files. If a function has
> an alloca'd array of 17 values, we may want to lower that as 17
> consecutive registers and access the array using indirect access
> instructions.
>
> 3. Even this aside, the number of register classes we'd really need is
> quite staggering. We have machine instructions taking operands with
> anywhere from 1 to 12 consecutive registers.
>
> Modeling this as register classes with sub-registers is clearly not a
> good match semantically, let alone from a compile time performance point
> of view. Today, we effectively take a runtime performance hit in some
> cases due to not properly modeling the properties of our register files.
>
>
> What I'd like to have
> ---------------------
> I'd like to introduce the notion of array register files. Physical
> registers in an ARF would be described by
>
> - the register array ID
> - the starting index of the "register"
> - the ending index of the "register" (inclusive)
>
> This can be conveniently encoded in the 31 bits we effectively have
> available for physical registers (e.g. 13 bits for start / end, 5 bits
> for register array ID).
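
To make the bit budget concrete, here is a minimal sketch of one way such an
encoding could be packed, assuming the 5 + 13 + 13 split mentioned above. The
field order and all names (ArfReg, encodeArfReg, decodeArfReg, numElements) are
hypothetical and are not existing LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (an assumption, not an LLVM API):
//   bits [30:26] register array ID, bits [25:13] start, bits [12:0] end.
constexpr unsigned IndexBits = 13;
constexpr unsigned ArrayIdBits = 5;
constexpr uint32_t IndexMask = (1u << IndexBits) - 1;
constexpr uint32_t ArrayIdMask = (1u << ArrayIdBits) - 1;

struct ArfReg {
  unsigned ArrayId; // 0 would be reserved for TableGen-defined registers
  unsigned Start;   // first index covered by the "register"
  unsigned End;     // last index covered, inclusive
};

// Pack the three fields into the low 31 bits of a 32-bit value.
constexpr uint32_t encodeArfReg(ArfReg R) {
  return ((R.ArrayId & ArrayIdMask) << (2 * IndexBits)) |
         ((R.Start & IndexMask) << IndexBits) | (R.End & IndexMask);
}

// Recover the three fields from a packed value.
constexpr ArfReg decodeArfReg(uint32_t Enc) {
  return ArfReg{(Enc >> (2 * IndexBits)) & ArrayIdMask,
                (Enc >> IndexBits) & IndexMask, Enc & IndexMask};
}

// Number of array elements covered; End is inclusive.
constexpr unsigned numElements(ArfReg R) { return R.End - R.Start + 1; }
```

With this layout, 13 bits per index allow arrays of up to 8192 entries, and the
top bit of a 32-bit value stays free.
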
>
> Register array ID 0 would be reserved for the familiar, TableGen-defined
> registers (we would still have some of those in AMDGPU, even with this
> change).
>
> It would necessarily have to be possible to generate register classes
> for ARFs both in TableGen and on-the-fly while compiling. Base register
> classes would be defined by:
>
> - the register array ID
> - the minimum start / maximum end index of registers in the class
> - the size of registers in the class
> - the alignment of registers in the class (i.e., registers must start at
> multiples of N, where N is a power of two)
>
> ... and then register classes might be the union of such register
> classes and traditional register classes.
>
> (For example, in AMDGPU we would have a register class that includes all
> registers from the scalar array, with size 2, starting at an even offset,
> union'd with a class containing some special registers such as VCC.)
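
A base register class as described could be checked for membership roughly
like this. It is only a sketch: the struct names, the scalar array ID of 1,
and the upper bound of 103 are assumptions for illustration, and the union
with traditional registers such as VCC is left out.

```cpp
#include <cassert>

// A physical register in an array register file (hypothetical type).
struct ArfRange {
  unsigned ArrayId;
  unsigned Start;
  unsigned End; // inclusive
};

// The four properties listed for a base register class (hypothetical type).
struct ArfBaseClass {
  unsigned ArrayId;
  unsigned MinStart; // lowest index any member may cover
  unsigned MaxEnd;   // highest index any member may cover
  unsigned Size;     // number of consecutive registers per member
  unsigned Align;    // members start at multiples of Align (power of two)
};

// True if R belongs to RC: same array, right size, in bounds, aligned start.
bool contains(const ArfBaseClass &RC, const ArfRange &R) {
  return R.ArrayId == RC.ArrayId && R.End >= R.Start &&
         R.End - R.Start + 1 == RC.Size && R.Start >= RC.MinStart &&
         R.End <= RC.MaxEnd && (R.Start & (RC.Align - 1)) == 0;
}

// Assumed example: pairs of scalar registers starting at an even offset.
const ArfBaseClass SReg64{/*ArrayId=*/1, /*MinStart=*/0, /*MaxEnd=*/103,
                          /*Size=*/2, /*Align=*/2};
```

Here contains(SReg64, ArfRange{1, 4, 5}) holds, while ArfRange{1, 5, 6} is
rejected because it starts at an odd index.
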
>
> A similar scheme would have to be used for sub-register indices.
>
> I haven't dug too deeply into this yet, and clearly there are quite a
> number of thorny issues that need to be addressed -- it's a rather big
> project. But so far I'm not aware of anything absolutely fundamental
> that would prevent doing this.
>
>
> What I'm asking you at this point
> ---------------------------------
> Like I said, I haven't actually started any of this work (and probably
> won't for some time).
>
> However, if you have any fundamental objections to such a change, please
> speak up now before I or anybody else embarks on this project. I want to
> be confident that people are okay with the general direction.
>
> Also, if you have any advice or suggestions (maybe an alternative that
> would also fit the requirements of the AMDGPU backend), I'd be happy to
> hear about it!
>
This sounds like it would also be useful for allocating the consecutive
registers needed to implement vectors in SimpleV, a parallelism extension
to RISC-V.

Jacob Lifshay
