[llvm-dev] [RFC] Array Register Files

Sun Oct 7 10:39:03 PDT 2018

Hi all,

There's a rather major piece of work that's been in the back of my mind 
for a while now. Before I actually start any work on it, I'd like to 
hear people's opinions, if any.

tl;dr: I'd like to augment the CodeGen physical register / register 
class infrastructure with a mechanism to represent large regular 
register files more efficiently.

The motivation is that the existing infrastructure really isn't a good 
fit for the AMDGPU backend. We have ~104 scalar registers and 256 vector 
registers. In addition to the sheer number of registers, there are some 
qualitative factors that set us apart from most (all?) other backends:

1. The order of registers matters: if we use only a subset of registers,
and that subset is on the low end of the range, we can run more work in 
parallel on the hardware. (This is modeled with regalloc priorities today.)

2. We can indirectly index into both register files. If a function has 
an alloca'd array of 17 values, we may want to lower that as 17 
consecutive registers and access the array using indirect access 
instructions.

3. Even leaving that aside, the number of register classes we'd really
need is quite staggering. We have machine instructions taking operands
with anywhere from 1 to 12 consecutive registers.

Modeling this as register classes with sub-registers is clearly not a 
good match semantically, let alone from a compile time performance point 
of view. Today, we effectively take a runtime performance hit in some
cases because we don't properly model the properties of our register files.

What I'd like to have
---------------------
I'd like to introduce the notion of array register files. Physical 
registers in an ARF would be described by

- the register array ID
- the starting index of the "register"
- the ending index of the "register" (inclusive)

This can be conveniently encoded in the 31 bits we effectively have
available for physical registers (e.g. 13 bits each for the start and
end index, plus 5 bits for the register array ID).
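
To make the bit budget concrete, here is a minimal sketch of one possible
packing. Nothing here is existing LLVM API: the names (ArrayReg,
encodeArrayReg, decodeArrayReg, isArrayReg) and the exact field order are
assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (an assumption, not existing LLVM code):
//   bits  0..12  start index
//   bits 13..25  end index (inclusive)
//   bits 26..30  register array ID
// The top bit is left clear, on the assumption that it stays reserved
// for things that are not physical registers.
struct ArrayReg {
  unsigned ArrayID; // 1..31; ID 0 is reserved for TableGen-defined registers
  unsigned Start;   // first index covered by the register
  unsigned End;     // last index covered by the register (inclusive)
};

constexpr unsigned IndexBits = 13;
constexpr unsigned ArrayIDBits = 5;
constexpr uint32_t IndexMask = (1u << IndexBits) - 1;
constexpr uint32_t ArrayIDMask = (1u << ArrayIDBits) - 1;

// Pack an array register into a 31-bit value.
inline uint32_t encodeArrayReg(const ArrayReg &R) {
  assert(R.ArrayID != 0 && R.ArrayID <= ArrayIDMask && "bad array ID");
  assert(R.Start <= R.End && R.End <= IndexMask && "bad index range");
  return (R.ArrayID << (2 * IndexBits)) | (R.End << IndexBits) | R.Start;
}

// A value denotes an array register if the top bit is clear and the
// array ID field is non-zero.
inline bool isArrayReg(uint32_t Reg) {
  return (Reg >> 31) == 0 && ((Reg >> (2 * IndexBits)) & ArrayIDMask) != 0;
}

// Unpack a value previously produced by encodeArrayReg.
inline ArrayReg decodeArrayReg(uint32_t Reg) {
  assert(isArrayReg(Reg) && "not an array register");
  return {(Reg >> (2 * IndexBits)) & ArrayIDMask, // ArrayID
          Reg & IndexMask,                        // Start
          (Reg >> IndexBits) & IndexMask};        // End
}
```

With this layout, any value below 2^26 has array ID 0 and so falls into
the range left for the traditional TableGen-defined registers.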

Register array ID 0 would be reserved for the familiar, TableGen-defined 
registers (we would still have some of those in AMDGPU, even with this 
change).

It would necessarily have to be possible to generate register classes 
for ARFs both in TableGen and on-the-fly while compiling. Base register 
classes would be defined by:

- the register array ID
- the minimum start / maximum end index of registers in the class
- the size of registers in the class
- the alignment of registers in the class (i.e., registers must start at 
multiples of N, where N is a power of two)

... and then register classes might be the union of such register 
classes and traditional register classes.

(For example, in AMDGPU we would have a register class that includes all
size-2 registers from the scalar array that start at an even offset,
union'd with a class containing some special registers such as VCC.)
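
As a rough sketch of what a base ARF register class could look like, the
four properties listed above fit in a small struct, and membership becomes
a handful of comparisons instead of a table lookup. All names here
(ArrayRegClass, contains, numRegs) are hypothetical, and the union with a
traditional register class is left out.

```cpp
#include <cassert>

// Hypothetical description of a base ARF register class.
struct ArrayRegClass {
  unsigned ArrayID;  // which register array the class draws from
  unsigned MinStart; // lowest permitted start index
  unsigned MaxEnd;   // highest permitted end index (inclusive)
  unsigned Size;     // number of consecutive registers per member
  unsigned Align;    // start index must be a multiple of this power of two
};

// True if the register [Start, End] of array ArrayID is a member of RC.
inline bool contains(const ArrayRegClass &RC, unsigned ArrayID,
                     unsigned Start, unsigned End) {
  assert(RC.Align != 0 && (RC.Align & (RC.Align - 1)) == 0 &&
         "alignment must be a power of two");
  return ArrayID == RC.ArrayID && Start >= RC.MinStart &&
         End <= RC.MaxEnd && End >= Start &&
         End - Start + 1 == RC.Size && Start % RC.Align == 0;
}

// Number of registers in the class, computed without enumerating them.
inline unsigned numRegs(const ArrayRegClass &RC) {
  // First start index >= MinStart that satisfies the alignment.
  unsigned First = (RC.MinStart + RC.Align - 1) & ~(RC.Align - 1);
  if (RC.MaxEnd + 1 < First + RC.Size)
    return 0; // not even one member fits
  unsigned LastStart = RC.MaxEnd + 1 - RC.Size;
  return (LastStart - First) / RC.Align + 1;
}
```

Assuming, purely for illustration, that the scalar array gets ID 1 and
covers indices 0 to 103, the scalar part of the example class would be
ArrayRegClass{1, 0, 103, 2, 2}. It contains [2, 3] but not [3, 4], and
numRegs reports 52 members.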

A similar scheme would have to be used for sub-register indices.
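
One possible reading of "a similar scheme", sketched under my own
assumptions: a sub-register index is an (offset, size) pair relative to
the start of the containing register, so extracting a sub-register and
composing two indices are both plain arithmetic. The names RegRange,
ArraySubRegIdx, getSubReg and composeSubRegIdx are invented for this
sketch.

```cpp
#include <cassert>

// Index range of a register within one array; End is inclusive.
struct RegRange {
  unsigned Start, End;
};

// Hypothetical sub-register index: Size consecutive registers starting
// Offset registers into the containing register.
struct ArraySubRegIdx {
  unsigned Offset, Size;
};

// True if Idx selects a non-empty range that lies entirely inside R.
inline bool isValidSubReg(RegRange R, ArraySubRegIdx Idx) {
  return Idx.Size != 0 && R.Start <= R.End &&
         Idx.Offset + Idx.Size <= R.End - R.Start + 1;
}

// The sub-register of R selected by Idx.
inline RegRange getSubReg(RegRange R, ArraySubRegIdx Idx) {
  assert(isValidSubReg(R, Idx) && "sub-register out of range");
  unsigned S = R.Start + Idx.Offset;
  return {S, S + Idx.Size - 1};
}

// Index equivalent to applying Outer first and then Inner to the result.
inline ArraySubRegIdx composeSubRegIdx(ArraySubRegIdx Outer,
                                       ArraySubRegIdx Inner) {
  assert(Inner.Offset + Inner.Size <= Outer.Size && "Inner exceeds Outer");
  return {Outer.Offset + Inner.Offset, Inner.Size};
}
```

For a 4-wide register covering indices 8 to 11, the index {2, 2} selects
[10, 11]; composing it with {1, 1} gives {3, 1}, which selects [11, 11]
directly.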

I haven't dug too deeply into this yet, and clearly there are quite a 
number of thorny issues that need to be addressed -- it's a rather big 
project. But so far I'm not aware of anything absolutely fundamental 
that would prevent doing this.

What I'm asking you at this point
---------------------------------
Like I said, I haven't actually started any of this work (and probably 
won't for some time).

However, if you have any fundamental objections to such a change, please 
speak up now before I or anybody else embarks on this project. I want to 
be confident that people are okay with the general direction.

Also, if you have any advice or suggestions (maybe an alternative that 
would also fit the requirements of the AMDGPU backend), I'd be happy to 
hear about it!

Thanks,
Nicolai
-- 
Learn how the world really is,
but never forget how it should be.