[llvm-dev] [RFC] Array Register Files
Nicolai Hähnle via llvm-dev
llvm-dev at lists.llvm.org
Sun Oct 7 10:39:03 PDT 2018
Hi all,
There's a rather major piece of work that's been in the back of my mind
for a while now. Before I actually start any work on it, I'd like to
hear people's opinions, if any.
tl;dr: I'd like to augment the CodeGen physical register / register
class infrastructure with a mechanism to represent large regular
register files more efficiently.
The motivation is that the existing infrastructure really isn't a good
fit for the AMDGPU backend. We have ~104 scalar registers and 256 vector
registers. In addition to the sheer number of registers, there are some
qualitative factors that set us apart from most (all?) other backends:
1. The order of registers matters: if we use only a subset of registers,
and that subset is on the low end of the range, we can run more work in
parallel on the hardware. (This is modeled with regalloc priorities today.)
2. We can indirectly index into both register files. If a function has
an alloca'd array of 17 values, we may want to lower that as 17
consecutive registers and access the array using indirect access
instructions.
3. Even setting this aside, the number of register classes we'd really need is
quite staggering. We have machine instructions taking operands with
anywhere from 1 to 12 consecutive registers.
Modeling this as register classes with sub-registers is clearly not a
good match semantically, let alone from a compile time performance point
of view. Today, we effectively take a runtime performance hit in some
cases due to not properly modeling the properties of our register files.
What I'd like to have
---------------------
I'd like to introduce the notion of array register files. Physical
registers in an ARF would be described by
- the register array ID
- the starting index of the "register"
- the ending index of the "register" (inclusive)
This can be conveniently encoded in the 31 bits we effectively have
available for physical registers (e.g. 13 bits each for the start and
end indices, plus 5 bits for the register array ID).
Register array ID 0 would be reserved for the familiar, TableGen-defined
registers (we would still have some of those in AMDGPU, even with this
change).
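To make the encoding concrete, here is a minimal sketch of how such a register number might be packed and unpacked. The bit layout (array ID in the top 5 of the 31 bits, then start, then end) and all names (ArrayReg, encodeArrayReg, decodeArrayReg, isArrayReg) are assumptions for illustration only; none of this exists in LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical layout (not an existing LLVM API):
//   bits 30..26  register array ID (5 bits; 0 = TableGen-defined registers)
//   bits 25..13  start index (13 bits)
//   bits 12..0   end index, inclusive (13 bits)
constexpr unsigned IndexBits = 13;
constexpr unsigned ArrayIDBits = 5;
constexpr uint32_t IndexMask = (1u << IndexBits) - 1;
constexpr uint32_t ArrayIDMask = (1u << ArrayIDBits) - 1;

struct ArrayReg {
  uint32_t ArrayID; // nonzero for registers that live in an array
  uint32_t Start;   // first index covered by the register
  uint32_t End;     // last index covered by the register (inclusive)
};

// Pack an array register into a 31-bit physical register number.
inline uint32_t encodeArrayReg(ArrayReg R) {
  assert(R.ArrayID != 0 && R.ArrayID <= ArrayIDMask && "bad array ID");
  assert(R.Start <= R.End && R.End <= IndexMask && "bad index range");
  return (R.ArrayID << (2 * IndexBits)) | (R.Start << IndexBits) | R.End;
}

// Array ID 0 is reserved, so any number whose ID field is zero is treated
// as a traditional TableGen-defined register.
inline bool isArrayReg(uint32_t Reg) {
  return ((Reg >> (2 * IndexBits)) & ArrayIDMask) != 0;
}

// Unpack a register number produced by encodeArrayReg.
inline ArrayReg decodeArrayReg(uint32_t Reg) {
  assert(isArrayReg(Reg) && "not an array register");
  return ArrayReg{(Reg >> (2 * IndexBits)) & ArrayIDMask,
                  (Reg >> IndexBits) & IndexMask, Reg & IndexMask};
}
```

With this layout, a 2-wide register covering indices 4 and 5 of array 1 would be encodeArrayReg({1, 4, 5}), and traditional register numbers simply stay below 2^26.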
It would necessarily have to be possible to generate register classes
for ARFs both in TableGen and on-the-fly while compiling. Base register
classes would be defined by:
- the register array ID
- the minimum start / maximum end index of registers in the class
- the size of registers in the class
- the alignment of registers in the class (i.e., registers must start at
multiples of N, where N is a power of two)
... and then register classes might be the union of such register
classes and traditional register classes.
(For example, in AMDGPU we would have a register class that includes all
registers from the scalar array, with size 2, starting at an even offset,
union'd with a class containing some special registers such as VCC.)
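As a rough illustration of the base register class description, the sketch below stores the four properties listed above and answers membership queries. The struct name, field names, and both member functions are hypothetical, and the union with traditional register classes is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a base register class over one register array.
struct ArrayRegClass {
  uint32_t ArrayID;  // which register array the class draws from
  uint32_t MinStart; // lowest permitted start index
  uint32_t MaxEnd;   // highest permitted end index (inclusive)
  uint32_t Size;     // number of consecutive registers per member
  uint32_t Align;    // start index must be a multiple of this power of two

  // Is the register [Start, End] of array RegArrayID a member of the class?
  bool contains(uint32_t RegArrayID, uint32_t Start, uint32_t End) const {
    assert(Align != 0 && (Align & (Align - 1)) == 0 && "Align not a power of 2");
    return RegArrayID == ArrayID && Start >= MinStart && End <= MaxEnd &&
           End >= Start && End - Start + 1 == Size &&
           (Start & (Align - 1)) == 0;
  }

  // Number of members, computed arithmetically rather than by enumeration.
  uint32_t numRegs() const {
    assert(Align != 0 && (Align & (Align - 1)) == 0 && "Align not a power of 2");
    if (Size == 0 || MaxEnd + 1 < Size)
      return 0;
    uint32_t FirstStart = (MinStart + Align - 1) & ~(Align - 1);
    uint32_t LastStart = MaxEnd + 1 - Size;
    if (FirstStart > LastStart)
      return 0;
    return (LastStart - FirstStart) / Align + 1;
  }
};
```

For the scalar-pair example, assuming the scalar array gets ID 1 and has 104 entries, the class would be ArrayRegClass{1, 0, 103, 2, 2}. Membership is then a handful of comparisons instead of a lookup in a generated table.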
A similar scheme would have to be used for sub-register indices.
I haven't dug too deeply into this yet, and clearly there are quite a
number of thorny issues that need to be addressed -- it's a rather big
project. But so far I'm not aware of anything absolutely fundamental
that would prevent doing this.
What I'm asking you at this point
---------------------------------
Like I said, I haven't actually started any of this work (and probably
won't for some time).
However, if you have any fundamental objections to such a change, please
speak up now before I or anybody else embarks on this project. I want to
be confident that people are okay with the general direction.
Also, if you have any advice or suggestions (maybe an alternative that
would also fit the requirements of the AMDGPU backend), I'd be happy to
hear about it!
Thanks,
Nicolai
--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.