<div dir="auto"><div><div class="gmail_quote"><div dir="ltr">On Sun, Oct 7, 2018, 10:39 Nicolai Hähnle via llvm-dev &lt;<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

There's a rather major piece of work that's been in the back of my mind <br>

for a while now. Before I actually start any work on it, I'd like to <br>

hear people's opinions, if any.<br>

<br>

tl;dr: I'd like to augment the CodeGen physical register / register <br>

class infrastructure with a mechanism to represent large regular <br>

register files more efficiently.<br>

<br>

The motivation is that the existing infrastructure really isn't a good <br>

fit for the AMDGPU backend. We have ~104 scalar registers and 256 vector <br>

registers. In addition to the sheer number of registers, there are some <br>

qualitative factors that set us apart from most (all?) other backends:<br>

<br>

1. The order of registers matters: if we use only a subset of registers, <br>

and that subset is on the low end of the range, we can run more work in <br>

parallel on the hardware. (This is modeled with regalloc priorities today.)<br>

<br>

2. We can indirectly index into both register files. If a function has <br>

an alloca'd array of 17 values, we may want to lower that as 17 <br>

consecutive registers and access the array using indirect access <br>

instructions.<br>

<br>

3. Even setting this aside, the number of register classes we'd really need is <br>

quite staggering. We have machine instructions taking operands with <br>

anywhere from 1 to 12 consecutive registers.<br>

<br>

Modeling this as register classes with sub-registers is clearly not a <br>

good match semantically, let alone from a compile time performance point <br>

of view. Today, we effectively take a runtime performance hit in some <br>

cases due to not properly modeling the properties of our register files.<br>

<br>

<br>

What I'd like to have<br>

---------------------<br>

I'd like to introduce the notion of array register files. Physical <br>

registers in an ARF would be described by<br>

<br>

- the register array ID<br>

- the starting index of the "register"<br>

- the ending index of the "register" (inclusive)<br>

<br>

This can be conveniently encoded in the 31 bits we effectively have <br>

available for physical registers (e.g. 13 bits for start / end, 5 bits <br>

for register array ID).<br>

<br>
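A minimal sketch of one way such an encoding could look, assuming the start <br>
index sits in the low 13 bits, the end index in the next 13 bits, and the <br>
register array ID in the top 5 bits. The field order and all names below <br>
are hypothetical, not part of any existing LLVM API:<br>

```python
# Hypothetical layout: [ array ID : 5 bits | end : 13 bits | start : 13 bits ]
INDEX_BITS = 13
ARRAY_ID_BITS = 5
INDEX_MASK = (1 << INDEX_BITS) - 1
ARRAY_ID_MASK = (1 << ARRAY_ID_BITS) - 1
ARRAY_ID_SHIFT = 2 * INDEX_BITS


def encode_arf_reg(array_id, start, end):
    """Pack (array ID, start, inclusive end) into a 31-bit register number."""
    # Array ID 0 is left to ordinary TableGen-defined registers.
    if not (1 <= array_id <= ARRAY_ID_MASK):
        raise ValueError("array ID must be in 1..31")
    if not (0 <= start <= end <= INDEX_MASK):
        raise ValueError("need 0 <= start <= end <= 8191")
    return (array_id << ARRAY_ID_SHIFT) | (end << INDEX_BITS) | start


def decode_arf_reg(reg):
    """Unpack a register number into (array ID, start, inclusive end)."""
    start = reg & INDEX_MASK
    end = (reg >> INDEX_BITS) & INDEX_MASK
    array_id = (reg >> ARRAY_ID_SHIFT) & ARRAY_ID_MASK
    return array_id, start, end


def is_arf_reg(reg):
    """True if the number names an array register rather than array ID 0."""
    return (reg >> ARRAY_ID_SHIFT) != 0
```

With this layout, 13 + 13 + 5 = 31 bits are used, and every number whose <br>
top 5 bits are zero stays available for traditional registers.<br>

<br>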

Register array ID 0 would be reserved for the familiar, TableGen-defined <br>

registers (we would still have some of those in AMDGPU, even with this <br>

change).<br>

<br>

It would necessarily have to be possible to generate register classes <br>

for ARFs both in TableGen and on-the-fly while compiling. Base register <br>

classes would be defined by:<br>

<br>

- the register array ID<br>

- the minimum start / maximum end index of registers in the class<br>

- the size of registers in the class<br>

- the alignment of registers in the class (i.e., registers must start at <br>

multiples of N, where N is a power of two)<br>

<br>

... and then register classes might be the union of such register <br>

classes and traditional register classes.<br>

<br>

(For example, in AMDGPU we would have a register class that includes all <br>

registers from the scalar array, with size 2, starting at an even offset, <br>

union'd with a class containing some special registers such as VCC.)<br>

<br>
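The base class description and the union above could be sketched roughly <br>
as follows. This is only an illustration under stated assumptions: the <br>
class and field names are hypothetical, sizes and alignments are counted in <br>
registers, and the scalar array is assumed to have ID 1 and exactly 104 <br>
registers:<br>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArfBaseClass:
    """Hypothetical base register class over one register array."""
    array_id: int
    min_start: int   # lowest allowed start index
    max_end: int     # highest allowed end index (inclusive)
    size: int        # number of consecutive registers per member
    alignment: int   # members start at multiples of this power of two

    def contains(self, array_id, start, end):
        return (array_id == self.array_id
                and end - start + 1 == self.size
                and start % self.alignment == 0
                and self.min_start <= start
                and end <= self.max_end)

    def members(self):
        """Yield (start, end) for every register in the class, low to high."""
        # Round min_start up to the next multiple of the alignment.
        first = (self.min_start + self.alignment - 1) // self.alignment
        first *= self.alignment
        last_start = self.max_end - self.size + 1
        for start in range(first, last_start + 1, self.alignment):
            yield (start, start + self.size - 1)


@dataclass(frozen=True)
class UnionClass:
    """Union of array-based classes and traditional (array ID 0) registers."""
    arf_parts: tuple
    named_regs: frozenset

    def contains_arf(self, array_id, start, end):
        return any(p.contains(array_id, start, end) for p in self.arf_parts)

    def contains_named(self, name):
        return name in self.named_regs
```

For instance, the scalar pair class from the example would be <br>
UnionClass((ArfBaseClass(1, 0, 103, 2, 2),), frozenset({"VCC"})) under <br>
these assumptions; nothing is enumerated until members() is called.<br>

<br>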

A similar scheme would have to be used for sub-register indices.<br>

<br>

I haven't dug too deeply into this yet, and clearly there are quite a <br>

number of thorny issues that need to be addressed -- it's a rather big <br>

project. But so far I'm not aware of anything absolutely fundamental <br>

that would prevent doing this.<br>

<br>

<br>

What I'm asking you at this point<br>

---------------------------------<br>

Like I said, I haven't actually started any of this work (and probably <br>

won't for some time).<br>

<br>

However, if you have any fundamental objections to such a change, please <br>

speak up now before I or anybody else embarks on this project. I want to <br>

be confident that people are okay with the general direction.<br>

<br>

Also, if you have any advice or suggestions (maybe an alternative that <br>

would also fit the requirements of the AMDGPU backend), I'd be happy to <br>

hear about it!<br></blockquote></div></div><div dir="auto">This sounds like it would also be useful for allocating the consecutive registers needed to implement vectors in SimpleV, a parallelism extension to RISC-V.</div><div dir="auto"><br></div><div dir="auto">Jacob Lifshay</div></div>