[PATCH] D109300: [AMDGPU] Make vector superclasses allocatable

Tue Sep 21 12:30:58 PDT 2021

rampitec added a comment.

In D109300#3012033 <https://reviews.llvm.org/D109300#3012033>, @cdevadas wrote:

> In D109300#3010692 <https://reviews.llvm.org/D109300#3010692>, @rampitec wrote:
>
>> In general selector cannot do it. You need to know RP by class and chose an RC accordingly. Moreover, on gfx90a you would also need to chose an instruction accordingly.
>
> Isn't it possible to select legal register classes by having constrained register operands in the instruction definition itself?

You could do specific instructions. Moreover, gfx90a has introduced _acd andf _vcd mfma variants because register classes of SrcC and Dst are tied. For SrcA and SrcB we actually have a freedom to use either RC independently. So we need to make a decision which register class to use and then which instruction to use consequentially. That decision cannot be reasonably done at selection time, in fact it can only be done during regalloc when we know actual pressure. Moreover, the decision will be different for gfx908 and gfx90a. For gfx908 you would want to allocate equal amount of both types of registers, and for gfx90a you would want to stay with VGPRs as long as you can. Then changing an RC will also require replacement of other instructions dealing with it, like v_accvgpr_* will turn into v_mov_b32 and vice versa, some v_accvgpr_* instructions dropped etc. And all of that is only possible inside the RA.

Right now we do not do anything like this, SrcC and Dst are always AGPRs, then SrcA and SrcB are VGPRs I believe. It is a suboptimal but easy choice. But since you are enabling AV operands you are leaving this to RA to decide, and I have a question how exactly will it decide? We could constraint reg classes of mfma operands just like we do now. That is probably an easiest way to at least preserve the functionality.

One thing to keep in mind: and allocatable AV class directly affects instructions we have to generate, not just select but also create in many different passes. I.e. if the specific class is AGPR we have to use v_accvgpr_*, if that is VGPR we have to use v_mob_b32. Then v_accvgpr_write_b32 does not work with immediates on gfx908, it does not work with SGPR sources, loads and stores do not work with AGPRs in gfx908 etc. I.e. that would be a huge and unsound codegen change across the BE.

With all of that I assume we have to always use specific RC for everything we select and generate. We basically cannot use AV as an operand of any instruction we have produced. The only use for the allocatable AV shall be RA copying registers instead of spilling. I do not think that is what this change is doing now.

================
Comment at: llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp:262
   MI.getOperand(0).setReg(CopyUse.getOperand(0).getReg());
-  bool IsAGPR = TRI->hasAGPRs(DstRC);
+  bool IsAGPR = TRI->isAGPRClass(DstRC);

----------------
cdevadas wrote:
> rampitec wrote:
> > Now that AV is allocatable I suspect we will miss a lot of optimizations with multiple changes like this (e.g. hasAGPRs -> isAGPRClass) and maybe even make wrong and illegal decisions because we will be unable to correctly identify if that is a VGPR or AGPR presented an AV.
> I agree we might miss out on certain optimizations. That should be handled (separately).
> COPY is a special instruction, the machine opcode changes based on the target register for AMDGPU. 
> Any Pre-RA copy instance with AV class, I am forcing it to the VGPR class.
> 
> But illegal decisions should be averted. If it happens, mostly occurs at the selection.
> 
The point of this code is to avoid unneeded copies between VGPRs and AGPRs. Forcing it to VGPRs misses the point.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:896
   unsigned Opcode = AMDGPU::V_MOV_B32_e32;
-  if (RI.hasAGPRs(RC)) {
+  if (RI.isAGPRClass(RC)) {
     Opcode = (RI.hasVGPRs(SrcRC)) ?
----------------
cdevadas wrote:
> rampitec wrote:
> > What will happen if that is AV?
> I guess this function, `copyPhysReg` will only be called post-RA.
Ah, right.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2432
+      Idx == AMDGPU::RegisterPressureSets::AGPR_32 ||
+      Idx == AMDGPU::RegisterPressureSets::AV_32)
     return getRegPressureLimit(&AMDGPU::VGPR_32RegClass,
----------------
cdevadas wrote:
> rampitec wrote:
> > I do not think there can be a separate pressure on AV. What does that mean for AV pressure if you have pressure 256 on V and 256 on A? It cannot be increased or decreased separately.
> The assert here https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/CodeGenRegisters.cpp#L2028
> enforces at least one PSet when we make the AV classes allocatable.
> 
> There should have been a flag to entirely avoid the psets for allocatable regclasses.
There is GeneratePressureSet to avoid it, but the issue is conceptual. I do not understand a strategy of tracking pressure for a combined class.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109300/new/

https://reviews.llvm.org/D109300