[PATCH] D109300: [AMDGPU] Make vector superclasses allocatable

Tue Sep 21 05:04:16 PDT 2021

cdevadas added a comment.

In D109300#3010692 <https://reviews.llvm.org/D109300#3010692>, @rampitec wrote:

> In D109300#3010650 <https://reviews.llvm.org/D109300#3010650>, @arsenm wrote:
>
>> In D109300#3010612 <https://reviews.llvm.org/D109300#3010612>, @rampitec wrote:
>>
>>> Yet another question concerns gfx90a. Assume we are reading matrix C from memory into a register tuple. The mfma would need to use AGPR, but the load may use either AGPR or VGPR and has AV operand (same for a store). How likely will it happen that a VGPR tuple will be used for the load and then copied into AGPR? Can this happen?
>>
>> Ideally we would have the concrete classes chosen ahead of time (i.e. regbankselect analyses the uses and defs and sets the bank, resulting in the concrete class). The main reason to make this change is to allow the allocator to split ranges with cross class copies. We wouldn't generally want AV_* classes in the incoming MIR.
>
> In general selector cannot do it. You need to know RP by class and chose an RC accordingly. Moreover, on gfx90a you would also need to chose an instruction accordingly.

Isn't it possible to select legal register classes by having constrained register operands in the instruction definition itself?

================
Comment at: llvm/lib/Target/AMDGPU/SIFixSGPRCopies.cpp:262
   MI.getOperand(0).setReg(CopyUse.getOperand(0).getReg());
-  bool IsAGPR = TRI->hasAGPRs(DstRC);
+  bool IsAGPR = TRI->isAGPRClass(DstRC);

----------------
rampitec wrote:
> Now that AV is allocatable I suspect we will miss a lot of optimizations with multiple changes like this (e.g. hasAGPRs -> isAGPRClass) and maybe even make wrong and illegal decisions because we will be unable to correctly identify if that is a VGPR or AGPR presented an AV.
I agree we might miss out on certain optimizations. That should be handled (separately).
COPY is a special instruction, the machine opcode changes based on the target register for AMDGPU. 
Any Pre-RA copy instance with AV class, I am forcing it to the VGPR class.

But illegal decisions should be averted. If it happens, mostly occurs at the selection.

================
Comment at: llvm/lib/Target/AMDGPU/SIInstrInfo.cpp:896
   unsigned Opcode = AMDGPU::V_MOV_B32_e32;
-  if (RI.hasAGPRs(RC)) {
+  if (RI.isAGPRClass(RC)) {
     Opcode = (RI.hasVGPRs(SrcRC)) ?
----------------
rampitec wrote:
> What will happen if that is AV?
I guess this function, `copyPhysReg` will only be called post-RA.

================
Comment at: llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp:2432
+      Idx == AMDGPU::RegisterPressureSets::AGPR_32 ||
+      Idx == AMDGPU::RegisterPressureSets::AV_32)
     return getRegPressureLimit(&AMDGPU::VGPR_32RegClass,
----------------
rampitec wrote:
> I do not think there can be a separate pressure on AV. What does that mean for AV pressure if you have pressure 256 on V and 256 on A? It cannot be increased or decreased separately.
The assert here https://github.com/llvm/llvm-project/blob/main/llvm/utils/TableGen/CodeGenRegisters.cpp#L2028
enforces at least one PSet when we make the AV classes allocatable.

There should have been a flag to entirely avoid the psets for allocatable regclasses.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109300/new/

https://reviews.llvm.org/D109300