[PATCH] D96906: [AMDGPU] gfx90a support

Tue Feb 23 11:53:58 PST 2021

arsenm added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIFormMemoryClauses.cpp:191-199
+  MCRegister RepReg;
+  for (MCRegister R : *MRI->getRegClass(Reg)) {
+    if (!MRI->isReserved(R)) {
+      RepReg = R;
+      break;
+    }
+  }
----------------
rampitec wrote:
> rampitec wrote:
> > arsenm wrote:
> > > rampitec wrote:
> > > > arsenm wrote:
> > > > > rampitec wrote:
> > > > > > arsenm wrote:
> > > > > > > This is a problem because I've removed forAllLanes.
> > > > > > > 
> > > > > > > This is a hack, we should be using a different register class for cases that don't support a given subregister index not scanning for an example non-reserved register
> > > > > > This would be massive duplication of all instructions with such operands, isn't it?
> > > > > Ideally yes. We can still use register classes for this, with special care to make sure we never end up with the unaligned virtual registers in the wrong contexts.
> > > > > 
> > > > >  The less that's tracked by the instruction definitions, the more special case code we have to right. I've been thinking of swapping out the entire MCInstrDesc table per-subtarget to make this easier, although that may be a painful change.
> > > > I do not see how it can be less code. You will need to duplicate all VALU pseudos, not just real instructions. Which means every time you write in the source something like AMDGPU::FLAT_LOAD_DWORDX2 you would have to write an if. For every VALU instruction.
> > > It's less code because the code that's already there is supposed to rely on the static operand definitions. Every time we want to deviate from those, we end up writing manual code in the verifier and fixup things here and there that differ.
> > > 
> > > The point of swapping out the table would be to eliminate all the VALU pseudos. We would just have the same enum values referring to different physical instruction definitions
> > This makes sense, although as you said also quite painful and to me also sounds like a hack. There is still a lot of legalization needed even with this approach. Every time you hit an instruction not supported by a target you will need to do something about it. In a worst case expanding. Sounds like another year of work. Especially when you look at highly specialized ASICs which can do this but cannot do that, and you have a lot them.
> JBTW it will not help anyway. Not for this problem. You may create an operand of a different RC or you may just reserve every other register like I did, the net result will be the same, you will end up using prohibited register. Imagine you are using an RC where only even tuples are added. And then you are using sub1_sub2 subreg of it. RA will happily allocate forbidden register just like it does now. To me this is RA bug in the first place to allocate a reserved register.
> 
> The only thing which could help is an another register info without odd wide subregs, but that you cannot achieve just by duplication of instruction definitions, for that you would need to duplicate register info as well. This is almost a new BE.
It's not a hack, this is how operand classes are intended to work. You wouldn't be producing these instructions on targets that don't support them (ideally we would also have a verifier for this, which is another area where subtarget handling is weak).

The point is to not reserve them. References to unaligned registers can exist, they just can't be used in the context of a real machine operand. D97316 switches to using dedicated classes for alignment (the further cleanup would be to have this come directly from the instruction definitions instead of fixing them up after isel).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96906/new/

https://reviews.llvm.org/D96906