[PATCH] D78091: [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.
Alexander via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Apr 24 05:23:25 PDT 2020
alex-t added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3698
+ ? MRI.getRegClass(Src1.getReg())
+ : &AMDGPU::VGPR_32RegClass;
+ const TargetRegisterClass *Src0SubRC =
----------------
rampitec wrote:
> alex-t wrote:
> > rampitec wrote:
> > > alex-t wrote:
> > > > rampitec wrote:
> > > > > VReg_64? Since it did not fail anywhere this case must be not covered by any tests.
> > > > I maybe misunderstand the documentation, but it says that the size we only can have 32bit immediate aa operand.
> > > > I also did some experiments with different targets (gfx600,900,1010) and always have seen that 64bit size constant was split into 2 32bit parts for addition.
> > > > Please correct me if I understand it in a wrong way.
> > > Two lines below you are asking for sub0 of that RC. VGPR_32 does not have sub0.
> > Yes. And it is exactly what is expected :)
> > getSubRegClass returns VGPR_32RegClass itself in this case. In fact id does not matter what it is.
> > buildExtractSubRegOrImm does not use SubRC argument if operand is immediate.
> >
> >
> > ```
> > if (Op.isImm()) {
> > if (SubIdx == AMDGPU::sub0)
> > return MachineOperand::CreateImm(static_cast<int32_t>(Op.getImm()));
> > if (SubIdx == AMDGPU::sub1)
> > return MachineOperand::CreateImm(static_cast<int32_t>(Op.getImm() >> 32));
> >
> > llvm_unreachable("Unhandled register index for immediate");
> > }
> > ```
> > Once again, I maybe don't understand what your objection is about.
> >
> > For the simple i64 immediate addition like this:
> >
> > ```
> > %add = add i64 20015998343286, %a
> > ```
> > we generate carry out:
> >
> > ```
> > s_add_u32 s0, s2, 0x56789876
> > s_addc_u32 s1, s3, 0x1234
> > ```
> > for uniform
> >
> > or
> > ```
> > v_add_co_u32_e32 v0, vcc, 0x56789876, v0
> > v_mov_b32_e32 v1, 0x1234
> > v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
> >
> > ```
> > for divergent.
> >
> > So, why do we need VReg_64?
> >
> >
> You are calling TRI->getSubRegClass(Src1RC, AMDGPU::sub1); on this RC. You want to have VGPR_32 as an answer. Even though getSubRegClass() may return RC itself if it does not have a requested subreg this sounds like a bug. It would be more natural for it to assert. To be on a safe side pass there VReg_64 to get the same VGPR_32 or just do not call it if it is an immediate.
Fine, we have one weird piece - SIRegisterInfo::getSubRegClass that returns input argument back if does not succeed.
Then we have one more weird piece - SIInstrInfo::buildExtractSubRegOrImm that is in fact 2 separate functions. It does completely different things for register and immediate, and just ignores register class arguments for immediate. Unfortunately all arguments required. So, I have to pass to it register class and sub-register class despite of the fact they're not used.
Refactoring these 2 pieces should be separate change.
If you insist that using VGPR_32RegClass is misleading I have no choice but VReg_64RegClass, that is misleading either IMHO.
I'll have to add a FIXME comment to explain why we use VGPR_32RegClass for immediate that can be 32bit only.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3688
+ Register CarryReg = MRI.createVirtualRegister(CarryRC);
+ Register DeadCarryReg = MRI.createVirtualRegister(CarryRC);
+
----------------
rampitec wrote:
> You can probably use SIInstrInfo::getAddNoCarry() and extend it to produce sub as well or create e new helper. You are always using I32 version even if a no-carry U32 version is available.
Not sure which line this comment belong...
For LoOpc I indeed need carry out opcode.
getAddNoCarry returnc the addition that does not write the carry flag.
Anyway, the exact opcode is selected later on by the SIInstrInfo::pseudoToMCOpcode
```
renamable $vgpr0 = V_ADD_I32_e32 1450743926, killed $vgpr0, implicit-def $vcc, implicit $exec
renamable $vgpr1 = V_MOV_B32_e32 4660, implicit $exec
renamable $vgpr1 = V_ADDC_U32_e32 0, killed $vgpr1, implicit-def $vcc, implicit killed $vcc, implicit $exec
```
turns to the
```
v_add_co_u32_e32 v0, vcc, 0x56789876, v0
v_mov_b32_e32 v1, 0x1234
v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
```
for gfx9
but to the
```
v_add_i32_e32 v0, vcc, 0x56789876, v0
v_mov_b32_e32 v1, 0x1234
v_addc_u32_e32 v1, vcc, 0, v1, vcc
```
for gfx6
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D78091/new/
https://reviews.llvm.org/D78091
More information about the llvm-commits
mailing list