[PATCH] D78091: [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 23 11:53:40 PDT 2020
rampitec added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3698
+ ? MRI.getRegClass(Src1.getReg())
+ : &AMDGPU::VGPR_32RegClass;
+ const TargetRegisterClass *Src0SubRC =
----------------
alex-t wrote:
> rampitec wrote:
> > alex-t wrote:
> > > rampitec wrote:
> > > > VReg_64? Since it did not fail anywhere this case must be not covered by any tests.
> > > I maybe misunderstand the documentation, but it says that the size we only can have 32bit immediate aa operand.
> > > I also did some experiments with different targets (gfx600,900,1010) and always have seen that 64bit size constant was split into 2 32bit parts for addition.
> > > Please correct me if I understand it in a wrong way.
> > Two lines below you are asking for sub0 of that RC. VGPR_32 does not have sub0.
> Yes. And it is exactly what is expected :)
> getSubRegClass returns VGPR_32RegClass itself in this case. In fact id does not matter what it is.
> buildExtractSubRegOrImm does not use SubRC argument if operand is immediate.
>
>
> ```
> if (Op.isImm()) {
> if (SubIdx == AMDGPU::sub0)
> return MachineOperand::CreateImm(static_cast<int32_t>(Op.getImm()));
> if (SubIdx == AMDGPU::sub1)
> return MachineOperand::CreateImm(static_cast<int32_t>(Op.getImm() >> 32));
>
> llvm_unreachable("Unhandled register index for immediate");
> }
> ```
> Once again, I maybe don't understand what your objection is about.
>
> For the simple i64 immediate addition like this:
>
> ```
> %add = add i64 20015998343286, %a
> ```
> we generate carry out:
>
> ```
> s_add_u32 s0, s2, 0x56789876
> s_addc_u32 s1, s3, 0x1234
> ```
> for uniform
>
> or
> ```
> v_add_co_u32_e32 v0, vcc, 0x56789876, v0
> v_mov_b32_e32 v1, 0x1234
> v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
>
> ```
> for divergent.
>
> So, why do we need VReg_64?
>
>
You are calling TRI->getSubRegClass(Src1RC, AMDGPU::sub1); on this RC. You want to have VGPR_32 as an answer. Even though getSubRegClass() may return RC itself if it does not have a requested subreg this sounds like a bug. It would be more natural for it to assert. To be on a safe side pass there VReg_64 to get the same VGPR_32 or just do not call it if it is an immediate.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3616
+ BuildMI(*BB, MI, DL, TII->get(Opc), Dest0.getReg())
+ .addReg(Src0.getReg())
+ .add(Src1);
----------------
It can be immediate.
================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3688
+ Register CarryReg = MRI.createVirtualRegister(CarryRC);
+ Register DeadCarryReg = MRI.createVirtualRegister(CarryRC);
+
----------------
You can probably use SIInstrInfo::getAddNoCarry() and extend it to produce sub as well or create e new helper. You are always using I32 version even if a no-carry U32 version is available.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D78091/new/
https://reviews.llvm.org/D78091
More information about the llvm-commits
mailing list