[PATCH] D78091: [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.

Thu Apr 23 11:53:40 PDT 2020

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3698
+                                            ? MRI.getRegClass(Src1.getReg())
+                                            : &AMDGPU::VGPR_32RegClass;
+    const TargetRegisterClass *Src0SubRC =
----------------
alex-t wrote:
> rampitec wrote:
> > alex-t wrote:
> > > rampitec wrote:
> > > > VReg_64? Since it did not fail anywhere this case must be not covered by any tests.
> > > I maybe misunderstand the documentation, but it says that the size we only can have 32bit immediate aa operand.
> > > I also did some experiments with different targets (gfx600,900,1010) and always have seen that 64bit size constant was split into 2 32bit parts for addition.
> > > Please correct me if I understand it in a wrong way.
> > Two lines below you are asking for sub0 of that RC. VGPR_32 does not have sub0.
> Yes. And it is exactly what is expected :) 
> getSubRegClass   returns VGPR_32RegClass itself in this case. In fact id does not matter what it is.
> buildExtractSubRegOrImm does not use SubRC argument if operand is immediate.
> 
> 
> ```
>  if (Op.isImm()) {
>     if (SubIdx == AMDGPU::sub0)
>       return MachineOperand::CreateImm(static_cast<int32_t>(Op.getImm()));
>     if (SubIdx == AMDGPU::sub1)
>       return MachineOperand::CreateImm(static_cast<int32_t>(Op.getImm() >> 32));
> 
>     llvm_unreachable("Unhandled register index for immediate");
>   }
> ```
> Once again, I maybe don't understand what your objection is about.
> 
> For the simple i64 immediate addition like this:
> 
> ```
>   %add = add i64 20015998343286, %a
> ```
> we generate carry out:
> 
> ```
> 	s_add_u32 s0, s2, 0x56789876
> 	s_addc_u32 s1, s3, 0x1234
> ```
> for uniform
> 
> or 
> ```
> 	v_add_co_u32_e32 v0, vcc, 0x56789876, v0
> 	v_mov_b32_e32 v1, 0x1234
> 	v_addc_co_u32_e32 v1, vcc, 0, v1, vcc
> 
> ```
> for divergent.
> 
> So, why do we need VReg_64?
> 
> 
You are calling TRI->getSubRegClass(Src1RC, AMDGPU::sub1); on this RC. You want to have VGPR_32 as an answer. Even though getSubRegClass() may return RC itself if it does not have a requested subreg this sounds like a bug. It would be more natural for it to assert. To be on a safe side pass there VReg_64 to get the same VGPR_32 or just do not call it if it is an immediate.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3616
+    BuildMI(*BB, MI, DL, TII->get(Opc), Dest0.getReg())
+        .addReg(Src0.getReg())
+        .add(Src1);
----------------
It can be immediate.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3688
+    Register CarryReg = MRI.createVirtualRegister(CarryRC);
+    Register DeadCarryReg = MRI.createVirtualRegister(CarryRC);
+
----------------
You can probably use SIInstrInfo::getAddNoCarry() and extend it to produce sub as well or create e new helper. You are always using I32 version even if a no-carry U32 version is available.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78091/new/

https://reviews.llvm.org/D78091