[PATCH] D78091: [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.

Thu Apr 16 10:02:06 PDT 2020

alex-t marked 2 inline comments as done.
alex-t added a comment.

In D78091#1981388 <https://reviews.llvm.org/D78091#1981388>, @rampitec wrote:

> You need to add tests for selection and moveToVALU, including immediates and wave32.

Carry-outs (UADDO/USUBO) are already covered by the existing uaddo.ll and usubo.ll, which contain both divergent and uniform ISD::UADDO/ISD::USUBO nodes to select.
Those tests are already updated.
The SIFixSGPRCopies::moveToVALU part is covered as well, by udiv64.ll, urem64.ll, sdiv64.ll and srem64.ll. Each of those files contains two identical tests: one declared as a kernel, which therefore has uniform arguments, and one declared as a function, which has divergent arguments.
The former contains uniform UADDO and ADDCARRY nodes that are initially selected to S_ADD_CO_PSEUDO but are later converted to V_ADD/V_ADDC in moveToVALU.

So I am only planning to add pure selection MIR tests for the ISD opcodes.
Wave64/wave32 tests are probably needed as well, along with tests for immediate operands.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp:918-923
+  // TODO: We have to add FinalizeISel
+  // to expand V_ADD/SUB_U64_PSEUDO before SIFixupVectorISel
+  // that expects V_ADD/SUB -> A_ADDC/SUBB pairs expanded.
+  // Will be removed as soon as SIFixupVectorISel is changed
+  // to work with V_ADD/SUB_U64_PSEUDO instead.
+  addPass(&FinalizeISelID);
----------------
arsenm wrote:
> This is just broken. We already run it, and there shouldn't be a reason to involve SIFixupVectorISel
As I saw while debugging, FinalizeISel gets invoked from the TargetPassConfig base class just after the bundle of passes that the target defines as the instruction selector. So in our case it runs after createSIAddIMGInitPass. That's why I had to add this.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3698
+                                            ? MRI.getRegClass(Src1.getReg())
+                                            : &AMDGPU::VGPR_32RegClass;
+    const TargetRegisterClass *Src0SubRC =
----------------
rampitec wrote:
> VReg_64? Since it did not fail anywhere this case must be not covered by any tests.
I may be misunderstanding the documentation, but it says that we can only have a 32-bit immediate as an operand.
I also did some experiments with different targets (gfx600, gfx900, gfx1010) and always saw the 64-bit constant split into two 32-bit parts for the addition.
Please correct me if I have understood this wrongly.
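
For illustration only, here is a minimal sketch of the split described above, written in plain C++ rather than as LLVM code. It models a 64-bit add as a low 32-bit add that produces a carry (UADDO-like) followed by a high 32-bit add that consumes it (ADDCARRY-like). The names AddCarry32, addWithCarry, add64ViaHalves and checkSplitAdd are hypothetical and are not part of the patch or of LLVM.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: a 32-bit sum plus its carry-out bit.
struct AddCarry32 {
  uint32_t Sum;
  bool CarryOut;
};

// 32-bit add with carry-in and carry-out.
// With CarryIn == false this behaves like UADDO; otherwise like ADDCARRY.
inline AddCarry32 addWithCarry(uint32_t A, uint32_t B, bool CarryIn) {
  // Widen to 64 bits so the carry shows up in bit 32.
  uint64_t Wide = uint64_t(A) + uint64_t(B) + (CarryIn ? 1u : 0u);
  return {uint32_t(Wide), (Wide >> 32) != 0};
}

// 64-bit add built from two 32-bit halves, the way a 64-bit constant
// operand would be consumed as two 32-bit parts.
inline uint64_t add64ViaHalves(uint64_t A, uint64_t B) {
  AddCarry32 Lo = addWithCarry(uint32_t(A), uint32_t(B), /*CarryIn=*/false);
  AddCarry32 Hi =
      addWithCarry(uint32_t(A >> 32), uint32_t(B >> 32), Lo.CarryOut);
  return (uint64_t(Hi.Sum) << 32) | Lo.Sum;
}

// Small self-check: the split add must agree with a native 64-bit add.
inline void checkSplitAdd() {
  const uint64_t A = 0x00000001FFFFFFFFull;
  const uint64_t B = 0x0000000200000001ull;
  assert(add64ViaHalves(A, B) == A + B);
}
```

The sketch says nothing about register classes or about which operands may be immediates; it only shows why each half of the operation needs nothing wider than a 32-bit operand.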

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78091/new/

https://reviews.llvm.org/D78091