[PATCH] D78091: [AMDGPU] Enable carry out ADD/SUB operations divergence driven instruction selection.

Wed Apr 15 04:20:36 PDT 2020

alex-t marked 5 inline comments as done.
alex-t added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1086
+  if (!IsVALU) {
+    for (auto Use : N->uses()) {
+      if (Use->isMachineOpcode() && TII->isVALU(Use->getMachineOpcode())) {
----------------
arsenm wrote:
> Shouldn't need to scan all uses?
I assume that the one user selected to VALU is enough to make selecting carryout to SALU impractical. Also, how would you suggest to decide about VALU/SALU basing on the quantity of VALU users? Ratio VALU/SALU? Other heuristic?

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUISelDAGToDAG.cpp:1087
+    for (auto Use : N->uses()) {
+      if (Use->isMachineOpcode() && TII->isVALU(Use->getMachineOpcode())) {
+        IsVALU = true;
----------------
rampitec wrote:
> Should it look thru PHI and COPY? It may need a helper function though.
It works on the selection DAG on per block basis so no PHI nodes ever exist here. There are CopyToReg nodes representing cross block values. Those in order only have MVTs like i32, f32 etc...
Although, I could try to address their divergence using reg2value FunctionLoweringInfo map, similar to cross block input regs in isSDNodeSourceOfDivergence function.  I'd consider this as a kind of further improvement.
Also, nobody guaranty the BBs selection order so the cross block scan may appear too complicated.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:3623-3625
+    BuildMI(*BB, MI, DL, TII->get(AMDGPU::S_CSELECT_B64), Dest1.getReg())
+        .addImm(1)
+        .addImm(0);
----------------
arsenm wrote:
> You can just do COPY from SCC
I did it this way because I noticed that the code in SIInstrInfo::CopyPhysReg restricted to 32bit destination.

```
if (RC == &AMDGPU::SReg_32_XM0RegClass ||
      RC == &AMDGPU::SReg_32RegClass) {
    if (SrcReg == AMDGPU::SCC) {
      BuildMI(MBB, MI, DL, get(AMDGPU::S_CSELECT_B32), DestReg)
          .addImm(1)
          .addImm(0);
      return;
    }
```
So, I expected this remark and postpone the decision for review :)
Would you mind me extending this to 64bit?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78091/new/

https://reviews.llvm.org/D78091