[PATCH] D35967: [AMDGPU] Collapse adjacent SI_END_CF

Thu Jul 27 21:10:05 PDT 2017

rampitec added inline comments.

================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:82
+    break;
+  case AMDGPU::S_MOV_B64:
+  case AMDGPU::COPY:
----------------
arsenm wrote:
> I would hope we aren't seeing s_mov_b64s with register inputs at this point. Does this actually happen?
Yes. That is what we actually have at this point:

```
64B             %vreg1<def> = COPY %EXEC, %EXEC<imp-def>; SReg_64:%vreg1
80B             %vreg55<def> = S_AND_B64 %vreg1, %vreg12, %SCC<imp-def,dead>; SReg_64:%vreg55,%vreg1,%vreg12
96B             %EXEC<def> = S_MOV_B64_term %vreg55; SReg_64:%vreg55
```

================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:129
+    for ( ; I != E; ++I) {
+      if (!TII->isSALU(*I) || I->readsRegister(AMDGPU::EXEC, TRI) ||
+          I->isBranch())
----------------
arsenm wrote:
> isBranch check first. I'm not sure why this needs to specifically skip branches though
It actually breaks if the outer end_cf is not the immediate layout successor, because a branch does not read exec (at least an unconditional branch does not), so without the isBranch check the scan would step over it. As far as I understand, when the cfg scopes are not simply nested one inside the other, block placement can then result in a wrong mask at the end.
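
To illustrate the point about the scan, here is a minimal standalone sketch. It is a toy model, not the code in the patch: `ToyInstr` and `skipExecNeutralSALU` are hypothetical names, and I am assuming the loop stops at the first instruction that matches the condition. The isBranch test is placed first, as suggested.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a machine instruction (hypothetical, not MachineInstr).
struct ToyInstr {
  bool IsSALU;    // scalar ALU instruction
  bool ReadsExec; // reads the EXEC mask
  bool IsBranch;  // any branch, conditional or not
};

// Walk forward from Start and return the index of the first instruction
// that ends the scan: a branch, a non-SALU instruction, or an EXEC reader.
// Returns Block.size() if every remaining instruction can be skipped.
std::size_t skipExecNeutralSALU(const std::vector<ToyInstr> &Block,
                                std::size_t Start) {
  std::size_t I = Start;
  for (; I != Block.size(); ++I) {
    const ToyInstr &MI = Block[I];
    // An unconditional branch is scalar and does not read EXEC, so only
    // the explicit IsBranch test stops the scan on it.
    if (MI.IsBranch || !MI.IsSALU || MI.ReadsExec)
      break;
  }
  return I;
}
```

With a block of two plain SALU instructions followed by an unconditional branch, the scan stops at index 2. Dropping the `IsBranch` test would let it run past the branch to the end of the block.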

https://reviews.llvm.org/D35967