[PATCH] D35967: [AMDGPU] Collapse adjacent SI_END_CF
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jul 27 21:10:05 PDT 2017
rampitec added inline comments.
================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:82
+ break;
+ case AMDGPU::S_MOV_B64:
+ case AMDGPU::COPY:
----------------
arsenm wrote:
> I would hope we aren't seeing s_mov_b64s with register inputs at this point. Does this actually happen?
Yes. That is what we actually have at this point:
```
64B %vreg1<def> = COPY %EXEC, %EXEC<imp-def>; SReg_64:%vreg1
80B %vreg55<def> = S_AND_B64 %vreg1, %vreg12, %SCC<imp-def,dead>; SReg_64:%vreg55,%vreg1,%vreg12
96B %EXEC<def> = S_MOV_B64_term %vreg55; SReg_64:%vreg55
```
================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:129
+ for ( ; I != E; ++I) {
+ if (!TII->isSALU(*I) || I->readsRegister(AMDGPU::EXEC, TRI) ||
+ I->isBranch())
----------------
arsenm wrote:
> isBranch check first. I'm not sure why this needs to specifically skip branches though
It actually breaks if outer end_cf is not an immediate layout successor, as a branch does not read exec, at least not an unconditional branch. As far as I understood in that situation with not a simplest enclosure of cfg scopes a block placement can result in a wrong mask at the end.
https://reviews.llvm.org/D35967
More information about the llvm-commits
mailing list