[PATCH] D35967: [AMDGPU] Collapse adjacent SI_END_CF
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 28 11:40:38 PDT 2017
rampitec added inline comments.
================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:82
+ break;
+ case AMDGPU::S_MOV_B64:
+ case AMDGPU::COPY:
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > I would hope we aren't seeing s_mov_b64s with register inputs at this point. Does this actually happen?
> > Yes. That is what we actually have at this point:
> >
> >
> > ```
> > 64B %vreg1<def> = COPY %EXEC, %EXEC<imp-def>; SReg_64:%vreg1
> > 80B %vreg55<def> = S_AND_B64 %vreg1, %vreg12, %SCC<imp-def,dead>; SReg_64:%vreg55,%vreg1,%vreg12
> > 96B %EXEC<def> = S_MOV_B64_term %vreg55; SReg_64:%vreg55
> > ```
> This won't catch that though. S_MOV_B64_term is different
OK, I really meant to catch COPY, catching term was not an intention. Do you want to have no switch here?
================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:129
+ for ( ; I != E; ++I) {
+ if (!TII->isSALU(*I) || I->readsRegister(AMDGPU::EXEC, TRI) ||
+ I->isBranch())
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > isBranch check first. I'm not sure why this needs to specifically skip branches though
> > It actually breaks if outer end_cf is not an immediate layout successor, as a branch does not read exec, at least not an unconditional branch. As far as I understood in that situation with not a simplest enclosure of cfg scopes a block placement can result in a wrong mask at the end.
> Should this be using isLayoutSuccessor then?
I'm scanning instructions anyway, I believe either way it should work.
https://reviews.llvm.org/D35967
More information about the llvm-commits
mailing list