[PATCH] D35967: [AMDGPU] Collapse adjacent SI_END_CF

Stanislav Mekhanoshin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jul 28 11:40:38 PDT 2017


rampitec added inline comments.


================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:82
+    break;
+  case AMDGPU::S_MOV_B64:
+  case AMDGPU::COPY:
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > I would hope we aren't seeing s_mov_b64s with register inputs at this point. Does this actually happen?
> > Yes. That is what we actually have at this point:
> > 
> > 
> > ```
> > 64B             %vreg1<def> = COPY %EXEC, %EXEC<imp-def>; SReg_64:%vreg1
> > 80B             %vreg55<def> = S_AND_B64 %vreg1, %vreg12, %SCC<imp-def,dead>; SReg_64:%vreg55,%vreg1,%vreg12
> > 96B             %EXEC<def> = S_MOV_B64_term %vreg55; SReg_64:%vreg55
> > ```
> This won't catch that though. S_MOV_B64_term is different
OK, I really meant to catch COPY, catching term was not an intention. Do you want to have no switch here?


================
Comment at: lib/Target/AMDGPU/SIOptimizeExecMaskingPreRA.cpp:129
+    for ( ; I != E; ++I) {
+      if (!TII->isSALU(*I) || I->readsRegister(AMDGPU::EXEC, TRI) ||
+          I->isBranch())
----------------
arsenm wrote:
> rampitec wrote:
> > arsenm wrote:
> > > isBranch check first. I'm not sure why this needs to specifically skip branches though
> > It actually breaks if outer end_cf is not an immediate layout successor, as a branch does not read exec, at least not an unconditional branch. As far as I understood in that situation with not a simplest enclosure of cfg scopes a block placement can result in a wrong mask at the end.
> Should this be using isLayoutSuccessor then?
I'm scanning instructions anyway, I believe either way it should work.


https://reviews.llvm.org/D35967





More information about the llvm-commits mailing list