[PATCH] D75472: [AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block

Alexander via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 2 10:41:18 PST 2020


alex-t created this revision.
alex-t added reviewers: rampitec, vpykhtin, nhaehnle.
Herald added subscribers: kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl, arsenm.
Herald added a project: LLVM.

When SI_INDIRECT_DST_V* pseudos has indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop.

To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result.

%95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32

results to

  s_mov_b64 s[6:7], exec

BB0_16:

  v_readfirstlane_b32 s8, v28
  v_cmp_eq_u32_e32 vcc, s8, v28
  s_and_saveexec_b64 vcc, vcc
  s_set_gpr_idx_on s8, gpr_idx(DST)
  v_mov_b32_e32 v6, v25
  s_set_gpr_idx_off
  s_xor_b64 exec, exec, vcc
  s_cbranch_execnz BB0_16

; %bb.17:

  s_mov_b64 exec, s[6:7]
  
   

The bug appeared in case this expansion occurs in the ELSE block of the CF.

Originally

  %110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32,
   %112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec

expanded to

- <== here exec has "THEN" context

  s_mov_b64 s[6:7], exec

BB0_16:

  v_readfirstlane_b32 s8, v28
  v_cmp_eq_u32_e32 vcc, s8, v28
  s_and_saveexec_b64 vcc, vcc
  s_set_gpr_idx_on s8, gpr_idx(DST)
  v_mov_b32_e32 v6, v25
  s_set_gpr_idx_off
  s_xor_b64 exec, exec, vcc
  s_cbranch_execnz BB0_16

; %bb.17:

  s_or_saveexec_b64 s[4:5], s[4:5]   <-- exec mask is restored for "ELSE" but immediately overwritten.
  s_mov_b64 exec, s[6:7]

The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask"

SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec.

Proposed fix:  The SI_INDIRECT_DST_V* procedure should split the reminder block to create landing pad for the EXEC restoration.


https://reviews.llvm.org/D75472

Files:
  llvm/lib/Target/AMDGPU/SIISelLowering.cpp


Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -3308,9 +3308,15 @@
   auto InsPt = emitLoadM0FromVGPRLoop(TII, MRI, MBB, *LoopBB, DL, *Idx,
                                       InitResultReg, DstReg, PhiReg, TmpExec,
                                       Offset, UseGPRIdxMode, IsIndirectSrc);
-
-  MachineBasicBlock::iterator First = RemainderBB->begin();
-  BuildMI(*RemainderBB, First, DL, TII->get(MovExecOpc), Exec)
+  MachineBasicBlock* LandingPad = MF->CreateMachineBasicBlock();
+  MachineFunction::iterator MBBI(LoopBB);
+  ++MBBI;
+  MF->insert(MBBI, LandingPad);
+  LoopBB->removeSuccessor(RemainderBB);
+  LandingPad->addSuccessor(RemainderBB);
+  LoopBB->addSuccessor(LandingPad);
+  MachineBasicBlock::iterator First = LandingPad->begin();
+  BuildMI(*LandingPad, First, DL, TII->get(MovExecOpc), Exec)
     .addReg(SaveExec);
 
   return InsPt;


-------------- next part --------------
A non-text attachment was scrubbed...
Name: D75472.247686.patch
Type: text/x-patch
Size: 1028 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200302/c874b300/attachment.bin>


More information about the llvm-commits mailing list