[PATCH] D75472: [AMDGPU] SI_INDIRECT_DST_V* pseudos expansion should place EXEC restore to separate basic block
Alexander via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 2 10:41:18 PST 2020
alex-t created this revision.
alex-t added reviewers: rampitec, vpykhtin, nhaehnle.
Herald added subscribers: kerbowa, hiraditya, t-tye, tpr, dstuttard, yaxunl, wdng, jvesely, kzhuravl, arsenm.
Herald added a project: LLVM.
When SI_INDIRECT_DST_V* pseudos has indexes in VGPR, they get expanded into the self-looped basic block that modifies EXEC in a loop.
To keep EXEC consistent it is stored before and then re-stored after the pseudo expansion result.
%95:vreg_512 = SI_INDIRECT_DST_V16 %93:vreg_512(tied-def 0), %94:sreg_32, 0, killed %1500:vgpr_32
results to
s_mov_b64 s[6:7], exec
BB0_16:
v_readfirstlane_b32 s8, v28
v_cmp_eq_u32_e32 vcc, s8, v28
s_and_saveexec_b64 vcc, vcc
s_set_gpr_idx_on s8, gpr_idx(DST)
v_mov_b32_e32 v6, v25
s_set_gpr_idx_off
s_xor_b64 exec, exec, vcc
s_cbranch_execnz BB0_16
; %bb.17:
s_mov_b64 exec, s[6:7]
The bug appeared in case this expansion occurs in the ELSE block of the CF.
Originally
%110:vreg_512 = SI_INDIRECT_DST_V16 %103:vreg_512(tied-def 0), %85:vgpr_32, 0, %107:vgpr_32,
%112:sreg_64 = SI_ELSE %108:sreg_64, %bb.19, 0, implicit-def dead $exec, implicit-def dead $scc, implicit $exec
expanded to
- <== here exec has "THEN" context
s_mov_b64 s[6:7], exec
BB0_16:
v_readfirstlane_b32 s8, v28
v_cmp_eq_u32_e32 vcc, s8, v28
s_and_saveexec_b64 vcc, vcc
s_set_gpr_idx_on s8, gpr_idx(DST)
v_mov_b32_e32 v6, v25
s_set_gpr_idx_off
s_xor_b64 exec, exec, vcc
s_cbranch_execnz BB0_16
; %bb.17:
s_or_saveexec_b64 s[4:5], s[4:5] <-- exec mask is restored for "ELSE" but immediately overwritten.
s_mov_b64 exec, s[6:7]
The rest of the "ELSE" block is executed not by the workitems which constitute the "else mask" but by those which constitute "then mask"
SILowerControlFlow::emitElse always considers the basic block begin() as an insertion point for s_or_saveexec.
Proposed fix: The SI_INDIRECT_DST_V* procedure should split the reminder block to create landing pad for the EXEC restoration.
https://reviews.llvm.org/D75472
Files:
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
Index: llvm/lib/Target/AMDGPU/SIISelLowering.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -3308,9 +3308,15 @@
auto InsPt = emitLoadM0FromVGPRLoop(TII, MRI, MBB, *LoopBB, DL, *Idx,
InitResultReg, DstReg, PhiReg, TmpExec,
Offset, UseGPRIdxMode, IsIndirectSrc);
-
- MachineBasicBlock::iterator First = RemainderBB->begin();
- BuildMI(*RemainderBB, First, DL, TII->get(MovExecOpc), Exec)
+ MachineBasicBlock* LandingPad = MF->CreateMachineBasicBlock();
+ MachineFunction::iterator MBBI(LoopBB);
+ ++MBBI;
+ MF->insert(MBBI, LandingPad);
+ LoopBB->removeSuccessor(RemainderBB);
+ LandingPad->addSuccessor(RemainderBB);
+ LoopBB->addSuccessor(LandingPad);
+ MachineBasicBlock::iterator First = LandingPad->begin();
+ BuildMI(*LandingPad, First, DL, TII->get(MovExecOpc), Exec)
.addReg(SaveExec);
return InsPt;
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D75472.247686.patch
Type: text/x-patch
Size: 1028 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200302/c874b300/attachment.bin>
More information about the llvm-commits
mailing list