[PATCH] D71192: AMDGPU: Fix AMDGPUUnifyDivergentExitNodes with no normal returns
Connor Abbott via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Jan 29 09:12:37 PST 2020
cwabbott updated this revision to Diff 241187.
cwabbott added a comment.
It seems I forgot to test this against the entire testsuite, which made a bunch
of buildbots unhappy. This version fixes the issues:
- Skip the entire thing for non-pixel shaders that only end with an infinite
loop. This was causing slightly different assembly for four other tests, due to
the insertion of the fake branch, and it generates extra work for normal
shaders that's pointless.
- Update the expected output in update-phi.ll to account for the null export.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D71192/new/
https://reviews.llvm.org/D71192
Files:
llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll
llvm/test/CodeGen/AMDGPU/update-phi.ll
Index: llvm/test/CodeGen/AMDGPU/update-phi.ll
===================================================================
--- llvm/test/CodeGen/AMDGPU/update-phi.ll
+++ llvm/test/CodeGen/AMDGPU/update-phi.ll
@@ -14,12 +14,13 @@
; IR-NEXT: [[DOT01:%.*]] = phi float [ 0.000000e+00, [[DOTLOOPEXIT]] ], [ [[N29:%.*]], [[TRANSITIONBLOCK:%.*]] ]
; IR-NEXT: [[N29]] = fadd float [[DOT01]], 1.000000e+00
; IR-NEXT: [[N30:%.*]] = fcmp ogt float [[N29]], 4.000000e+00
-; IR-NEXT: br i1 true, label [[TRANSITIONBLOCK]], label [[DUMMYRETURNBLOCK:%.*]]
+; IR-NEXT: br i1 true, label [[TRANSITIONBLOCK]], label [[UNIFIEDRETURNBLOCK:%.*]]
; IR: TransitionBlock:
; IR-NEXT: br i1 [[N30]], label [[DOTLOOPEXIT]], label [[N28]]
; IR: n31:
; IR-NEXT: ret void
-; IR: DummyReturnBlock:
+; IR: UnifiedReturnBlock:
+; IR-NEXT: call void @llvm.amdgcn.exp.f32(i32 9, i32 0, float undef, float undef, float undef, float undef, i1 true, i1 true)
; IR-NEXT: ret void
;
.entry:
Index: llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll
===================================================================
--- llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll
+++ llvm/test/CodeGen/AMDGPU/kill-infinite-loop.ll
@@ -45,6 +45,22 @@
ret void
}
+; test the case where there's only a kill in an infinite loop
+; CHECK-LABEL: only_kill
+; CHECK: exp null off, off, off, off done vm
+; CHECK-NEXT: s_endpgm
+; SIInsertSkips inserts an extra null export here, but it should be harmless.
+; CHECK: exp null off, off, off, off done vm
+; CHECK-NEXT: s_endpgm
+define amdgpu_ps void @only_kill() #0 {
+main_body:
+ br label %loop
+
+loop:
+ call void @llvm.amdgcn.kill(i1 false) #3
+ br label %loop
+}
+
; In case there's an epilog, we shouldn't have to do this.
; CHECK-LABEL: return_nonvoid
; CHECK-NOT: exp null off, off, off, off done vm
Index: llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
===================================================================
--- llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
+++ llvm/lib/Target/AMDGPU/AMDGPUUnifyDivergentExitNodes.cpp
@@ -195,7 +195,12 @@
bool AMDGPUUnifyDivergentExitNodes::runOnFunction(Function &F) {
auto &PDT = getAnalysis<PostDominatorTreeWrapperPass>().getPostDomTree();
- if (PDT.getRoots().size() <= 1)
+
+ // If there's only one exit, we don't need to do anything, unless this is a
+ // pixel shader and that exit is an infinite loop, since we still have to
+ // insert an export in that case.
+ if (PDT.getRoots().size() <= 1 &&
+ F.getCallingConv() != CallingConv::AMDGPU_PS)
return false;
LegacyDivergenceAnalysis &DA = getAnalysis<LegacyDivergenceAnalysis>();
@@ -321,7 +326,7 @@
if (ReturningBlocks.empty())
return false; // No blocks return
- if (ReturningBlocks.size() == 1)
+ if (ReturningBlocks.size() == 1 && !InsertExport)
return false; // Already has a single return block
const TargetTransformInfo &TTI
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D71192.241187.patch
Type: text/x-patch
Size: 2970 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20200129/0a43dd25/attachment.bin>
More information about the llvm-commits
mailing list