[PATCH] D22092: AMDGPU: Reduce the duration of whole-quad-mode
Nicolai Hähnle via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 27 07:14:24 PDT 2016
nhaehnle added inline comments.
================
Comment at: lib/Target/AMDGPU/SIWholeQuadMode.cpp:228-231
@@ +227,6 @@
+
+      // Since we're in machine SSA, we do not need to track physical
+      // registers across basic blocks.
+      if (Value->isPHIDef())
+        continue;
+
----------------
arsenm wrote:
> This isn't actually true in the case of SCC. It's rare, but it is possible to come up with test cases that break this. A uniform branch with a use of the same i1 value in a later block should do it, if you want to try to break things.
Hmm, I was going off some comments that I saw in one of the Live*.h analyses.
Do you have a sample of that? I tried this naive test case:
  define amdgpu_ps float @test(i32 inreg %a) nounwind {
  entry:
    %cc = icmp ugt i32 %a, 5
    br i1 %cc, label %if, label %next

  if:
    br label %next

  next:
    %v = phi float [ 2.0, %if ], [ 0.0, %entry ]
    br i1 %cc, label %if2, label %end

  if2:
    br label %end

  end:
    %r = phi float [ 3.0, %if2 ], [ %v, %next ]
    ret float %r
  }
and I don't get SCC reuse (the s_cmp is emitted again in %next):
  test:                                   ; @test
  ; BB#0:                                 ; %entry
          v_mov_b32_e32 v0, 0
          s_cmp_lt_u32 s0, 6
          s_cbranch_scc1 BB0_2
  ; BB#1:                                 ; %if
          v_mov_b32_e32 v0, 2.0
  BB0_2:                                  ; %next
          s_cmp_lt_u32 s0, 6
          s_cbranch_scc1 BB0_4
  ; BB#3:                                 ; %if2
          v_mov_b32_e32 v0, 0x40400000
  BB0_4:                                  ; %end
          ; return
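
To make the property under discussion concrete, here is a toy model (plain Python, not LLVM code). It treats a block as having a live-in use of a register when the first instruction touching that register reads it. The block contents are hand-transcribed from the output above, and the `reused` variant is purely hypothetical: it is not something llc produced here, only what SCC reuse might look like.

```python
# Toy model, not LLVM: each block is a list of (kind, reg) events,
# where kind is "def" or "use".

def live_in_uses(blocks, reg):
    """Return the names of blocks that read `reg` before defining it locally."""
    result = []
    for name, events in blocks.items():
        for kind, r in events:
            if r != reg:
                continue
            if kind == "use":
                # The read is reached from the block entry, so the value
                # must have been defined in some other block.
                result.append(name)
            break  # only the first event touching `reg` matters
    return result

# Transcribed from the generated code above: each s_cbranch_scc1 is
# preceded by an s_cmp_lt_u32 in the same block.
observed = {
    "entry": [("def", "scc"), ("use", "scc")],
    "if":    [],
    "next":  [("def", "scc"), ("use", "scc")],
    "if2":   [],
    "end":   [],
}

# Hypothetical variant (an assumption, not compiler output): the second
# compare is dropped and %next branches on SCC set in %entry.
reused = {
    "entry": [("def", "scc"), ("use", "scc")],
    "if":    [],
    "next":  [("use", "scc")],
    "if2":   [],
    "end":   [],
}

print(live_in_uses(observed, "scc"))  # []
print(live_in_uses(reused, "scc"))    # ['next']
```

In the observed output nothing is live-in, which matches the assumption in the patch comment. A block like the hypothetical `next` is the kind of case where skipping PHI-def values could miss a physical register that is live across a block boundary.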
https://reviews.llvm.org/D22092