[PATCH] D22092: AMDGPU: Reduce the duration of whole-quad-mode

Wed Jul 27 07:14:24 PDT 2016

nhaehnle added inline comments.

================
Comment at: lib/Target/AMDGPU/SIWholeQuadMode.cpp:228-231
@@ +227,6 @@
+
+        // Since we're in machine SSA, we do not need to track physical
+        // registers across basic blocks.
+        if (Value->isPHIDef())
+          continue;
+
----------------
arsenm wrote:
> This isn't actually true in the case of SCC. It's rare but possible to come up with test cases that break this. A uniform branch with a use of the same i1 value in a later block should do it if you want to try to break things
Hmm, I was going off some comments that I saw in one of the Live*.h analyses.

Do you have a sample of that? I tried this naive test case:

  define amdgpu_ps float @test(i32 inreg %a) nounwind {
  entry:
    %cc = icmp ugt i32 %a, 5
    br i1 %cc, label %if, label %next
  if:
    br label %next
  next:
    %v = phi float [ 2.0, %if ], [ 0.0, %entry ]
    br i1 %cc, label %if2, label %end
  if2:
    br label %end
  end:
    %r = phi float [ 3.0, %if2 ], [ %v, %next ]
    ret float %r
  }

and I don't get SCC reuse; the s_cmp_lt_u32 is simply re-emitted in %next:

  test:                                   ; @test
  ; BB#0:                                 ; %entry
        v_mov_b32_e32 v0, 0
        s_cmp_lt_u32 s0, 6
        s_cbranch_scc1 BB0_2
  ; BB#1:                                 ; %if
        v_mov_b32_e32 v0, 2.0
  BB0_2:                                  ; %next
        s_cmp_lt_u32 s0, 6
        s_cbranch_scc1 BB0_4
  ; BB#3:                                 ; %if2
        v_mov_b32_e32 v0, 0x40400000
  BB0_4:                                  ; %end
        ; return
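To make the distinction concrete, here is a toy C++ sketch of when SCC is live into a block: some instruction reads it before anything in that block writes it. The types `Inst` and `Block` and the function `sccLiveIn` are hypothetical illustrations, not LLVM's `MachineInstr` or `LiveIntervals` API. The two example blocks model %next (BB0_2) as emitted above, and a hypothetical variant in which the branch reuses the SCC value computed in %entry. That second variant is the kind of cross-block physical-register liveness arsenm describes.

```cpp
#include <string>
#include <vector>

// Hypothetical toy instruction: records only whether it reads and/or
// writes the SCC bit. Not LLVM's MachineInstr.
struct Inst {
  std::string text;
  bool readsSCC;
  bool writesSCC;
};

using Block = std::vector<Inst>;

// SCC is live into a block if an instruction reads it before any
// instruction in the same block writes it. An instruction that both
// reads and writes SCC counts as a read first.
bool sccLiveIn(const Block &block) {
  for (const Inst &inst : block) {
    if (inst.readsSCC)
      return true;   // reads a value defined in a predecessor block
    if (inst.writesSCC)
      return false;  // redefined locally before any read
  }
  return false;      // SCC never read in this block
}

// %next (BB0_2) as actually emitted above: the compare is recomputed.
Block nextAsEmitted() {
  return {{"s_cmp_lt_u32 s0, 6", false, true},
          {"s_cbranch_scc1 BB0_4", true, false}};
}

// Hypothetical variant: %next reuses the SCC value set in %entry.
Block nextWithReuse() {
  return {{"s_cbranch_scc1 BB0_4", true, false}};
}
```

Here `sccLiveIn(nextAsEmitted())` is false and `sccLiveIn(nextWithReuse())` is true. Only the second shape has a physical register live across a block boundary, which is the case the comment in the hunk assumes cannot occur in machine SSA.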

https://reviews.llvm.org/D22092