[PATCH] D133840: AMDGPU: Add a pass to rewrite certain undef in PHI

Thu Sep 15 00:17:55 PDT 2022

ruiling added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURewriteUndefForPHI.cpp:122
+            // Update DominateBB if necessary.
+            if (DT->dominates(IncomingBB, DominateBB))
+              DominateBB = IncomingBB;
----------------
sameerds wrote:
> ruiling wrote:
> > sameerds wrote:
> > > If this is not true, then does DominateBB dominate IncomingBB?
> > No, this is used to handle case like with_uniform_region_inside added in this change. If IncomingBB does not dominate DominateBB, they may be successors of the same block in case their common predecessor has uniform branch.
> My confusion is that the "all_of" check later in the code only ensures that DominateBB dominates all the undefined values. But how does one prove that UniqueDefinedIncoming dominates all the uses of the PHI? 
I think this should be true based on my understanding of structurized CFG (For a block, if it has a predecessor with divergent terminator which dominates any other non-backedge predecessor, then the predecessor should dominate the block itself.). an assertion here should work. An extra check "DominateBB dominates BB" should work well, which ensures we are doing the transformation correctly.

================
Comment at: llvm/lib/Target/AMDGPU/AMDGPURewriteUndefForPHI.cpp:34
+// For pattern A, by reporting %phi as uniform, the later pipeline need to make
+// sure it be handled correctly. The backend usually allocates a scalar register
+// and if any thread in a wave takes %then path, the scalar register will get
----------------
sameerds wrote:
> Does "usually" mean that sometimes the backend will not allocate a scalar? In the case that it is allocated a vector register, then the generated code will be wrong?
My simple thinking is in case the backend choose to allocate a vector register, the backend should make sure this is correctly propagated, all the further uses in the def-use chain should get vector register allocated, the threads with undefined value might generate garbage in corresponding register lanes. But that is still correct.

================
Comment at: llvm/test/CodeGen/AMDGPU/rewrite-undef-for-phi.ll:63
+end:
+  %c2 = phi float [ undef, %bb2 ], [ %c, %bb3 ], [ undef, %bb4 ], [ %c, %entry ]
+  ret float %c2
----------------
sameerds wrote:
> Since DominateBB is only checked on defined values, here it is true that %entry dominates %bb3. Depending on how we traverse the incoming vaues, if DominateBB is initially %bb3, then later when IncomingBB is %entry, DominateBB does not dominate IncomingBB, but it is true in the opposite direction.
I am not sue whether you have further question here. Since we are checking for pattern B described in the cpp file. But we might have complex single-entry-single-exit region inside an if-then structure which are skipped during structurization. If we are processing a pattern B in the IR, then the DominateBB should be the dominator of BB.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D133840/new/

https://reviews.llvm.org/D133840