[PATCH] D68092: [AMDGPU] Invert the handling of skip insertion.

Mon Sep 30 10:43:05 PDT 2019

nhaehnle added a comment.

Thank you for working on this.

There are a bunch of high-level problems with this code which are really due to its history and not due to your changes, but I'd appreciate if we could get things right now as the code is rewritten. Mostly, the decision logic is overly complex and at least partially wrong because it makes assumptions about the order of basic blocks (i..e, the outer loop of `shouldRetainSkips`).

Let's rethink what the condition should be. Here's an attempt: s_cbranch_execz should be removed if

- Not taking the branch when EXEC=0 will end up falling through to the branch target anyway.[0]
- The fall-through code sequence has no unwanted side effects
- The fall-through code sequence is short and cheap[1]
- Heuristically, the cheapness requirement should arguably mean that the fall-through code sequence contains no branches at all. After all, if we're going to hit another branch anyway, we may as well just take the original EXECZ branch. This also simplifies the check that we correctly fall through to the branch target. (An exception could perhaps be made for a nested s_cbranch_execz, if the overall code sequence is short and cheap enough that it makes sense to remove both of them.)

How does that sound? This is a significant conceptual change to how the pass works, but I think it's for the better.

[0] There is an interesting subtle point here. The way we currently lower the original thread-based control flow, this fall-through property is always guaranteed except for subtleties involving VCC[Z] branches in loops. However, perhaps we won't always lower control flow in the same way, and perhaps we will one day generate code which doesn't have this property.

[1] What we really want is that `cost of fall-through code < p_taken * (cost of taken branch) + p_nottaken (cost of not-taken branch + cost of fall-through code)`.

================
Comment at: lib/Target/AMDGPU/SILowerControlFlow.cpp:247-248

   // Insert a pseudo terminator to help keep the verifier happy. This will also
   // be used later when inserting skips.
+  MachineInstr *NewBr = BuildMI(MBB, I, DL, TII->get(AMDGPU::S_CBRANCH_EXECZ))
----------------
Comment needs to be updated.

================
Comment at: lib/Target/AMDGPU/SIRemoveShortExecBranches.cpp:10-12
+/// This pass removes the unwanted exec mask instructions inserted during
+/// SILowerControlFlow. It removes the exec masks for the short branches and
+/// tries to retain it for the long branches.
----------------
This should say that it removes EXECZ branches for short branches, right?

Also... "try to retain"? I do hope we always succeed at that :)

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68092/new/

https://reviews.llvm.org/D68092