[PATCH] D140931: Improve and enable folding of conditional branches with tail calls.

Sun Jan 29 00:34:26 PST 2023

pengfei added a comment.

> Make it so that conditional tail calls can be emitted even when there are multiple predecessors.

Do you notice a test case to reflect this target? Otherwise we may need to add one for it.

================
Comment at: llvm/lib/CodeGen/BranchFolding.cpp:1513-1515
+      bool MayChangeDirection =
+          MF.getFunction().hasOptSize() ||
+          llvm::shouldOptimizeForSize(MBB, PSI, &MBBFreqInfo);
----------------
I think this flags turns to meaningless now. Either we leave code as was to match with SOM or we totally ignore it.

================
Comment at: llvm/lib/Target/X86/X86InstrInfo.cpp:3018-3019
     assert(BranchCond.size() == 1);
+    // At the moment MayChangeDirection is unused (or always treated as true).
+    // We only match existing CC.
     if (CC != BranchCond[0].getImm())
----------------
goldstein.w.n wrote:
> hans wrote:
> > goldstein.w.n wrote:
> > > goldstein.w.n wrote:
> > > > pengfei wrote:
> > > > > goldstein.w.n wrote:
> > > > > > pengfei wrote:
> > > > > > > goldstein.w.n wrote:
> > > > > > > > goldstein.w.n wrote:
> > > > > > > > > pengfei wrote:
> > > > > > > > > > goldstein.w.n wrote:
> > > > > > > > > > > pengfei wrote:
> > > > > > > > > > > > We do we need to add it if it's unused? Do not put unrelated/useless code in the same patch. It obstructs people to understand the patch.
> > > > > > > > > > > Its not needed. The reason I added it was b.c the rationale previously for keeping this optimization guarded behind `-Os` was that if it changed branch direction + reordered CFG it could be a de-optimization. That seems like a reasonable concern so thought it would be good to have in the generic API.
> > > > > > > > > > > 
> > > > > > > > > > > A more appropriate comment might have been: "At the moment never change branch direction".
> > > > > > > > > > > 
> > > > > > > > > > > Can drop if your prefer. Let me know for V2.
> > > > > > > > > > I'm still confused by the intention of this flag. If you believe it's a reasonable concern, why can we do it by default in this patch? The flag doesn't prevent from doing it.
> > > > > > > > > > OTOH, if we want make target specific decision, should we simply add the code
> > > > > > > > > > ```
> > > > > > > > > > if (MayChangeDirection)
> > > > > > > > > >   return false;
> > > > > > > > > > ```
> > > > > > > > > > to other targets (if they override it)? So we don't need to pass `MayChangeDirection` again to `replaceBranchWithTailCall`, right?
> > > > > > > > > > 
> > > > > > > > > > Regarding de-optimization, I had the same concern at the beginning. But test cases show that in most cases both branches are terminators, either jmp or ret. So the change of the layout should have negligible affect to performance?
> > > > > > > > > > I'm still confused by the intention of this flag. If you believe it's a reasonable concern, why can we do it by default in this patch? The flag doesn't prevent from doing it.
> > > > > > > > > 
> > > > > > > > > Oh man...
> > > > > > > > > 
> > > > > > > > > **The comment is wrong. We assume `MayChangeDirection` is `false`** (I.e we only find branch if CC matches, not also able to find matches inverse). Sorry for the confusion.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > OTOH, if we want make target specific decision, should we simply add the code
> > > > > > > > > > ```
> > > > > > > > > > if (MayChangeDirection)
> > > > > > > > > >   return false;
> > > > > > > > > > ```
> > > > > > > > > > to other targets (if they override it)? So we don't need to pass `MayChangeDirection` again to `replaceBranchWithTailCall`, right?
> > > > > > > > > >
> > > > > > > > > So at the moment `MayChangeDirection` is unused in both `X86InstrInfo::canMakeTailCallConditional` and `X86InstrInfo::replaceBranchWithTailCall` (assumed **false**). `X86InstrInfo::replaceBranchWithTailCall` needs to find the branch we are replacing which may rely on `MayChangeDirection`. Also we have the sanity tests:
> > > > > > > > > ```
> > > > > > > > > assert(canMakeTailCallConditional(BranchCond, TailCall, MayChangeDirection));
> > > > > > > > > ```
> > > > > > > > > sitting at the top.
> > > > > > > > > > Regarding de-optimization, I had the same concern at the beginning. But test cases show that in most cases both branches are terminators, either jmp or ret. So the change of the layout should have negligible affect to performance?
> > > > > > > > > 
> > > > > > > > > This is because the entire case is guarded behind:
> > > > > > > > > ```
> > > > > > > > >         // Only eliminate if MBB == TBB (Taken Basic Block)
> > > > > > > > >         if (PredAnalyzable && !PredCond.empty() && PredTBB == MBB &&
> > > > > > > > >             PredTBB != PredFBB) {
> > > > > > > > > ```
> > > > > > > > > So we really are only ever doing this opt if it won't cause CFG reordering already. The `MayChangeDirection` AFAICT is redundant in the current implementation. **Would you prefer me to just turn it into a comment at the call site?**
> > > > > > > > > 
> > > > > > > > > Note we do see the final condition switch but from what I can tell that is b.c eliminating this edge in the CFG affects future calls to `BranchFolder::OptimizeBlock`. At the time this transformation takes place it only matches the same taken condition.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > I'm still confused by the intention of this flag. If you believe it's a reasonable concern, why can we do it by default in this patch? The flag doesn't prevent from doing it.
> > > > > > > > > 
> > > > > > > > > Oh man...
> > > > > > > > > 
> > > > > > > > > **The comment is wrong. We assume `MayChangeDirection` is `false`** (I.e we only find branch if CC matches, not also able to find matches inverse). Sorry for the confusion.
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > I fixed the wrong comment in V2. Let me know your decision about dropping `MayChangeDirection` entirely for V3.
> > > > > > > > > 
> > > > > > > > > > OTOH, if we want make target specific decision, should we simply add the code
> > > > > > > > > > ```
> > > > > > > > > > if (MayChangeDirection)
> > > > > > > > > >   return false;
> > > > > > > > > > ```
> > > > > > > > > > to other targets (if they override it)? So we don't need to pass `MayChangeDirection` again to `replaceBranchWithTailCall`, right?
> > > > > > > > > >
> > > > > > > > > So at the moment `MayChangeDirection` is unused in both `X86InstrInfo::canMakeTailCallConditional` and `X86InstrInfo::replaceBranchWithTailCall` (assumed **false**). `X86InstrInfo::replaceBranchWithTailCall` needs to find the branch we are replacing which may rely on `MayChangeDirection`. Also we have the sanity tests:
> > > > > > > > > ```
> > > > > > > > > assert(canMakeTailCallConditional(BranchCond, TailCall, MayChangeDirection));
> > > > > > > > > ```
> > > > > > > > > sitting at the top.
> > > > > > > > > > Regarding de-optimization, I had the same concern at the beginning. But test cases show that in most cases both branches are terminators, either jmp or ret. So the change of the layout should have negligible affect to performance?
> > > > > > > > > 
> > > > > > > > > This is because the entire case is guarded behind:
> > > > > > > > > ```
> > > > > > > > >         // Only eliminate if MBB == TBB (Taken Basic Block)
> > > > > > > > >         if (PredAnalyzable && !PredCond.empty() && PredTBB == MBB &&
> > > > > > > > >             PredTBB != PredFBB) {
> > > > > > > > > ```
> > > > > > > > > So we really are only ever doing this opt if it won't cause CFG reordering already. The `MayChangeDirection` AFAICT is redundant in the current implementation. **Would you prefer me to just turn it into a comment at the call site?**
> > > > > > > > > 
> > > > > > > > > Note we do see the final condition switch but from what I can tell that is b.c eliminating this edge in the CFG affects future calls to `BranchFolder::OptimizeBlock`. At the time this transformation takes place it only matches the same taken condition.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > Thanks! The flag is clear to me now. But it's still not clear to me why it is conservative.
> > > > > > > According to the old comments in `BranchFolding.cpp`, the true conservative action is to do `replaceBranchWithTailCall` only when `OptForSize` or `MayChangeDirection` is `true`. But this change always does `replaceBranchWithTailCall` now. So it looks to me like aggressive rather than conservative.
> > > > > > > Thanks! The flag is clear to me now. But it's still not clear to me why it is conservative.
> > > > > > > According to the old comments in `BranchFolding.cpp`, the true conservative action is to do `replaceBranchWithTailCall` only when `OptForSize` or `MayChangeDirection` is `true`. But this change always does `replaceBranchWithTailCall` now. So it looks to me like aggressive rather than conservative.
> > > > > > 
> > > > > > Thats not how I read it. How I read it was only do `replaceBranchWithTailCall` when `OptForSize` because we may change branch direction and mess up other CFG optimizations. AFAICT from the code, however, we never actually change branch direction (at the moment). I added the flag so that in the future if someone wants to make this more aggressive, the original intent, to not change branch direction unless `OptForSize` it would be clear.
> > > > > Seems we have opposite understanding in the sentences. cc @hans who wrote it in D29856.
> > > > > My understanding is the "branch direction" means the BB layout that the BPU may take it as taken or not taken. The layout is usually calculated according to the branch possibility. Changing the order may regress performance.
> > > > > Seems we have opposite understanding in the sentences. cc @hans who wrote it in D29856.
> > > > > My understanding is the "branch direction" means the BB layout that the BPU may take it as taken or not taken. The layout is usually calculated according to the branch possibility. Changing the order may regress performance.
> > > > 
> > > > That can't change either.
> > > > ```
> > > >        // Only eliminate if MBB == TBB (Taken Basic Block)
> > > >         if (PredAnalyzable && !PredCond.empty() && PredTBB == MBB &&
> > > >             PredTBB != PredFBB) {
> > > > ```
> > > > I think it defacto ends up changing things later on b.c once an MBB has no preds it gets killed and that can re-order (more opportunistic optimization than forced changes) but we never explicitly swap fallthrough/taken. (Just as an reference, `bolt` now does this unconditionally in `SimplifyConditionalTailCalls::fixTailCalls`)
> > > > Seems we have opposite understanding in the sentences. cc @hans who wrote it in D29856.
> > > > My understanding is the "branch direction" means the BB layout that the BPU may take it as taken or not taken. The layout is usually calculated according to the branch possibility. Changing the order may regress performance.
> > > 
> > > The comment at the end:
> > > ```
> > >       // If the predecessor is falling through to this block, we could reverse
> > >         // the branch condition and fold the tail call into that. However, after
> > >         // that we might have to re-arrange the CFG to fall through to the other
> > >         // block and there is a high risk of regressing code size rather than
> > >         // improving it.
> > > ```
> > > 
> > > I think is more in line with what you're saying.
> > I didn't look at the current patch so I don't have the full context, but when I wrote that comment, it was referring to branch direction as in whether it's branching to a higher or lower address. For example changing "Jcc foo; foo: jmp bar;" into "Jcc bar;" might be changing Jcc from a forwards branch (to foo) to a backwards branch to bar (if it's at a lower address).
> > 
> > Someone had expressed concerns about this, which is why we limited it to optsize functions.
> > I didn't look at the current patch so I don't have the full context, but when I wrote that comment, it was referring to branch direction as in whether it's branching to a higher or lower address. For example changing "Jcc foo; foo: jmp bar;" into "Jcc bar;" might be changing Jcc from a forwards branch (to foo) to a backwards branch to bar (if it's at a lower address).
> > 
> > Someone had expressed concerns about this, which is why we limited it to optsize functions.
> 
> What was the rationale? Static prediction? If so we can could probably limit this to `haswell` and never where static prediction was changed.
> 
Yeah, it looks like `Assembly/Compiler Coding Rule 3` in SOM. It suggests to maintain for consistency although Core microarchitecture does not use the static prediction heuristic.

================
Comment at: llvm/test/CodeGen/X86/tail-opts.ll:866
 ; CHECK-NEXT:    cmpl $128, %edi
-; CHECK-NEXT:    jne .LBB14_8
-; CHECK-NEXT:  # %bb.5: # %return
+; CHECK-NEXT:    jne tail_call_me # TAILCALL
+; CHECK-NEXT:  # %bb.7: # %return
----------------
This looks like increasing size some time. Is this caused by the change of checking for all predecessors?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D140931/new/

https://reviews.llvm.org/D140931