[PATCH] D22317: Codegen: Tail Merge: Be less aggressive with special cases.

Wed Jul 13 16:32:09 PDT 2016

davidxl added inline comments.

================
Comment at: lib/CodeGen/BranchFolding.cpp:621
@@ -621,1 +620,3 @@
+  // when SuccBB is the only successor.
+  if ((MBB1 == PredBB || MBB2 == PredBB) && MBB1->succ_size() == 1) {
     MachineBasicBlock::iterator I;
----------------
iteratee wrote:
> davidxl wrote:
> > davidxl wrote:
> > > This looks fine but should it be done only in layout mode?
> > looks like if you merge this patch back to D18226 with a slight modification like this:
> > 
> >   if ((MBB1 == PredBB || MBB2 == PredBB) && (MBB1->succ_size() == 1 || !AfterPlacement)) {
> > 
> > }
> > 
> > D18226 will be good to go in .
> No, I don't think it should be only layout mode.
> rdf-copy is a good example of why. Without this patch:
> 
> 
>     foo:                                    // @foo
>     // BB#0:                                // %entry
>             {
>                             r1 = #0 ; jump .LBB0_2
>         }
>              .p2align        4
>      .LBB0_1:                                // %while.cond
>                                              //   in Loop: Header=BB0_2 Depth=1
>              {
>                              r1 = r0
>                              r0 = memw(r0 + #0)
>              }
>      .LBB0_2:                                // %while.cond
>                                              // =>This Inner Loop Header: Depth=1
>              {
>                              if (p0.new) r0 = r1
>                              p0 = cmp.eq(r0, #0)
>                              if (p0.new) jumpr:nt r31
>              }
>              {
>                              jump .LBB0_1
>              }
> 
> Note the loop is 3 packets long. Now with the patch:
>      foo:                                    // @foo
>      // BB#0:                                // %entry
>              {
>                              r1 = #0
>                              p0 = cmp.eq(r0, #0)
>              }
>              {
>                              if (p0) r0 = r1
>                              if (p0) jumpr:nt r31
>              }
>              .p2align        4
>      .LBB0_1:                                // %while.cond
>                                              // =>This Inner Loop Header: Depth=1
>              {
>                              r1 = r0
>                              r0 = memw(r0 + #0)
>                              if (!cmp.eq(r0.new, #0)) jump:t .LBB0_1
>              }
>      // BB#2:                                // %if.end
>              {
>                              r0 = r1
>                              jumpr r31
>              }
> Now the loop is one packet long.
> There are other cases where unanalyzable fallthrough prevents duplication, but we avoid the problem completely by not merging.
But this fix is over-generalized 

The root cause of the problem in that test is that one of tail sequence is in a loop while the other is not. I think this can only happen for loop top-testing and bottom testing, so it seems to me a better general fix is to check if MBB1 or MBB2 dominates the other.


Repository:
  rL LLVM

http://reviews.llvm.org/D22317