[PATCH] D22317: Codegen: Tail Merge: Be less aggressive with special cases.

Tue Aug 9 17:22:15 PDT 2016

It is safer to still guard the change with post-placement check. With
that I think it can go in as is (without needing to be folded to
D18226)

David

On Tue, Aug 9, 2016 at 11:19 AM, Kyle Butt <kyle+llvm at iteratee.net> wrote:
> iteratee added inline comments.
>
> ================
> Comment at: lib/CodeGen/BranchFolding.cpp:621
> @@ -621,1 +620,3 @@
> +  // when SuccBB is the only successor.
> +  if ((MBB1 == PredBB || MBB2 == PredBB) && MBB1->succ_size() == 1) {
>      MachineBasicBlock::iterator I;
> ----------------
> davidxl wrote:
>> iteratee wrote:
>> > davidxl wrote:
>> > > davidxl wrote:
>> > > > This looks fine but should it be done only in layout mode?
>> > > looks like if you merge this patch back to D18226 with a slight modification like this:
>> > >
>> > >   if ((MBB1 == PredBB || MBB2 == PredBB) && (MBB1->succ_size() == 1 || !AfterPlacement)) {
>> > >
>> > > }
>> > >
>> > > D18226 will be good to go in .
>> > No, I don't think it should be only layout mode.
>> > rdf-copy is a good example of why. Without this patch:
>> >
>> >
>> >     foo:                                    // @foo
>> >     // BB#0:                                // %entry
>> >             {
>> >                             r1 = #0 ; jump .LBB0_2
>> >         }
>> >              .p2align        4
>> >      .LBB0_1:                                // %while.cond
>> >                                              //   in Loop: Header=BB0_2 Depth=1
>> >              {
>> >                              r1 = r0
>> >                              r0 = memw(r0 + #0)
>> >              }
>> >      .LBB0_2:                                // %while.cond
>> >                                              // =>This Inner Loop Header: Depth=1
>> >              {
>> >                              if (p0.new) r0 = r1
>> >                              p0 = cmp.eq(r0, #0)
>> >                              if (p0.new) jumpr:nt r31
>> >              }
>> >              {
>> >                              jump .LBB0_1
>> >              }
>> >
>> > Note the loop is 3 packets long. Now with the patch:
>> >      foo:                                    // @foo
>> >      // BB#0:                                // %entry
>> >              {
>> >                              r1 = #0
>> >                              p0 = cmp.eq(r0, #0)
>> >              }
>> >              {
>> >                              if (p0) r0 = r1
>> >                              if (p0) jumpr:nt r31
>> >              }
>> >              .p2align        4
>> >      .LBB0_1:                                // %while.cond
>> >                                              // =>This Inner Loop Header: Depth=1
>> >              {
>> >                              r1 = r0
>> >                              r0 = memw(r0 + #0)
>> >                              if (!cmp.eq(r0.new, #0)) jump:t .LBB0_1
>> >              }
>> >      // BB#2:                                // %if.end
>> >              {
>> >                              r0 = r1
>> >                              jumpr r31
>> >              }
>> > Now the loop is one packet long.
>> > There are other cases where unanalyzable fallthrough prevents duplication, but we avoid the problem completely by not merging.
>> But this fix is over-generalized
>>
>> The root cause of the problem in that test is that one of tail sequence is in a loop while the other is not. I think this can only happen for loop top-testing and bottom testing, so it seems to me a better general fix is to check if MBB1 or MBB2 dominates the other.
> The forked diamond code is about to go in.
>
> I still think that these changes are correct in general, but if you still disagree, I can drop this and roll it into D18226 only after placement.
> I think the thresholds there need to be adjusted so that there is only one, but I can do that as well.
>
>
> Repository:
>   rL LLVM
>
> https://reviews.llvm.org/D22317
>
>
>