[llvm] r335183 - Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.

Mon Jun 25 15:32:17 PDT 2018

Alina Sbirlea <asbirlea at google.com> writes:
> Hi Justin,
>
> Yes, I believe it is expected in some cases, and I did go over this change
> offline with some folks because I had the same doubt.
> The issue is that this may change the order of blocks, which can affect
> codegen. Hence it's not entirely NFC.

Thanks for the explanation. This makes sense to me, and it looks like
while it does cause worse codegen in some cases, it causes a similar
amount of improvements.

> For the above example, if you look at the IR change rather than the asm
> change, it's just two blocks getting reversed in the list.
>
> The test as seen by "MergeBlocksIntoPred" is:
> entry:
>   br i1 undef, label %for.cond5.preheader, label %for.end52
>
> for.cond5.preheader:
>   br label %for.cond5
>
> for.cond5:
>   %or.cond = and i1 undef, false
>   br i1 %or.cond, label %for.body33.preheader, label %for.cond5
>
> for.body33.preheader:
>   br label %for.body33
>
> for.cond30:
>   br label %for.cond5
>
> for.body33:
>   %tobool = fcmp une double undef, 0.000000e+00
>   br i1 %tobool, label %if.then, label %for.cond30
>
> if.then:
>   call void @scale()
>   br label %for.cond30
>
> for.end52:
>   ret void
>
>
> Now, calling MergeBlocksIntoPred, will attempt to merge
> for.body33.preheader and for.body33, which happen to have for.cond30
> inbetween.
> for.body33.preheader:
>   br label %for.body33
>
> for.cond30:
>   br label %for.cond5
>
> for.body33:
>   %tobool = fcmp une double undef, 0.000000e+00
>   br i1 %tobool, label %if.then, label %for.cond30
>
> Before this patch, the two would be merged "down", so you delete the
> preheader and get:
> for.cond5:
>   %or.cond = and i1 undef, false
>   br i1 %or.cond, label %for.body33, label %for.cond5
>
> for.cond30:
>   br label %for.cond5
>
> for.body33:
>   %tobool = fcmp une double undef, 0.000000e+00
>   br i1 %tobool, label %if.then, label %for.cond30
>
> After the patch, they're merged "up", i.e. "into the predecessor", so you
> get:
> for.cond5:
>   %or.cond = and i1 undef, false
>   br i1 %or.cond, label %for.body33.preheader, label %for.cond5
>
> for.body33.preheader:
>   %tobool = fcmp une double undef, 0.000000e+00
>   br i1 %tobool, label %if.then, label %for.cond30
>
> for.cond30:
>   br label %for.cond5
>
> So the CFG is identical, with a different label name. But the BBs will
> appear in a different order, so the layout changes. This can factor into
> codegen both good and bad.
>
> For "test/CodeGen/X86/hoist-spill.ll", this appeared to factor in a good
> way. The test checked:
> 2 x CHECK: mov{{.}} %{{.*}}, [[SPOFFSET1:-?[0-9]*]](%rsp)
> 2 x CHECK-NOT: mov{{.}} %{{.*}}, [[SPOFFSET2]](%rsp)
> After the patch, there was no second movl with negative offset or any
> reloads. This should be good. The condition the test checks: "no spills to
> the same stack slot after hoisting." is true, but the code generated
> changed. Again, because there are blocks in between the merged blocks so
> the order changed.
> Now, to satisfy what the test is supposed to check for, I moved below the
> empty block about to be merged, so two "movl minus offset" are still
> generated rather than just one and we can still check for two "movl offset"
> not existing.
>
> Please let me know if this makes sense or I missed something.
>
> Thanks,
> Alina
>
>
> On Thu, Jun 21, 2018 at 3:18 PM Justin Bogner <mail at justinbogner.com> wrote:
>
>> Alina Sbirlea via llvm-commits <llvm-commits at lists.llvm.org> writes:
>> > Author: asbirlea
>> > Date: Wed Jun 20 15:01:04 2018
>> > New Revision: 335183
>> >
>> > URL: http://llvm.org/viewvc/llvm-project?rev=335183&view=rev
>> > Log:
>> > Generalize MergeBlockIntoPredecessor. Replace uses of
>> MergeBasicBlockIntoOnlyPred.
>> >
>> > Summary:
>> > Two utils methods have essentially the same functionality. This is an
>> attempt to merge them into one.
>> > 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred
>> > 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor
>> >
>> > Prior to the patch:
>> > 1. MergeBasicBlockIntoOnlyPred
>> > Updates either DomTree or DeferredDominance
>> > Moves all instructions from Pred to BB, deletes Pred
>> > Asserts BB has single predecessor
>> > If address was taken, replace the block address with constant 1 (?)
>> >
>> > 2. MergeBlockIntoPredecessor
>> > Updates DomTree, LoopInfo and MemoryDependenceResults
>> > Moves all instruction from BB to Pred, deletes BB
>> > Returns if doesn't have a single predecessor
>> > Returns if BB's address was taken
>> >
>> > After the patch:
>> > Method 2. MergeBlockIntoPredecessor is attempting to become the new
>> default:
>> > Updates DomTree or DeferredDominance, and LoopInfo and
>> MemoryDependenceResults
>> > Moves all instruction from BB to Pred, deletes BB
>> > Returns if doesn't have a single predecessor
>> > Returns if BB's address was taken
>> >
>> > Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:
>> >
>> > 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp
>> > Updated in this patch. No challenges.
>> >
>> > 2. lib/CodeGen/CodeGenPrepare.cpp
>> > Updated in this patch.
>> >   i. eliminateFallThrough is straightforward, but I added using a
>> temporary array to avoid the iterator invalidation.
>> >   ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary
>> array for blocks
>> > Some interesting aspects:
>> >   - Since Pred is not deleted (BB is), the entry block does not need
>> updating.
>> >   - The entry block was being updated with the deleted block in
>> eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.
>> >   - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.
>> >   - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it
>> deletes its unique predecessor instead.
>> >   - adding some test owner as subscribers for the interesting tests
>> modified:
>> >     test/CodeGen/X86/avx-cmp.ll
>> >     test/CodeGen/AMDGPU/nested-loop-conditions.ll
>> >     test/CodeGen/AMDGPU/si-annotate-cf.ll
>> >     test/CodeGen/X86/hoist-spill.ll
>> >     test/CodeGen/X86/2006-11-17-IllegalMove.ll
>> >
>> > 3. lib/Transforms/Scalar/JumpThreading.cpp
>> > Not covered in this patch. It is the only use case using the
>> DeferredDominance.
>> > I would defer to Brian Rzycki to make this replacement.
>> >
>> > Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar
>> >
>> > Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr,
>> kbarton, RKSimon, wmi, arsenm, llvm-commits
>> >
>> > Differential Revision: https://reviews.llvm.org/D48202
>> >
>> > Modified:
>> >     llvm/trunk/include/llvm/Transforms/Utils/BasicBlockUtils.h
>> >     llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp
>> >     llvm/trunk/lib/Transforms/Scalar/LoopSimplifyCFG.cpp
>> >     llvm/trunk/lib/Transforms/Utils/BasicBlockUtils.cpp
>> >     llvm/trunk/test/CodeGen/AMDGPU/branch-relaxation.ll
>> >     llvm/trunk/test/CodeGen/AMDGPU/nested-loop-conditions.ll
>> >     llvm/trunk/test/CodeGen/AMDGPU/si-annotate-cf.ll
>> >     llvm/trunk/test/CodeGen/ARM/indirectbr.ll
>> >     llvm/trunk/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll
>> >     llvm/trunk/test/CodeGen/PowerPC/memcmp-mergeexpand.ll
>> >     llvm/trunk/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll
>> >     llvm/trunk/test/CodeGen/PowerPC/simplifyConstCmpToISEL.ll
>> >     llvm/trunk/test/CodeGen/Thumb2/2010-11-22-EpilogueBug.ll
>> >     llvm/trunk/test/CodeGen/Thumb2/thumb2-jtb.ll
>> >     llvm/trunk/test/CodeGen/X86/2006-11-17-IllegalMove.ll
>> >     llvm/trunk/test/CodeGen/X86/avx-cmp.ll
>> >     llvm/trunk/test/CodeGen/X86/avx-splat.ll
>> >     llvm/trunk/test/CodeGen/X86/avx2-vbroadcast.ll
>> >     llvm/trunk/test/CodeGen/X86/avx512-i1test.ll
>> >     llvm/trunk/test/CodeGen/X86/block-placement.ll
>> >     llvm/trunk/test/CodeGen/X86/hoist-spill.ll
>> >     llvm/trunk/test/CodeGen/X86/ins_subreg_coalesce-1.ll
>> >     llvm/trunk/test/CodeGen/X86/memcmp-mergeexpand.ll
>> >     llvm/trunk/test/CodeGen/X86/pr32108.ll
>> >     llvm/trunk/test/CodeGen/X86/setcc-lowering.ll
>> >     llvm/trunk/test/CodeGen/X86/split-store.ll
>> >     llvm/trunk/test/CodeGen/X86/tail-dup-merge-loop-headers.ll
>> >     llvm/trunk/test/DebugInfo/Generic/sunk-compare.ll
>> >     llvm/trunk/test/Transforms/CodeGenPrepare/X86/computedgoto.ll
>> >     llvm/trunk/test/Transforms/CodeGenPrepare/basic.ll
>> >     llvm/trunk/test/Transforms/LoopSimplifyCFG/scev.ll
>> >
>>  llvm/trunk/test/Transforms/LoopStrengthReduce/ARM/2012-06-15-lsr-noaddrmode.ll
>> >
>>  llvm/trunk/test/Transforms/LoopStrengthReduce/X86/2012-01-13-phielim.ll
>> >
>>  llvm/trunk/test/Transforms/SimpleLoopUnswitch/trivial-unswitch-iteration.ll
>>
>> > Modified: llvm/trunk/test/CodeGen/X86/avx-cmp.ll
>> > URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-cmp.ll?rev=335183&r1=335182&r2=335183&view=diff
>> >
>> ==============================================================================
>> > --- llvm/trunk/test/CodeGen/X86/avx-cmp.ll (original)
>> > +++ llvm/trunk/test/CodeGen/X86/avx-cmp.ll Wed Jun 20 15:01:04 2018
>> > @@ -26,12 +26,15 @@ declare void @scale() nounwind
>> >  define void @render() nounwind {
>> >  ; CHECK-LABEL: render:
>> >  ; CHECK:       # %bb.0: # %entry
>> > +; CHECK-NEXT:    pushq %rbp
>> >  ; CHECK-NEXT:    pushq %rbx
>> > +; CHECK-NEXT:    pushq %rax
>> >  ; CHECK-NEXT:    xorl %eax, %eax
>> >  ; CHECK-NEXT:    testb %al, %al
>> >  ; CHECK-NEXT:    jne .LBB2_6
>> >  ; CHECK-NEXT:  # %bb.1: # %for.cond5.preheader
>> >  ; CHECK-NEXT:    xorl %ebx, %ebx
>> > +; CHECK-NEXT:    movb $1, %bpl
>> >  ; CHECK-NEXT:    jmp .LBB2_2
>> >  ; CHECK-NEXT:    .p2align 4, 0x90
>> >  ; CHECK-NEXT:  .LBB2_5: # %if.then
>> > @@ -43,8 +46,8 @@ define void @render() nounwind {
>> >  ; CHECK-NEXT:    jne .LBB2_2
>> >  ; CHECK-NEXT:  # %bb.3: # %for.cond5
>> >  ; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
>> > -; CHECK-NEXT:    testb %bl, %bl
>> > -; CHECK-NEXT:    je .LBB2_2
>> > +; CHECK-NEXT:    testb %bpl, %bpl
>> > +; CHECK-NEXT:    jne .LBB2_2
>> >  ; CHECK-NEXT:  # %bb.4: # %for.body33
>> >  ; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1
>> >  ; CHECK-NEXT:    vucomisd {{\.LCPI.*}}, %xmm0
>> > @@ -52,7 +55,9 @@ define void @render() nounwind {
>> >  ; CHECK-NEXT:    jp .LBB2_5
>> >  ; CHECK-NEXT:    jmp .LBB2_2
>> >  ; CHECK-NEXT:  .LBB2_6: # %for.end52
>> > +; CHECK-NEXT:    addq $8, %rsp
>> >  ; CHECK-NEXT:    popq %rbx
>> > +; CHECK-NEXT:    popq %rbp
>> >  ; CHECK-NEXT:    retq
>> >  entry:
>> >    br i1 undef, label %for.cond5, label %for.end52
>>
>> This test change looks a bit suspicious - I would've expected your patch
>> to be basically NFC, but here we end up using a few more registers. Is
>> this expected?
>>