<div dir="ltr">Hi Justin,<div><br></div><div>Yes, I believe it is expected in some cases, and I did go over this change offline with some folks because I had the same doubt.</div><div>The issue is that this may change the order of blocks, which can affect codegen. Hence it's not entirely NFC.</div><div>For the above example, if you look at the IR change rather than the asm change, it's just two blocks getting reversed in the list.</div><div><br></div><div>The test as seen by "MergeBlocksIntoPred" is:</div><div><div>entry:</div><div>  br i1 undef, label %for.cond5.preheader, label %for.end52</div><div><br></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">for.cond5.preheader:</span><br></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">  br <span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">label %<span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">for.cond5</span></span></span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></span></div><div>for.cond5:</div><div>  %or.cond = and i1 undef, false</div><div>  br i1 %or.cond, label %for.body33.preheader, label %for.cond5</div><div><br></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">for.body33.preheader:</span><br></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">  br label %<span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">for.body33</span></span></div><div><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></div><div>for.cond30:</div><div>  br label %for.cond5</div><div><br></div><div>for.body33:</div><div>  %tobool = fcmp une double undef, 0.000000e+00</div><div>  br i1 %tobool, label %if.then, label %for.cond30</div><div><br></div><div>if.then:</div><div>  call void @scale()</div><div>  br label %for.cond30</div><div><br></div><div>for.end52:</div><div>  ret void</div></div><div><br></div><div><br></div><div>Now, calling MergeBlocksIntoPred, will attempt to merge for.body33.preheader and for.body33, which happen to have for.cond30 inbetween.<br></div><div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">for.body33.preheader:</span><br></div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">  br label %<span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">for.body33</span></span></div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">for.cond30:</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">  br label %for.cond5</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial"><br></div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">for.body33:</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">  %tobool = fcmp une double undef, 0.000000e+00</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">  br i1 %tobool, label %if.then, label %for.cond30</div><br></div><div>Before this patch, the two would be merged "down", so you delete the preheader and get:</div><div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">for.cond5:</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">  %or.cond = and i1 undef, false</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">  br i1 %or.cond, label %for.body33, label %for.cond5</div><br></div><div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">for.cond30:</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">  br label %for.cond5</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><br></div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">for.body33:</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">  %tobool = fcmp une double undef, 0.000000e+00</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">  br i1 %tobool, label %if.then, label %for.cond30</div><br></div><div>After the patch, they're merged "up", i.e. "into the predecessor", so you get:</div><div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">for.cond5:</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">  %or.cond = and i1 undef, false</div><div style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial">  br i1 %or.cond, label %for.body33.preheader, label %for.cond5</div><br></div><div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline">for.body33.preheader:</span></div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">  %tobool = fcmp une double undef, 0.000000e+00</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">  br i1 %tobool, label %if.then, label %for.cond30</div></span></span></div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><span style="background-color:rgb(255,255,255);text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><span style="text-decoration-style:initial;text-decoration-color:initial;float:none;display:inline"><br></span></span></div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">for.cond30:</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">  br label %for.cond5</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><br></div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)">So the CFG is identical, with a different label name. But the BBs will appear in a different order, so the layout changes. This can factor into codegen both good and bad.</div><div style="text-decoration-style:initial;text-decoration-color:initial;background-color:rgb(255,255,255)"><br></div>For "test/CodeGen/X86/hoist-spill.ll", this appeared to factor in a good way. The test checked:<br></div>2 x CHECK: mov{{.}} %{{.*}}, [[SPOFFSET1:-?[0-9]*]](%rsp)<div>2 x CHECK-NOT: mov{{.}} %{{.*}}, [[SPOFFSET2]](%rsp)</div><div>After the patch, there was no second movl with negative offset or any reloads. This should be good. The condition the test checks: "no spills to the same stack slot after hoisting." is true, but the code generated changed. Again, because there are blocks in between the merged blocks so the order changed. </div><div>Now, to satisfy what the test is supposed to check for, I moved below the empty block about to be merged, so two "movl minus offset" are still generated rather than just one and we can still check for two "movl offset" not existing.</div><div><br></div><div>Please let me know if this makes sense or I missed something.</div><div><br></div><div>Thanks,</div><div>Alina<br><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, Jun 21, 2018 at 3:18 PM Justin Bogner <<a href="mailto:mail@justinbogner.com">mail@justinbogner.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Alina Sbirlea via llvm-commits <<a href="mailto:llvm-commits@lists.llvm.org" target="_blank">llvm-commits@lists.llvm.org</a>> writes:<br>
> Author: asbirlea<br>
> Date: Wed Jun 20 15:01:04 2018<br>
> New Revision: 335183<br>
><br>
> URL: <a href="http://llvm.org/viewvc/llvm-project?rev=335183&view=rev" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project?rev=335183&view=rev</a><br>
> Log:<br>
> Generalize MergeBlockIntoPredecessor. Replace uses of MergeBasicBlockIntoOnlyPred.<br>
><br>
> Summary:<br>
> Two utils methods have essentially the same functionality. This is an attempt to merge them into one.<br>
> 1. lib/Transforms/Utils/Local.cpp : MergeBasicBlockIntoOnlyPred<br>
> 2. lib/Transforms/Utils/BasicBlockUtils.cpp : MergeBlockIntoPredecessor<br>
><br>
> Prior to the patch:<br>
> 1. MergeBasicBlockIntoOnlyPred<br>
> Updates either DomTree or DeferredDominance<br>
> Moves all instructions from Pred to BB, deletes Pred<br>
> Asserts BB has single predecessor<br>
> If address was taken, replace the block address with constant 1 (?)<br>
><br>
> 2. MergeBlockIntoPredecessor<br>
> Updates DomTree, LoopInfo and MemoryDependenceResults<br>
> Moves all instruction from BB to Pred, deletes BB<br>
> Returns if doesn't have a single predecessor<br>
> Returns if BB's address was taken<br>
><br>
> After the patch:<br>
> Method 2. MergeBlockIntoPredecessor is attempting to become the new default:<br>
> Updates DomTree or DeferredDominance, and LoopInfo and MemoryDependenceResults<br>
> Moves all instruction from BB to Pred, deletes BB<br>
> Returns if doesn't have a single predecessor<br>
> Returns if BB's address was taken<br>
><br>
> Uses of MergeBasicBlockIntoOnlyPred that need to be replaced:<br>
><br>
> 1. lib/Transforms/Scalar/LoopSimplifyCFG.cpp<br>
> Updated in this patch. No challenges.<br>
><br>
> 2. lib/CodeGen/CodeGenPrepare.cpp<br>
> Updated in this patch.<br>
>   i. eliminateFallThrough is straightforward, but I added using a temporary array to avoid the iterator invalidation.<br>
>   ii. eliminateMostlyEmptyBlock(s) methods also now use a temporary array for blocks<br>
> Some interesting aspects:<br>
>   - Since Pred is not deleted (BB is), the entry block does not need updating.<br>
>   - The entry block was being updated with the deleted block in eliminateMostlyEmptyBlock. Added assert to make obvious that BB=SinglePred.<br>
>   - isMergingEmptyBlockProfitable assumes BB is the one to be deleted.<br>
>   - eliminateMostlyEmptyBlock(BB) does not delete BB on one path, it deletes its unique predecessor instead.<br>
>   - adding some test owner as subscribers for the interesting tests modified:<br>
>     test/CodeGen/X86/avx-cmp.ll<br>
>     test/CodeGen/AMDGPU/nested-loop-conditions.ll<br>
>     test/CodeGen/AMDGPU/si-annotate-cf.ll<br>
>     test/CodeGen/X86/hoist-spill.ll<br>
>     test/CodeGen/X86/2006-11-17-IllegalMove.ll<br>
><br>
> 3. lib/Transforms/Scalar/JumpThreading.cpp<br>
> Not covered in this patch. It is the only use case using the DeferredDominance.<br>
> I would defer to Brian Rzycki to make this replacement.<br>
><br>
> Reviewers: chandlerc, spatel, davide, brzycki, bkramer, javed.absar<br>
><br>
> Subscribers: qcolombet, sanjoy, nemanjai, nhaehnle, jlebar, tpr, kbarton, RKSimon, wmi, arsenm, llvm-commits<br>
><br>
> Differential Revision: <a href="https://reviews.llvm.org/D48202" rel="noreferrer" target="_blank">https://reviews.llvm.org/D48202</a><br>
><br>
> Modified:<br>
>     llvm/trunk/include/llvm/Transforms/Utils/BasicBlockUtils.h<br>
>     llvm/trunk/lib/CodeGen/CodeGenPrepare.cpp<br>
>     llvm/trunk/lib/Transforms/Scalar/LoopSimplifyCFG.cpp<br>
>     llvm/trunk/lib/Transforms/Utils/BasicBlockUtils.cpp<br>
>     llvm/trunk/test/CodeGen/AMDGPU/branch-relaxation.ll<br>
>     llvm/trunk/test/CodeGen/AMDGPU/nested-loop-conditions.ll<br>
>     llvm/trunk/test/CodeGen/AMDGPU/si-annotate-cf.ll<br>
>     llvm/trunk/test/CodeGen/ARM/indirectbr.ll<br>
>     llvm/trunk/test/CodeGen/PowerPC/memCmpUsedInZeroEqualityComparison.ll<br>
>     llvm/trunk/test/CodeGen/PowerPC/memcmp-mergeexpand.ll<br>
>     llvm/trunk/test/CodeGen/PowerPC/ppc-shrink-wrapping.ll<br>
>     llvm/trunk/test/CodeGen/PowerPC/simplifyConstCmpToISEL.ll<br>
>     llvm/trunk/test/CodeGen/Thumb2/2010-11-22-EpilogueBug.ll<br>
>     llvm/trunk/test/CodeGen/Thumb2/thumb2-jtb.ll<br>
>     llvm/trunk/test/CodeGen/X86/2006-11-17-IllegalMove.ll<br>
>     llvm/trunk/test/CodeGen/X86/avx-cmp.ll<br>
>     llvm/trunk/test/CodeGen/X86/avx-splat.ll<br>
>     llvm/trunk/test/CodeGen/X86/avx2-vbroadcast.ll<br>
>     llvm/trunk/test/CodeGen/X86/avx512-i1test.ll<br>
>     llvm/trunk/test/CodeGen/X86/block-placement.ll<br>
>     llvm/trunk/test/CodeGen/X86/hoist-spill.ll<br>
>     llvm/trunk/test/CodeGen/X86/ins_subreg_coalesce-1.ll<br>
>     llvm/trunk/test/CodeGen/X86/memcmp-mergeexpand.ll<br>
>     llvm/trunk/test/CodeGen/X86/pr32108.ll<br>
>     llvm/trunk/test/CodeGen/X86/setcc-lowering.ll<br>
>     llvm/trunk/test/CodeGen/X86/split-store.ll<br>
>     llvm/trunk/test/CodeGen/X86/tail-dup-merge-loop-headers.ll<br>
>     llvm/trunk/test/DebugInfo/Generic/sunk-compare.ll<br>
>     llvm/trunk/test/Transforms/CodeGenPrepare/X86/computedgoto.ll<br>
>     llvm/trunk/test/Transforms/CodeGenPrepare/basic.ll<br>
>     llvm/trunk/test/Transforms/LoopSimplifyCFG/scev.ll<br>
>     llvm/trunk/test/Transforms/LoopStrengthReduce/ARM/2012-06-15-lsr-noaddrmode.ll<br>
>     llvm/trunk/test/Transforms/LoopStrengthReduce/X86/2012-01-13-phielim.ll<br>
>     llvm/trunk/test/Transforms/SimpleLoopUnswitch/trivial-unswitch-iteration.ll<br>
<br>
> Modified: llvm/trunk/test/CodeGen/X86/avx-cmp.ll<br>
> URL: <a href="http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-cmp.ll?rev=335183&r1=335182&r2=335183&view=diff" rel="noreferrer" target="_blank">http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-cmp.ll?rev=335183&r1=335182&r2=335183&view=diff</a><br>
> ==============================================================================<br>
> --- llvm/trunk/test/CodeGen/X86/avx-cmp.ll (original)<br>
> +++ llvm/trunk/test/CodeGen/X86/avx-cmp.ll Wed Jun 20 15:01:04 2018<br>
> @@ -26,12 +26,15 @@ declare void @scale() nounwind<br>
>  define void @render() nounwind {<br>
>  ; CHECK-LABEL: render:<br>
>  ; CHECK:       # %bb.0: # %entry<br>
> +; CHECK-NEXT:    pushq %rbp<br>
>  ; CHECK-NEXT:    pushq %rbx<br>
> +; CHECK-NEXT:    pushq %rax<br>
>  ; CHECK-NEXT:    xorl %eax, %eax<br>
>  ; CHECK-NEXT:    testb %al, %al<br>
>  ; CHECK-NEXT:    jne .LBB2_6<br>
>  ; CHECK-NEXT:  # %bb.1: # %for.cond5.preheader<br>
>  ; CHECK-NEXT:    xorl %ebx, %ebx<br>
> +; CHECK-NEXT:    movb $1, %bpl<br>
>  ; CHECK-NEXT:    jmp .LBB2_2<br>
>  ; CHECK-NEXT:    .p2align 4, 0x90<br>
>  ; CHECK-NEXT:  .LBB2_5: # %if.then<br>
> @@ -43,8 +46,8 @@ define void @render() nounwind {<br>
>  ; CHECK-NEXT:    jne .LBB2_2<br>
>  ; CHECK-NEXT:  # %bb.3: # %for.cond5<br>
>  ; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1<br>
> -; CHECK-NEXT:    testb %bl, %bl<br>
> -; CHECK-NEXT:    je .LBB2_2<br>
> +; CHECK-NEXT:    testb %bpl, %bpl<br>
> +; CHECK-NEXT:    jne .LBB2_2<br>
>  ; CHECK-NEXT:  # %bb.4: # %for.body33<br>
>  ; CHECK-NEXT:    # in Loop: Header=BB2_2 Depth=1<br>
>  ; CHECK-NEXT:    vucomisd {{\.LCPI.*}}, %xmm0<br>
> @@ -52,7 +55,9 @@ define void @render() nounwind {<br>
>  ; CHECK-NEXT:    jp .LBB2_5<br>
>  ; CHECK-NEXT:    jmp .LBB2_2<br>
>  ; CHECK-NEXT:  .LBB2_6: # %for.end52<br>
> +; CHECK-NEXT:    addq $8, %rsp<br>
>  ; CHECK-NEXT:    popq %rbx<br>
> +; CHECK-NEXT:    popq %rbp<br>
>  ; CHECK-NEXT:    retq<br>
>  entry:<br>
>    br i1 undef, label %for.cond5, label %for.end52<br>
<br>
This test change looks a bit suspicious - I would've expected your patch<br>
to be basically NFC, but here we end up using a few more registers. Is<br>
this expected?<br>
</blockquote></div>