[llvm] r231238 - [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind

Xinliang David Li xinliangli at gmail.com
Wed Mar 4 10:46:20 PST 2015


I am not sure the new layout strategy will work well consistently.
1) if 'if (a1())' and 'if (a2())' branches are equally likely, the new
layout does not actually improve icache utilization -- it just trades one
for another
2) if 'if(a2()' is less likely, i.e. 'b2,c2,d2,f2' are much colder, the new
strategy will reduce reuse.

The key problem is that why 'f1' is not picked as the successor for c1:
 presumably it (f1, which is a merge point) is not picked because there is
more likely predecessor, which is 'd1'. If 'd1' is more likely, 'd1' should
be picked as successor of 'b1' -- picking 'c1' after 'b1' does not make
sense.   If 'd1' is unlikely, then 'f1' should follow 'c1', and 'd1' should
be outlined and placed far away -- which will improve cache reuse and
reduce taken branches.

Unlike triangle shaped pattern if (a()) { b(); } c();, for diamond shape
control flow, when real profile data is available (or branch annotation or
prediction can be trusted), it is almost always better to break topological
constraint using branch probability.

David


On Wed, Mar 4, 2015 at 4:18 AM, Chandler Carruth <chandlerc at gmail.com>
wrote:

> Author: chandlerc
> Date: Wed Mar  4 06:18:08 2015
> New Revision: 231238
>
> URL: http://llvm.org/viewvc/llvm-project?rev=231238&view=rev
> Log:
> [MBP] Fix a really horrible bug in MachineBlockPlacement, but behind
> a flag for now.
>
> First off, thanks to Daniel Jasper for really pointing out the issue
> here. It's been here forever (at least, I think it was there when
> I first wrote this code) without getting really noticed or fixed.
>
> The key problem is what happens when two reasonably common patterns
> happen at the same time: we outline multiple cold regions of code, and
> those regions in turn have diamonds or other CFGs for which we can't
> just topologically lay them out. Consider some C code that looks like:
>
>   if (a1()) { if (b1()) c1(); else d1(); f1(); }
>   if (a2()) { if (b2()) c2(); else d2(); f2(); }
>   done();
>
> Now consider the case where a1() and a2() are unlikely to be true. In
> that case, we might lay out the first part of the function like:
>
>   a1, a2, done;
>
> And then we will be out of successors in which to build the chain. We go
> to find the best block to continue the chain with, which is perfectly
> reasonable here, and find "b1" let's say. Laying out successors gets us
> to:
>
>   a1, a2, done; b1, c1;
>
> At this point, we will refuse to lay out the successor to c1 (f1)
> because there are still un-placed predecessors of f1 and we want to try
> to preserve the CFG structure. So we go get the next best block, d1.
>
> ... wait for it ...
>
> Except that the next best block *isn't* d1. It is b2! d1 is waaay down
> inside these conditionals. It is much less important than b2. Except
> that this is exactly what we didn't want. If we keep going we get the
> entire set of the rest of the CFG *interleaved*!!!
>
>   a1, a2, done; b1, c1; b2, c2; d1, f1; d2, f2;
>
> So we clearly need a better strategy here. =] My current favorite
> strategy is to actually try to place the block whose predecessor is
> closest. This very simply ensures that we unwind these kinds of CFGs the
> way that is natural and fitting, and should minimize the number of cache
> lines instructions are spread across.
>
> It also happens to be *dead simple*. It's like the datastructure was
> specifically set up for this use case or something. We only push blocks
> onto the work list when the last predecessor for them is placed into the
> chain. So the back of the worklist *is* the nearest next block.
>
> Unfortunately, a change like this is going to cause *soooo* many
> benchmarks to swing wildly. So for now I'm adding this under a flag so
> that we and others can validate that this is fixing the problems
> described, that it seems possible to enable, and hopefully that it fixes
> more of our problems long term.
>
> Modified:
>     llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
>     llvm/trunk/test/CodeGen/X86/block-placement.ll
>
> Modified: llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp?rev=231238&r1=231237&r2=231238&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp (original)
> +++ llvm/trunk/lib/CodeGen/MachineBlockPlacement.cpp Wed Mar  4 06:18:08
> 2015
> @@ -68,6 +68,13 @@ ExitBlockBias("block-placement-exit-bloc
>                         "over the original exit to be considered the new
> exit."),
>                cl::init(0), cl::Hidden);
>
> +static cl::opt<bool> PlaceLastSuccessor(
> +    "place-last-successor",
> +    cl::desc("When selecting a non-successor block, choose the last block
> to "
> +             "have been a successor. This represents the block whose "
> +             "predecessor was most recently placed."),
> +    cl::init(false), cl::Hidden);
> +
>  static cl::opt<bool> OutlineOptionalBranches(
>      "outline-optional-branches",
>      cl::desc("Put completely optional branches, i.e. branches with a
> common "
> @@ -443,6 +450,25 @@ MachineBasicBlock *MachineBlockPlacement
>  MachineBasicBlock *MachineBlockPlacement::selectBestCandidateBlock(
>      BlockChain &Chain, SmallVectorImpl<MachineBasicBlock *> &WorkList,
>      const BlockFilterSet *BlockFilter) {
> +  if (PlaceLastSuccessor) {
> +    // If we're just placing the last successor as the best candidate, the
> +    // logic is super simple. We skip the already placed entries on the
> +    // worklist and return the most recently added entry that isn't
> placed.
> +    while (!WorkList.empty()) {
> +      MachineBasicBlock *SuccBB = WorkList.pop_back_val();
> +      BlockChain &SuccChain = *BlockToChain.lookup(SuccBB);
> +      if (&SuccChain == &Chain) {
> +        DEBUG(dbgs() << "    " << getBlockName(SuccBB)
> +                     << " -> Already merged!\n");
> +        continue;
> +      }
> +      assert(SuccChain.LoopPredecessors == 0 && "Found CFG-violating
> block");
> +      return SuccBB;
> +    }
> +
> +    return nullptr;
> +  }
> +
>    // Once we need to walk the worklist looking for a candidate, cleanup
> the
>    // worklist of already placed entries.
>    // FIXME: If this shows up on profiles, it could be folded (at the cost
> of
>
> Modified: llvm/trunk/test/CodeGen/X86/block-placement.ll
> URL:
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/block-placement.ll?rev=231238&r1=231237&r2=231238&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/block-placement.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/block-placement.ll Wed Mar  4 06:18:08 2015
> @@ -1,4 +1,4 @@
> -; RUN: llc -mtriple=i686-linux -pre-RA-sched=source < %s | FileCheck %s
> +; RUN: llc -mtriple=i686-linux -place-last-successor -pre-RA-sched=source
> < %s | FileCheck %s
>
>  declare void @error(i32 %i, i32 %a, i32 %b)
>
> @@ -17,11 +17,11 @@ define i32 @test_ifchains(i32 %i, i32* %
>  ; CHECK: %else4
>  ; CHECK-NOT: .align
>  ; CHECK: %exit
> -; CHECK: %then1
> -; CHECK: %then2
> -; CHECK: %then3
> -; CHECK: %then4
>  ; CHECK: %then5
> +; CHECK: %then4
> +; CHECK: %then3
> +; CHECK: %then2
> +; CHECK: %then1
>
>  entry:
>    %gep1 = getelementptr i32, i32* %a, i32 1
> @@ -82,9 +82,9 @@ define i32 @test_loop_cold_blocks(i32 %i
>  ; CHECK-LABEL: test_loop_cold_blocks:
>  ; CHECK: %entry
>  ; CHECK-NOT: .align
> -; CHECK: %unlikely1
> -; CHECK-NOT: .align
>  ; CHECK: %unlikely2
> +; CHECK-NOT: .align
> +; CHECK: %unlikely1
>  ; CHECK: .align
>  ; CHECK: %body1
>  ; CHECK: %body2
> @@ -135,9 +135,9 @@ define i32 @test_loop_early_exits(i32 %i
>  ; CHECK: %body3
>  ; CHECK: %body4
>  ; CHECK: %exit
> -; CHECK: %bail1
> -; CHECK: %bail2
>  ; CHECK: %bail3
> +; CHECK: %bail2
> +; CHECK: %bail1
>
>  entry:
>    br label %body1
> @@ -1083,3 +1083,87 @@ exit:
>    %ret = phi i32 [ %val1, %then ], [ %val2, %else ]
>    ret i32 %ret
>  }
> +
> +define void @test_outlined() {
> +; This test ends up with diamond control flow in outlined optional
> regions.
> +; These diamonds should still be locally cohensive even when out-of-line
> due to
> +; being cold.
> +; CHECK-LABEL: test_outlined:
> +; CHECK: %a1
> +; CHECK: %a2
> +; CHECK: %done
> +; CHECK: %b2
> +; CHECK: %c2
> +; CHECK: %d2
> +; CHECK: %f2
> +; CHECK: %b1
> +; CHECK: %c1
> +; CHECK: %d1
> +; CHECK: %f1
> +
> +a1:
> +  %call.a1 = call i1 @a1()
> +  br i1 %call.a1, label %b1, label %a2, !prof !0
> +
> +b1:
> +  %call.b1 = call i1 @b1()
> +  br i1 %call.b1, label %c1, label %d1
> +
> +c1:
> +  call void @c1()
> +  br label %f1
> +
> +d1:
> +  call void @d1()
> +  br label %f1
> +
> +f1:
> +  call void @f1()
> +  br label %a2
> +
> +a2:
> +  %call.a2 = call i1 @a2()
> +  br i1 %call.a2, label %b2, label %done, !prof !0
> +
> +b2:
> +  %call.b2 = call i1 @b2()
> +  br i1 %call.b2, label %c2, label %d2
> +
> +c2:
> +  call void @c2()
> +  br label %f2
> +
> +d2:
> +  call void @d2()
> +  br label %f2
> +
> +f2:
> +  call void @f2()
> +  br label %done
> +
> +done:
> +  call void @done()
> +  ret void
> +}
> +
> +declare i1 @a1()
> +
> +declare i1 @b1()
> +
> +declare void @c1()
> +
> +declare void @d1()
> +
> +declare void @f1()
> +
> +declare i1 @a2()
> +
> +declare i1 @b2()
> +
> +declare void @c2()
> +
> +declare void @d2()
> +
> +declare void @f2()
> +
> +declare void @done()
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150304/90f4814c/attachment.html>


More information about the llvm-commits mailing list