[PATCH] D63972: [PowerPC] Do the Early Return for the li and unconditional branch
Zhang Kang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Jul 6 08:11:55 PDT 2019
ZhangKang added a comment.
Herald added subscribers: wuzish, MaskRay.
In D63972#1568955 <https://reviews.llvm.org/D63972#1568955>, @efriedma wrote:
> > The line 29 b .LBB0_10 is created after running the pass branch-folder,
>
> I did a quick test with -print-after-all, and it looks like it's actually created by MachineBlockPlacement?
>
> If we're going to do this transform, we should use the existing TailDup code to do it, not reimplement it in PPCEarlyReturn. Would it make sense to run a re-run the entire tail duplication pass after block placement? Or should we try to do something more targeted?
@efriedma Yes, you are right. This pattern is created by the pass `block-placement`. If I re-run `tail duplication` after the pass `block-placement`, this case will be what we want:
1 .text
2 .abiversion 2
3 .file "HashXMLCh.cpp"
4 .globl _ZN11xercesc_2_79HashXMLCh6equalsEPKvS2_ # -- Begin function _ZN11xercesc_2_79HashXMLCh6equalsEPKvS2_
5 .p2align 4
6 .type _ZN11xercesc_2_79HashXMLCh6equalsEPKvS2_, at function
7 _ZN11xercesc_2_79HashXMLCh6equalsEPKvS2_: # @_ZN11xercesc_2_79HashXMLCh6equalsEPKvS2_
8 .Lfunc_begin0:
9 # %bb.0: # %entry
10 cmpdi 1, 4, 0
11 cmpdi 5, 0
12 cror 20, 6, 2
13 bc 4, 20, .LBB0_6
14 # %bb.1: # %if.then.i
15 bc 12, 6, .LBB0_3
16 # %bb.2: # %land.lhs.true.i
17 lhz 4, 0(4)
18 li 3, 0
19 cmplwi 1, 4, 0
20 bnelr 1
21 .LBB0_3: # %lor.lhs.false3.i
22 bc 12, 2, .LBB0_5
23 # %bb.4: # %land.lhs.true5.i
24 lhz 4, 0(5)
25 li 3, 0
26 cmplwi 4, 0
27 bnelr 0
28 .LBB0_5: # %if.else.i
29 li 3, 1
30 blr
31 .LBB0_6: # %while.cond.preheader.i
32 lhz 8, 0(4)
33 lhz 6, 0(5)
34 li 3, 0
35 cmplw 8, 6
36 bnelr 0
37 # %bb.7: # %while.body.i.preheader
38 addi 6, 5, 2
39 addi 7, 4, 2
40 .p2align 4
41 .LBB0_8: # %while.body.i
42 # =>This Inner Loop Header: Depth=1
43 andi. 8, 8, 65535
44 beq 0, .LBB0_5
45 # %bb.9: # %if.end12.i
46 # in Loop: Header=BB0_8 Depth=1
47 addi 5, 5, 2
48 addi 4, 4, 2
49 lhz 8, 0(4)
50 lhz 9, 0(5)
51 addi 6, 6, 2
52 addi 7, 7, 2
53 cmplw 8, 9
54 beq 0, .LBB0_8
55 blr
For the good assembly, the `.LBB0_10` has been removed, and `.LBB0_5` is reserved. All instruction which branch to `.LBB0_10` will branch to `.LBB0_5`.
-------------------------------------------------------------------------
Below is to analyze the reason for the bad pattern.
Before run the pass `block-placement`, the mir is below:
asm
body: |
bb.0.entry:
successors: %bb.5(0x40000000), %bb.1(0x40000000)
liveins: $x4, $x5
renamable $cr1 = CMPDI renamable $x4, 0
renamable $cr0 = CMPDI renamable $x5, 0
renamable $cr5lt = CROR renamable $cr1eq, renamable $cr0eq
BC killed renamable $cr5lt, %bb.5
bb.1.while.cond.preheader.i:
successors: %bb.2(0x40000000), %bb.11(0x40000000)
liveins: $x4, $x5
renamable $r8 = LHZ 0, renamable $x4 :: (load 2 from %ir.1, !tbaa !2)
renamable $r6 = LHZ 0, renamable $x5 :: (load 2 from %ir.0, !tbaa !2)
renamable $x3 = LI8 0
renamable $cr0 = CMPLW renamable $r8, killed renamable $r6
BCC 68, killed renamable $cr0, %bb.11
bb.2.while.body.i.preheader:
successors: %bb.3(0x80000000)
liveins: $r8, $x3, $x4, $x5
renamable $x6 = ADDI8 renamable $x5, 2
renamable $x7 = ADDI8 renamable $x4, 2
bb.3.while.body.i:
successors: %bb.4(0x04000000), %bb.10(0x7c000000)
liveins: $r8, $x3, $x4, $x5, $x6, $x7
dead renamable $r8 = ANDIo killed renamable $r8, 65535, implicit-def $cr0
BCC 68, killed renamable $cr0, %bb.10
bb.4:
renamable $x3 = LI8 1
BLR8 implicit $lr8, implicit $rm, implicit killed $x3
bb.5.if.then.i:
successors: %bb.7(0x30000000), %bb.6(0x50000000)
liveins: $cr0, $cr1, $x4, $x5
BC killed renamable $cr1eq, %bb.7
bb.6.land.lhs.true.i:
successors: %bb.7(0x30000000), %bb.11(0x50000000)
liveins: $cr0, $x4, $x5
renamable $r4 = LHZ 0, killed renamable $x4 :: (load 2 from %ir.4, !tbaa !2)
renamable $x3 = LI8 0
renamable $cr1 = CMPLWI killed renamable $r4, 0
BCC 68, killed renamable $cr1, %bb.11
bb.7.lor.lhs.false3.i:
successors: %bb.9(0x30000000), %bb.8(0x50000000)
liveins: $cr0, $x5
BC killed renamable $cr0eq, %bb.9
bb.8.land.lhs.true5.i:
successors: %bb.9(0x80000000)
liveins: $x5
renamable $r4 = LHZ 0, killed renamable $x5 :: (load 2 from %ir.6, !tbaa !2)
renamable $x3 = LI8 0
renamable $cr0 = CMPLWI killed renamable $r4, 0
BCCLR 68, killed renamable $cr0, implicit $lr, implicit $rm, implicit killed $x3
bb.9.if.else.i:
renamable $x3 = LI8 1
BLR8 implicit $lr8, implicit $rm, implicit killed $x3
bb.10.if.end12.i:
successors: %bb.3(0x7c000000), %bb.11(0x04000000)
liveins: $x3, $x4, $x5, $x6, $x7
renamable $x5 = ADDI8 killed renamable $x5, 2
renamable $x4 = ADDI8 killed renamable $x4, 2
renamable $r8 = LHZ 0, renamable $x4 :: (load 2 from %ir.14, !tbaa !2)
renamable $r9 = LHZ 0, renamable $x5 :: (load 2 from %ir.12, !tbaa !2)
renamable $x6 = ADDI8 killed renamable $x6, 2
renamable $x7 = ADDI8 killed renamable $x7, 2
renamable $cr0 = CMPLW renamable $r8, killed renamable $r9
BCC 76, killed renamable $cr0, %bb.3
bb.11._ZN11xercesc_2_79XMLString6equalsEPKtS2_.exit:
liveins: $x3
BLR8 implicit $lr8, implicit $rm, implicit killed $x3
The `bb.4` and `bb.9` are same.
The pass:`block-placement` will merge the same BasicBlock, so after running pass `block-placement`, the `bb.9` will branch to the same BasicBlock bb.4, it's like below:
bb.9.if.else.i:
; predecessors: %bb.7, %bb.8
successors: %bb.4(0x80000000); %bb.4(100.00%)
B %bb.4
In the file `llvm/lib/CodeGen/BranchFolding.cpp`:
C++
687 // If both blocks are identical and end in a branch, merge them unless they
688 // both have a fallthrough predecessor and successor.
689 // We can only do this after block placement because it depends on whether
690 // there are fallthroughs, and we don't know until after layout.
691 if (AfterPlacement && I1 == MBB1->begin() && I2 == MBB2->begin()) {
692 auto BothFallThrough = [](MachineBasicBlock *MBB) {
693 if (MBB->succ_size() != 0 && !MBB->canFallThrough())
694 return false;
695 MachineFunction::iterator I(MBB);
696 MachineFunction *MF = MBB->getParent();
697 return (MBB != &*MF->begin()) && std::prev(I)->canFallThrough();
698 };
699 if (!BothFallThrough(MBB1) || !BothFallThrough(MBB2))
700 return true;
701 }
702
703 // If both blocks have an unconditional branch temporarily stripped out,
704 // count that as an additional common instruction for the following
705 // heuristics. This heuristic is only accurate for single-succ blocks, so to
706 // make sure that during layout merging and duplicating don't crash, we check
707 // for that when merging during layout.
708 unsigned EffectiveTailLen = CommonTailLen;
709 if (SuccBB && MBB1 != PredBB && MBB2 != PredBB &&
710 (MBB1->succ_size() == 1 || !AfterPlacement) &&
711 !MBB1->back().isBarrier() &&
712 !MBB2->back().isBarrier())
713 ++EffectiveTailLen;
714
715 // Check if the common tail is long enough to be worthwhile.
716 if (EffectiveTailLen >= MinCommonTailLength)
717 return true;
The `bb.9` will be optimized to `B %bb.4`, because the line `700 return true`. Maybe we can add more limitaion like below:
699 if ((!BothFallThrough(MBB1) || !BothFallThrough(MBB2))) {
700 if (!MBB1->back().isReturn() ||
701 !MBB2->back().isReturn() ||
702 CommonTailLen >= MinCommonTailLength)
703 return true;
704 }
When `MBB1` is same as `MBB2`, both are end with return instruction and `CommonTailLen < MinCommonTailLength`, we won't return true. The default value of `MinCommonTailLength` is 3.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D63972/new/
https://reviews.llvm.org/D63972
More information about the llvm-commits
mailing list