[llvm] r284757 - Using branch probability to guide critical edge splitting.

Keith Walker via llvm-commits llvm-commits at lists.llvm.org
Sun Oct 23 00:15:06 PDT 2016


Dehao,

Just to let you know that, unfortunately, this change has had a severe impact
on a third-party embedded benchmarking suite running on AArch64, ARM and Thumb
targets, with regressions in multiple benchmarks ranging from about 10% to 48%.

I will try to get additional details for you when I have them available.

Keith

> -----Original Message-----
> From: llvm-commits [mailto:llvm-commits-bounces at lists.llvm.org] On
> Behalf Of Dehao Chen via llvm-commits
> Sent: 20 October 2016 19:07
> To: llvm-commits at lists.llvm.org
> Subject: [llvm] r284757 - Using branch probability to guide critical
> edge splitting.
> 
> Author: dehao
> Date: Thu Oct 20 13:06:52 2016
> New Revision: 284757
> 
> URL: http://llvm.org/viewvc/llvm-project?rev=284757&view=rev
> Log:
> Using branch probability to guide critical edge splitting.
> 
> Summary:
> The original heuristic for breaking critical edges during machine sink is
> relatively conservative: when only one instruction is sinkable to the
> critical edge, the machine sink pass is unlikely to break that edge.
> This leads to many speculative instructions being executed at runtime.
> However, with profile info, we can model the benefit of splitting: if
> the critical edge has a 50% taken rate, it would always be beneficial to
> split the critical edge to avoid the speculated runtime instructions.
> This patch uses profile info to guide critical edge splitting in the
> machine sink pass.
> 
> The performance impact on SPEC CPU2006 on Intel Sandy Bridge machines:
> 
> spec/2006/fp/C++/444.namd                  25.3  +0.26%
> spec/2006/fp/C++/447.dealII               45.96  -0.10%
> spec/2006/fp/C++/450.soplex               41.97  +1.49%
> spec/2006/fp/C++/453.povray               36.83  -0.96%
> spec/2006/fp/C/433.milc                   23.81  +0.32%
> spec/2006/fp/C/470.lbm                    41.17  +0.34%
> spec/2006/fp/C/482.sphinx3                48.13  +0.69%
> spec/2006/int/C++/471.omnetpp             22.45  +3.25%
> spec/2006/int/C++/473.astar               21.35  -2.06%
> spec/2006/int/C++/483.xalancbmk           36.02  -2.39%
> spec/2006/int/C/400.perlbench              33.7  -0.17%
> spec/2006/int/C/401.bzip2                  22.9  +0.52%
> spec/2006/int/C/403.gcc                   32.42  -0.54%
> spec/2006/int/C/429.mcf                   39.59  +0.19%
> spec/2006/int/C/445.gobmk                 26.98  -0.00%
> spec/2006/int/C/456.hmmer                 24.52  -0.18%
> spec/2006/int/C/458.sjeng                 28.26  +0.02%
> spec/2006/int/C/462.libquantum            55.44  +3.74%
> spec/2006/int/C/464.h264ref               46.67  -0.39%
> 
> geometric mean                                   +0.20%
> 
> Manually checked 473 and 471 to verify the diff is in the noise range.
> 
> Reviewers: rengolin, davidxl
> 
> Subscribers: llvm-commits
> 
> Differential Revision: https://reviews.llvm.org/D24818
> 
> Added:
>     llvm/trunk/test/CodeGen/X86/machine-sink.ll
> Modified:
>     llvm/trunk/lib/CodeGen/MachineSink.cpp
>     llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll
>     llvm/trunk/test/CodeGen/ARM/code-placement.ll
>     llvm/trunk/test/CodeGen/X86/block-placement.ll
>     llvm/trunk/test/CodeGen/X86/clz.ll
>     llvm/trunk/test/CodeGen/X86/loop-search.ll
>     llvm/trunk/test/CodeGen/X86/phys_subreg_coalesce-2.ll
>     llvm/trunk/test/CodeGen/X86/pr2659.ll
>     llvm/trunk/test/DebugInfo/COFF/pieces.ll
> 
> Modified: llvm/trunk/lib/CodeGen/MachineSink.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/CodeGen/MachineSink.cpp?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/lib/CodeGen/MachineSink.cpp (original)
> +++ llvm/trunk/lib/CodeGen/MachineSink.cpp Thu Oct 20 13:06:52 2016
> @@ -24,6 +24,7 @@
>  #include "llvm/Analysis/AliasAnalysis.h"
>  #include "llvm/CodeGen/MachineBasicBlock.h"
>  #include "llvm/CodeGen/MachineBlockFrequencyInfo.h"
> +#include "llvm/CodeGen/MachineBranchProbabilityInfo.h"
>  #include "llvm/CodeGen/MachineDominators.h"
>  #include "llvm/CodeGen/MachineFunction.h"
>  #include "llvm/CodeGen/MachineFunctionPass.h"
> @@ -60,6 +61,15 @@ UseBlockFreqInfo("machine-sink-bfi",
>             cl::desc("Use block frequency info to find successors to sink"),
>             cl::init(true), cl::Hidden);
> 
> +static cl::opt<unsigned> SplitEdgeProbabilityThreshold(
> +    "machine-sink-split-probability-threshold",
> +    cl::desc(
> +        "Percentage threshold for splitting single-instruction critical edge. "
> +        "If the branch threshold is higher than this threshold, we allow "
> +        "speculative execution of up to 1 instruction to avoid branching to "
> +        "splitted critical edge"),
> +    cl::init(40), cl::Hidden);
> +
>  STATISTIC(NumSunk,      "Number of machine instructions sunk");
>  STATISTIC(NumSplit,     "Number of critical edges split");
>  STATISTIC(NumCoalesces, "Number of copies coalesced");
> @@ -74,6 +84,7 @@ namespace {
>      MachinePostDominatorTree *PDT; // Machine post dominator tree
>      MachineLoopInfo *LI;
>      const MachineBlockFrequencyInfo *MBFI;
> +    const MachineBranchProbabilityInfo *MBPI;
>      AliasAnalysis *AA;
> 
>      // Remember which edges have been considered for breaking.
> @@ -105,6 +116,7 @@ namespace {
>        AU.addRequired<MachineDominatorTree>();
>        AU.addRequired<MachinePostDominatorTree>();
>        AU.addRequired<MachineLoopInfo>();
> +      AU.addRequired<MachineBranchProbabilityInfo>();
>        AU.addPreserved<MachineDominatorTree>();
>        AU.addPreserved<MachinePostDominatorTree>();
>        AU.addPreserved<MachineLoopInfo>();
> @@ -163,6 +175,7 @@ char MachineSinking::ID = 0;
>  char &llvm::MachineSinkingID = MachineSinking::ID;
>  INITIALIZE_PASS_BEGIN(MachineSinking, "machine-sink",
>                  "Machine code sinking", false, false)
> +INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo)
>  INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
>  INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo)
>  INITIALIZE_PASS_DEPENDENCY(AAResultsWrapperPass)
> @@ -283,6 +296,7 @@ bool MachineSinking::runOnMachineFunctio
>    PDT = &getAnalysis<MachinePostDominatorTree>();
>    LI = &getAnalysis<MachineLoopInfo>();
>    MBFI = UseBlockFreqInfo ? &getAnalysis<MachineBlockFrequencyInfo>() : nullptr;
> +  MBPI = &getAnalysis<MachineBranchProbabilityInfo>();
>    AA = &getAnalysis<AAResultsWrapperPass>().getAAResults();
> 
>    bool EverMadeChange = false;
> @@ -383,6 +397,10 @@ bool MachineSinking::isWorthBreakingCrit
>    if (!MI.isCopy() && !TII->isAsCheapAsAMove(MI))
>      return true;
> 
> +  if (From->isSuccessor(To) && MBPI->getEdgeProbability(From, To) <=
> +      BranchProbability(SplitEdgeProbabilityThreshold, 100))
> +    return true;
> +
>    // MI is cheap, we probably don't want to break the critical edge for it.
>    // However, if this would allow some definitions of its source operands
>    // to be sunk then it's probably worth it.
> 
> Modified: llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/atomic-cmpxchg.ll Thu Oct 20 13:06:52 2016
> @@ -38,16 +38,14 @@ entry:
>  ; CHECK-ARMV6-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
>  ; CHECK-ARMV6-NEXT: [[TRY:.LBB[0-9_]+]]:
>  ; CHECK-ARMV6-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
> -; CHECK-ARMV6-NEXT: mov [[RES:r[0-9]+]], #0
>  ; CHECK-ARMV6-NEXT: cmp [[LD]], [[DESIRED]]
> -; CHECK-ARMV6-NEXT: bne [[END:.LBB[0-9_]+]]
> +; CHECK-ARMV6-NEXT: movne [[RES:r[0-9]+]], #0
> +; CHECK-ARMV6-NEXT: bxne lr
>  ; CHECK-ARMV6-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
> -; CHECK-ARMV6-NEXT: mov [[RES]], #1
>  ; CHECK-ARMV6-NEXT: cmp [[SUCCESS]], #0
> -; CHECK-ARMV6-NEXT: bne [[TRY]]
> -; CHECK-ARMV6-NEXT: [[END]]:
> -; CHECK-ARMV6-NEXT: mov r0, [[RES]]
> -; CHECK-ARMV6-NEXT: bx lr
> +; CHECK-ARMV6-NEXT: moveq [[RES]], #1
> +; CHECK-ARMV6-NEXT: bxeq lr
> +; CHECK-ARMV6-NEXT: b [[TRY]]
> 
>  ; CHECK-THUMBV6-LABEL: test_cmpxchg_res_i8:
>  ; CHECK-THUMBV6:       mov [[EXPECTED:r[0-9]+]], r1
> @@ -64,20 +62,18 @@ entry:
>  ; CHECK-ARMV7-LABEL: test_cmpxchg_res_i8:
>  ; CHECK-ARMV7-NEXT: .fnstart
>  ; CHECK-ARMV7-NEXT: uxtb [[DESIRED:r[0-9]+]], r1
> -; CHECK-ARMV7-NEXT: [[TRY:.LBB[0-9_]+]]:
> -; CHECK-ARMV7-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
> -; CHECK-ARMV7-NEXT: cmp [[LD]], [[DESIRED]]
> -; CHECK-ARMV7-NEXT: bne [[FAIL:.LBB[0-9_]+]]
> +; CHECK-ARMV7-NEXT: b [[TRY:.LBB[0-9_]+]]
> +; CHECK-ARMV7-NEXT: [[HEAD:.LBB[0-9_]+]]:
>  ; CHECK-ARMV7-NEXT: strexb [[SUCCESS:r[0-9]+]], r2, [r0]
> -; CHECK-ARMV7-NEXT: mov [[RES:r[0-9]+]], #1
>  ; CHECK-ARMV7-NEXT: cmp [[SUCCESS]], #0
> -; CHECK-ARMV7-NEXT: bne [[TRY]]
> -; CHECK-ARMV7-NEXT: b [[END:.LBB[0-9_]+]]
> -; CHECK-ARMV7-NEXT: [[FAIL]]:
> +; CHECK-ARMV7-NEXT: moveq [[RES:r[0-9]+]], #1
> +; CHECK-ARMV7-NEXT: bxeq lr
> +; CHECK-ARMV7-NEXT: [[TRY]]:
> +; CHECK-ARMV7-NEXT: ldrexb [[LD:r[0-9]+]], [r0]
> +; CHECK-ARMV7-NEXT: cmp [[LD]], [[DESIRED]]
> +; CHECK-ARMV7-NEXT: beq [[HEAD]]
>  ; CHECK-ARMV7-NEXT: clrex
>  ; CHECK-ARMV7-NEXT: mov [[RES]], #0
> -; CHECK-ARMV7-NEXT: [[END]]:
> -; CHECK-ARMV7-NEXT: mov r0, [[RES]]
>  ; CHECK-ARMV7-NEXT: bx lr
> 
>  ; CHECK-THUMBV7-LABEL: test_cmpxchg_res_i8:
> 
> Modified: llvm/trunk/test/CodeGen/ARM/code-placement.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/ARM/code-placement.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/ARM/code-placement.ll (original)
> +++ llvm/trunk/test/CodeGen/ARM/code-placement.ll Thu Oct 20 13:06:52 2016
> @@ -12,9 +12,9 @@ entry:
>    br i1 %0, label %bb2, label %bb
> 
>  bb:
> -; CHECK: LBB0_2:
> -; CHECK: bne LBB0_2
> -; CHECK-NOT: b LBB0_2
> +; CHECK: LBB0_[[LABEL:[0-9]]]:
> +; CHECK: bne LBB0_[[LABEL]]
> +; CHECK-NOT: b LBB0_[[LABEL]]
>  ; CHECK: bx lr
>    %list_addr.05 = phi %struct.list_head* [ %2, %bb ], [ %list, %entry ]
>    %next.04 = phi %struct.list_head* [ %list_addr.05, %bb ], [ null,
> %entry ]
> @@ -34,14 +34,13 @@ bb2:
>  define i32 @t2(i32 %passes, i32* nocapture %src, i32 %size) nounwind
> readonly {
>  entry:
>  ; CHECK-LABEL: t2:
> -; CHECK: beq LBB1_[[RET:.]]
>    %0 = icmp eq i32 %passes, 0                     ; <i1> [#uses=1]
>    br i1 %0, label %bb5, label %bb.nph15
> 
> -; CHECK: LBB1_[[PREHDR:.]]: @ %bb2.preheader
>  bb1:                                              ; preds =
> %bb2.preheader, %bb1
> -; CHECK: LBB1_[[BB1:.]]: @ %bb1
> -; CHECK: bne LBB1_[[BB1]]
> +; CHECK: LBB1_[[BB3:.]]: @ %bb3
> +; CHECK: LBB1_[[PREHDR:.]]: @ %bb2.preheader
> +; CHECK: blt LBB1_[[BB3]]
>    %indvar = phi i32 [ %indvar.next, %bb1 ], [ 0, %bb2.preheader ] ;
> <i32> [#uses=2]
>    %sum.08 = phi i32 [ %2, %bb1 ], [ %sum.110, %bb2.preheader ] ; <i32>
> [#uses=1]
>    %tmp17 = sub i32 %i.07, %indvar                 ; <i32> [#uses=1]
> @@ -53,9 +52,9 @@ bb1:
>    br i1 %exitcond, label %bb3, label %bb1
> 
>  bb3:                                              ; preds = %bb1,
> %bb2.preheader
> -; CHECK: LBB1_[[BB3:.]]: @ %bb3
> -; CHECK: bne LBB1_[[PREHDR]]
> -; CHECK-NOT: b LBB1_
> +; CHECK: LBB1_[[BB1:.]]: @ %bb1
> +; CHECK: bne LBB1_[[BB1]]
> +; CHECK: b LBB1_[[BB3]]
>    %sum.0.lcssa = phi i32 [ %sum.110, %bb2.preheader ], [ %2, %bb1 ] ;
> <i32> [#uses=2]
>    %3 = add i32 %pass.011, 1                       ; <i32> [#uses=2]
>    %exitcond18 = icmp eq i32 %3, %passes           ; <i1> [#uses=1]
> @@ -71,8 +70,6 @@ bb2.preheader:
>    %sum.110 = phi i32 [ 0, %bb.nph15 ], [ %sum.0.lcssa, %bb3 ] ; <i32>
> [#uses=2]
>    br i1 %4, label %bb1, label %bb3
> 
> -; CHECK: LBB1_[[RET]]: @ %bb5
> -; CHECK: pop
>  bb5:                                              ; preds = %bb3,
> %entry
>    %sum.1.lcssa = phi i32 [ 0, %entry ], [ %sum.0.lcssa, %bb3 ] ; <i32>
> [#uses=1]
>    ret i32 %sum.1.lcssa
> 
> Modified: llvm/trunk/test/CodeGen/X86/block-placement.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/block-placement.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/block-placement.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/block-placement.ll Thu Oct 20 13:06:52 2016
> @@ -478,12 +478,12 @@ define void @fpcmp_unanalyzable_branch(i
>  ; CHECK-LABEL: fpcmp_unanalyzable_branch:
>  ; CHECK:       # BB#0: # %entry
>  ; CHECK:       # BB#1: # %entry.if.then_crit_edge
> -; CHECK:       .LBB10_4: # %if.then
> -; CHECK:       .LBB10_5: # %if.end
> +; CHECK:       .LBB10_5: # %if.then
> +; CHECK:       .LBB10_6: # %if.end
>  ; CHECK:       # BB#3: # %exit
>  ; CHECK:       jne .LBB10_4
> -; CHECK-NEXT:  jnp .LBB10_5
> -; CHECK-NEXT:  jmp .LBB10_4
> +; CHECK-NEXT:  jnp .LBB10_6
> +; CHECK:       jmp .LBB10_5
> 
>  entry:
>  ; Note that this branch must be strongly biased toward
> 
> Modified: llvm/trunk/test/CodeGen/X86/clz.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/clz.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/clz.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/clz.ll Thu Oct 20 13:06:52 2016
> @@ -279,28 +279,32 @@ define i64 @ctlz_i64(i64 %x) {
>  define i8 @ctlz_i8_zero_test(i8 %n) {
>  ; X32-LABEL: ctlz_i8_zero_test:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movb {{[0-9]+}}(%esp), %cl
> -; X32-NEXT:    movb $8, %al
> -; X32-NEXT:    testb %cl, %cl
> -; X32-NEXT:    je .LBB8_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    movzbl %cl, %eax
> +; X32-NEXT:    movb {{[0-9]+}}(%esp), %al
> +; X32-NEXT:    testb %al, %al
> +; X32-NEXT:    je .LBB8_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    movzbl %al, %eax
>  ; X32-NEXT:    bsrl %eax, %eax
>  ; X32-NEXT:    xorl $7, %eax
> -; X32-NEXT:  .LBB8_2: # %cond.end
> +; X32-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB8_1:
> +; X32-NEXT:    movb    $8, %al
>  ; X32-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: ctlz_i8_zero_test:
>  ; X64:       # BB#0:
> -; X64-NEXT:    movb $8, %al
>  ; X64-NEXT:    testb %dil, %dil
> -; X64-NEXT:    je .LBB8_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB8_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    movzbl %dil, %eax
>  ; X64-NEXT:    bsrl %eax, %eax
>  ; X64-NEXT:    xorl $7, %eax
> -; X64-NEXT:  .LBB8_2: # %cond.end
> +; X64-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB8_1:
> +; X64-NEXT:    movb    $8, %al
>  ; X64-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
>  ; X64-NEXT:    retq
>  ;
> @@ -327,26 +331,30 @@ define i8 @ctlz_i8_zero_test(i8 %n) {
>  define i16 @ctlz_i16_zero_test(i16 %n) {
>  ; X32-LABEL: ctlz_i16_zero_test:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
> -; X32-NEXT:    movw $16, %ax
> -; X32-NEXT:    testw %cx, %cx
> -; X32-NEXT:    je .LBB9_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    bsrw %cx, %ax
> +; X32-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
> +; X32-NEXT:    testw %ax, %ax
> +; X32-NEXT:    je .LBB9_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    bsrw %ax, %ax
>  ; X32-NEXT:    xorl $15, %eax
> -; X32-NEXT:  .LBB9_2: # %cond.end
> +; X32-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB9_1:
> +; X32-NEXT:    movw    $16, %ax
>  ; X32-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: ctlz_i16_zero_test:
>  ; X64:       # BB#0:
> -; X64-NEXT:    movw $16, %ax
>  ; X64-NEXT:    testw %di, %di
> -; X64-NEXT:    je .LBB9_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB9_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    bsrw %di, %ax
>  ; X64-NEXT:    xorl $15, %eax
> -; X64-NEXT:  .LBB9_2: # %cond.end
> +; X64-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB9_1:
> +; X64-NEXT:    movw $16, %ax
>  ; X64-NEXT:    # kill: %AX<def> %AX<kill> %EAX<kill>
>  ; X64-NEXT:    retq
>  ;
> @@ -367,25 +375,27 @@ define i16 @ctlz_i16_zero_test(i16 %n) {
>  define i32 @ctlz_i32_zero_test(i32 %n) {
>  ; X32-LABEL: ctlz_i32_zero_test:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
> -; X32-NEXT:    movl $32, %eax
> -; X32-NEXT:    testl %ecx, %ecx
> -; X32-NEXT:    je .LBB10_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    bsrl %ecx, %eax
> +; X32-NEXT:    movl {{[0-9]+}}(%esp), %eax
> +; X32-NEXT:    testl %eax, %eax
> +; X32-NEXT:    je .LBB10_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    bsrl %eax, %eax
>  ; X32-NEXT:    xorl $31, %eax
> -; X32-NEXT:  .LBB10_2: # %cond.end
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB10_1:
> +; X32-NEXT:    movl    $32, %eax
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: ctlz_i32_zero_test:
>  ; X64:       # BB#0:
> -; X64-NEXT:    movl $32, %eax
>  ; X64-NEXT:    testl %edi, %edi
> -; X64-NEXT:    je .LBB10_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB10_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    bsrl %edi, %eax
>  ; X64-NEXT:    xorl $31, %eax
> -; X64-NEXT:  .LBB10_2: # %cond.end
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB10_1:
> +; X64-NEXT:    movl $32, %eax
>  ; X64-NEXT:    retq
>  ;
>  ; X32-CLZ-LABEL: ctlz_i32_zero_test:
> @@ -464,26 +474,30 @@ define i64 @ctlz_i64_zero_test(i64 %n) {
>  define i8 @cttz_i8_zero_test(i8 %n) {
>  ; X32-LABEL: cttz_i8_zero_test:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movb {{[0-9]+}}(%esp), %cl
> -; X32-NEXT:    movb $8, %al
> -; X32-NEXT:    testb %cl, %cl
> -; X32-NEXT:    je .LBB12_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    movzbl %cl, %eax
> +; X32-NEXT:    movb {{[0-9]+}}(%esp), %al
> +; X32-NEXT:    testb %al, %al
> +; X32-NEXT:    je .LBB12_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    movzbl %al, %eax
>  ; X32-NEXT:    bsfl %eax, %eax
> -; X32-NEXT:  .LBB12_2: # %cond.end
> +; X32-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB12_1
> +; X32-NEXT:    movb $8, %al
>  ; X32-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: cttz_i8_zero_test:
>  ; X64:       # BB#0:
> -; X64-NEXT:    movb $8, %al
>  ; X64-NEXT:    testb %dil, %dil
> -; X64-NEXT:    je .LBB12_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB12_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    movzbl %dil, %eax
>  ; X64-NEXT:    bsfl %eax, %eax
> -; X64-NEXT:  .LBB12_2: # %cond.end
> +; X64-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB12_1:
> +; X64-NEXT:    movb $8, %al
>  ; X64-NEXT:    # kill: %AL<def> %AL<kill> %EAX<kill>
>  ; X64-NEXT:    retq
>  ;
> @@ -510,23 +524,25 @@ define i8 @cttz_i8_zero_test(i8 %n) {
>  define i16 @cttz_i16_zero_test(i16 %n) {
>  ; X32-LABEL: cttz_i16_zero_test:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movzwl {{[0-9]+}}(%esp), %ecx
> +; X32-NEXT:    movzwl {{[0-9]+}}(%esp), %eax
> +; X32-NEXT:    testw %ax, %ax
> +; X32-NEXT:    je .LBB13_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    bsfw %ax, %ax
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB13_1
>  ; X32-NEXT:    movw $16, %ax
> -; X32-NEXT:    testw %cx, %cx
> -; X32-NEXT:    je .LBB13_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    bsfw %cx, %ax
> -; X32-NEXT:  .LBB13_2: # %cond.end
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: cttz_i16_zero_test:
>  ; X64:       # BB#0:
> -; X64-NEXT:    movw $16, %ax
>  ; X64-NEXT:    testw %di, %di
> -; X64-NEXT:    je .LBB13_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB13_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    bsfw %di, %ax
> -; X64-NEXT:  .LBB13_2: # %cond.end
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB13_1:
> +; X64-NEXT:    movw $16, %ax
>  ; X64-NEXT:    retq
>  ;
>  ; X32-CLZ-LABEL: cttz_i16_zero_test:
> @@ -546,23 +562,25 @@ define i16 @cttz_i16_zero_test(i16 %n) {
>  define i32 @cttz_i32_zero_test(i32 %n) {
>  ; X32-LABEL: cttz_i32_zero_test:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
> +; X32-NEXT:    movl {{[0-9]+}}(%esp), %eax
> +; X32-NEXT:    testl %eax, %eax
> +; X32-NEXT:    je .LBB14_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    bsfl %eax, %eax
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB14_1
>  ; X32-NEXT:    movl $32, %eax
> -; X32-NEXT:    testl %ecx, %ecx
> -; X32-NEXT:    je .LBB14_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    bsfl %ecx, %eax
> -; X32-NEXT:  .LBB14_2: # %cond.end
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: cttz_i32_zero_test:
>  ; X64:       # BB#0:
> -; X64-NEXT:    movl $32, %eax
>  ; X64-NEXT:    testl %edi, %edi
> -; X64-NEXT:    je .LBB14_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB14_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    bsfl %edi, %eax
> -; X64-NEXT:  .LBB14_2: # %cond.end
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB14_1:
> +; X64-NEXT:    movl $32, %eax
>  ; X64-NEXT:    retq
>  ;
>  ; X32-CLZ-LABEL: cttz_i32_zero_test:
> @@ -642,25 +660,27 @@ define i64 @cttz_i64_zero_test(i64 %n) {
>  define i32 @ctlz_i32_fold_cmov(i32 %n) {
>  ; X32-LABEL: ctlz_i32_fold_cmov:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
> -; X32-NEXT:    orl $1, %ecx
> -; X32-NEXT:    movl $32, %eax
> -; X32-NEXT:    je .LBB16_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    bsrl %ecx, %eax
> +; X32-NEXT:    movl {{[0-9]+}}(%esp), %eax
> +; X32-NEXT:    orl $1, %eax
> +; X32-NEXT:    je .LBB16_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    bsrl %eax, %eax
>  ; X32-NEXT:    xorl $31, %eax
> -; X32-NEXT:  .LBB16_2: # %cond.end
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB16_1
> +; X32-NEXT:    movl $32, %eax
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: ctlz_i32_fold_cmov:
>  ; X64:       # BB#0:
>  ; X64-NEXT:    orl $1, %edi
> -; X64-NEXT:    movl $32, %eax
> -; X64-NEXT:    je .LBB16_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB16_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    bsrl %edi, %eax
>  ; X64-NEXT:    xorl $31, %eax
> -; X64-NEXT:  .LBB16_2: # %cond.end
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB16_1:
> +; X64-NEXT:    movl $32, %eax
>  ; X64-NEXT:    retq
>  ;
>  ; X32-CLZ-LABEL: ctlz_i32_fold_cmov:
> @@ -716,26 +736,30 @@ define i32 @ctlz_bsr(i32 %n) {
>  define i32 @ctlz_bsr_zero_test(i32 %n) {
>  ; X32-LABEL: ctlz_bsr_zero_test:
>  ; X32:       # BB#0:
> -; X32-NEXT:    movl {{[0-9]+}}(%esp), %ecx
> -; X32-NEXT:    movl $32, %eax
> -; X32-NEXT:    testl %ecx, %ecx
> -; X32-NEXT:    je .LBB18_2
> -; X32-NEXT:  # BB#1: # %cond.false
> -; X32-NEXT:    bsrl %ecx, %eax
> +; X32-NEXT:    movl {{[0-9]+}}(%esp), %eax
> +; X32-NEXT:    testl %eax, %eax
> +; X32-NEXT:    je .LBB18_1
> +; X32-NEXT:  # BB#2: # %cond.false
> +; X32-NEXT:    bsrl %eax, %eax
>  ; X32-NEXT:    xorl $31, %eax
> -; X32-NEXT:  .LBB18_2: # %cond.end
> +; X32-NEXT:    xorl $31, %eax
> +; X32-NEXT:    retl
> +; X32-NEXT:  .LBB18_1:
> +; X32-NEXT:    movl $32, %eax
>  ; X32-NEXT:    xorl $31, %eax
>  ; X32-NEXT:    retl
>  ;
>  ; X64-LABEL: ctlz_bsr_zero_test:
>  ; X64:       # BB#0:
> -; X64-NEXT:    movl $32, %eax
>  ; X64-NEXT:    testl %edi, %edi
> -; X64-NEXT:    je .LBB18_2
> -; X64-NEXT:  # BB#1: # %cond.false
> +; X64-NEXT:    je .LBB18_1
> +; X64-NEXT:  # BB#2: # %cond.false
>  ; X64-NEXT:    bsrl %edi, %eax
>  ; X64-NEXT:    xorl $31, %eax
> -; X64-NEXT:  .LBB18_2: # %cond.end
> +; X64-NEXT:    xorl $31, %eax
> +; X64-NEXT:    retq
> +; X64-NEXT:  .LBB18_1:
> +; X64-NEXT:    movl $32, %eax
>  ; X64-NEXT:    xorl $31, %eax
>  ; X64-NEXT:    retq
>  ;
> 
> Modified: llvm/trunk/test/CodeGen/X86/loop-search.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/loop-search.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/loop-search.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/loop-search.ll Thu Oct 20 13:06:52 2016
> @@ -10,19 +10,17 @@ define zeroext i1 @search(i32 %needle, i
>  ; CHECK-NEXT:    testl %edx, %edx
>  ; CHECK-NEXT:    jle LBB0_1
>  ; CHECK-NEXT:  ## BB#4: ## %for.body.preheader
> -; CHECK-NEXT:    movslq %edx, %rcx
> -; CHECK-NEXT:    xorl %edx, %edx
> +; CHECK-NEXT:    movslq %edx, %rax
> +; CHECK-NEXT:    xorl %ecx, %ecx
>  ; CHECK-NEXT:    .p2align 4, 0x90
>  ; CHECK-NEXT:  LBB0_5: ## %for.body
>  ; CHECK-NEXT:    ## =>This Inner Loop Header: Depth=1
> -;            ### FIXME: This loop invariant should be hoisted
> -; CHECK-NEXT:    movb $1, %al
> -; CHECK-NEXT:    cmpl %edi, (%rsi,%rdx,4)
> +; CHECK-NEXT:    cmpl %edi, (%rsi,%rcx,4)
>  ; CHECK-NEXT:    je LBB0_6
>  ; CHECK-NEXT:  ## BB#2: ## %for.cond
>  ; CHECK-NEXT:    ## in Loop: Header=BB0_5 Depth=1
> -; CHECK-NEXT:    incq %rdx
> -; CHECK-NEXT:    cmpq %rcx, %rdx
> +; CHECK-NEXT:    incq %rcx
> +; CHECK-NEXT:    cmpq %rax, %rcx
>  ; CHECK-NEXT:    jl LBB0_5
>  ;            ### FIXME: BB#3 and LBB0_1 should be merged
>  ; CHECK-NEXT:  ## BB#3:
> @@ -33,7 +31,8 @@ define zeroext i1 @search(i32 %needle, i
>  ; CHECK-NEXT:    xorl %eax, %eax
>  ; CHECK-NEXT:    ## kill: %AL<def> %AL<kill> %EAX<kill>
>  ; CHECK-NEXT:    retq
> -; CHECK-NEXT:  LBB0_6: ## %cleanup
> +; CHECK-NEXT:  LBB0_6:
> +; CHECK-NEXT:    movb $1, %al
>  ; CHECK-NEXT:    ## kill: %AL<def> %AL<kill> %EAX<kill>
>  ; CHECK-NEXT:    retq
>  ;
> 
> Added: llvm/trunk/test/CodeGen/X86/machine-sink.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/machine-sink.ll?rev=284757&view=auto
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/machine-sink.ll (added)
> +++ llvm/trunk/test/CodeGen/X86/machine-sink.ll Thu Oct 20 13:06:52 2016
> @@ -0,0 +1,21 @@
> +; RUN: llc < %s -mtriple=x86_64-pc-linux | FileCheck %s
> +
> +; Checks if movl $1 is sinked to critical edge.
> +; CHECK-NOT: movl $1
> +; CHECK: jbe
> +; CHECK: movl $1
> +define i32 @test(i32 %n, i32 %k) nounwind  {
> +entry:
> +  %cmp = icmp ugt i32 %k, %n
> +  br i1 %cmp, label %ifthen, label %ifend, !prof !1
> +
> +ifthen:
> +  %y = add i32 %k, 2
> +  br label %ifend
> +
> +ifend:
> +  %ret = phi i32 [ 1, %entry ] , [ %y, %ifthen]
> +  ret i32 %ret
> +}
> +
> +!1 = !{!"branch_weights", i32 100, i32 1}
> 
> Modified: llvm/trunk/test/CodeGen/X86/phys_subreg_coalesce-2.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/phys_subreg_coalesce-2.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/phys_subreg_coalesce-2.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/phys_subreg_coalesce-2.ll Thu Oct 20 13:06:52 2016
> @@ -14,7 +14,9 @@ forcond.preheader:		; preds = %entry
>  ifthen:		; preds = %entry
>  	ret i32 0
>  ; CHECK: forbody{{$}}
> +; There should be no mov instruction in the for body.
>  ; CHECK-NOT: mov
> +; CHECK: jbe
>  forbody:		; preds = %forbody, %forcond.preheader
>  	%indvar = phi i32 [ 0, %forcond.preheader ], [ %divisor.02,
> %forbody ]		; <i32> [#uses=3]
>  	%accumulator.01 = phi i32 [ 1, %forcond.preheader ], [ %div,
> %forbody ]		; <i32> [#uses=1]
> 
> Modified: llvm/trunk/test/CodeGen/X86/pr2659.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr2659.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/pr2659.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/pr2659.ll Thu Oct 20 13:06:52 2016
> @@ -14,7 +14,7 @@ forcond.preheader:              ; preds
>    br i1 %cmp44, label %afterfor, label %forbody
> 
>  ; CHECK: %forcond.preheader
> -; CHECK: movl $1
> +; CHECK: testl
>  ; CHECK-NOT: xorl
>  ; CHECK-NOT: movl
>  ; CHECK-NOT: LBB
> @@ -24,6 +24,7 @@ forcond.preheader:              ; preds
>  ; CHECK: %forbody{{$}}
>  ; CHECK-NOT: mov
>  ; CHECK: jbe
> +; CHECK: movl $1
> 
>  ifthen:         ; preds = %entry
>    ret i32 0
> 
> Modified: llvm/trunk/test/DebugInfo/COFF/pieces.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/DebugInfo/COFF/pieces.ll?rev=284757&r1=284756&r2=284757&view=diff
> ==============================================================================
> --- llvm/trunk/test/DebugInfo/COFF/pieces.ll (original)
> +++ llvm/trunk/test/DebugInfo/COFF/pieces.ll Thu Oct 20 13:06:52 2016
> @@ -37,11 +37,11 @@
>  ; ASM-LABEL: loop_csr: # @loop_csr
>  ; ASM:        #DEBUG_VALUE: loop_csr:o [bit_piece offset=0 size=32] <-
> 0
>  ; ASM:        #DEBUG_VALUE: loop_csr:o [bit_piece offset=32 size=32] <-
> 0
> -; ASM: # BB#1:                                 # %for.body.preheader
> +; ASM: # BB#2:                                 # %for.body.preheader
>  ; ASM:         xorl    %edi, %edi
>  ; ASM:         xorl    %esi, %esi
>  ; ASM:         .p2align        4, 0x90
> -; ASM: .LBB0_2:                                # %for.body
> +; ASM: .LBB0_3:                                # %for.body
>  ; ASM: [[ox_start:\.Ltmp[0-9]+]]:
>  ; ASM:        #DEBUG_VALUE: loop_csr:o [bit_piece offset=0 size=32] <-
> %EDI
>  ; ASM:        .cv_loc 0 1 13 11               # t.c:13:11
> @@ -57,7 +57,7 @@
>  ; ASM:         movl    %eax, %esi
>  ; ASM:         #DEBUG_VALUE: loop_csr:o [bit_piece offset=32 size=32]
> <- %ESI
>  ; ASM:         cmpl    n(%rip), %eax
> -; ASM:         jl      .LBB0_2
> +; ASM:         jl      .LBB0_3
>  ; ASM: [[oy_end:\.Ltmp[0-9]+]]:
>  ; ASM:         addl    %edi, %esi
>  ; ASM:         movl    %esi, %eax
> 
> 

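For anyone trying to reproduce the regression, the check that r284757 adds to
isWorthBreakingCriticalEdge can be sketched outside LLVM as below. This is a
minimal illustration under stated assumptions, not the code from the commit:
EdgeProbability, fromWeights and isColdEnoughToSplit are hypothetical names
standing in for llvm::BranchProbability and the comparison against
MBPI->getEdgeProbability(From, To), and the default of 40 mirrors the
cl::init(40) on machine-sink-split-probability-threshold.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::BranchProbability: a plain fraction.
struct EdgeProbability {
  std::uint64_t Numerator;
  std::uint64_t Denominator;
};

// Probability of one outgoing edge, given its branch weight and the sum of
// the weights of all outgoing edges of the same block.
constexpr EdgeProbability fromWeights(std::uint64_t EdgeWeight,
                                      std::uint64_t TotalWeight) {
  return EdgeProbability{EdgeWeight, TotalWeight};
}

// Mirrors the shape of the new condition: the edge is treated as worth
// splitting when its probability is at most ThresholdPercent / 100.
// Cross-multiplication avoids floating point; it assumes the weights are
// small enough that the products do not overflow 64 bits.
inline bool isColdEnoughToSplit(EdgeProbability P,
                                unsigned ThresholdPercent = 40) {
  assert(P.Denominator != 0 && "probability needs a non-zero denominator");
  return P.Numerator * 100 <=
         static_cast<std::uint64_t>(ThresholdPercent) * P.Denominator;
}
```

With the branch weights from the new machine-sink.ll test (100 for %ifthen,
1 for %ifend), the entry-to-ifend edge has probability 1/101, so
isColdEnoughToSplit(fromWeights(1, 101)) holds. An edge taken half the time,
fromWeights(50, 100), does not pass this particular check at the default
threshold.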
