[PATCH] Remove redundant register mov by improving TwoAddressInstructionPass

Wei Mi wmi at google.com
Fri Feb 20 22:18:58 PST 2015

Hi chandlerc, bob.wilson,


We found a problem in TwoAddressInstructionPass that can generate redundant register mov insns inside loops, and we propose a patch to fix it.

Here is the small testcase:

int M, total;

void foo() {
  int i;
  for (i = 0; i < M; i++) {
    total = total + i / 2;
  }
}

Compiled with:
~/workarea/llvm-r230041/build/bin/clang -O2 -fno-vectorize -fno-unroll-loops -S 1.c

This is the kernel loop in 1.s:
.LBB0_2:                                # %for.body
                                        # =>This Inner Loop Header: Depth=1
        movl    %edx, %esi
        movl    %ecx, %edx
        shrl    $31, %edx
        addl    %ecx, %edx
        sarl    %edx
        addl    %esi, %edx
        incl    %ecx
        cmpl    %eax, %ecx
        jl      .LBB0_2

The first mov insn "movl    %edx, %esi" could be removed if we change "addl    %esi, %edx" to "addl    %edx, %esi".
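
To check that claim by hand, here is a small C model of the two loop bodies. It is a hand-written sketch, not compiler output: the function names loop_original and loop_commuted are made up for illustration, and the second version assumes the code around the loop is adjusted so that the running total enters and leaves the loop in %esi rather than %edx.

```c
#include <stdint.h>

/* Model of the loop as emitted today: the running total lives in %edx
   and is saved to %esi at the top of every iteration. */
int32_t loop_original(int32_t m, int32_t total) {
    int32_t ecx = 0, edx = total, esi;
    while (ecx < m) {
        esi = edx;                             /* movl %edx, %esi */
        edx = ecx;                             /* movl %ecx, %edx */
        edx = (int32_t)((uint32_t)edx >> 31);  /* shrl $31, %edx  */
        edx += ecx;                            /* addl %ecx, %edx */
        edx >>= 1;                             /* sarl %edx       */
        edx += esi;                            /* addl %esi, %edx */
        ecx++;                                 /* incl %ecx       */
    }
    return edx;
}

/* Model with the final add commuted: the running total stays in %esi,
   so the first mov is no longer needed (assumption: the total is placed
   in %esi before the loop and read from %esi after it). */
int32_t loop_commuted(int32_t m, int32_t total) {
    int32_t ecx = 0, edx, esi = total;
    while (ecx < m) {
        edx = ecx;                             /* movl %ecx, %edx */
        edx = (int32_t)((uint32_t)edx >> 31);  /* shrl $31, %edx  */
        edx += ecx;                            /* addl %ecx, %edx */
        edx >>= 1;                             /* sarl %edx       */
        esi += edx;                            /* addl %edx, %esi */
        ecx++;                                 /* incl %ecx       */
    }
    return esi;
}
```

Both functions accumulate i / 2 for i in [0, m), so they return the same value for the same inputs; the second one simply executes one instruction fewer per iteration.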

The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body
    Predecessors according to CFG: BB#1 BB#2
        %vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
        %vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
        %vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
        %vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
        %vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
        %vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
        %vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
        CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
        %vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
        %vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
        JL_4 <BB#2>, %EFLAGS<imp-use,kill>

Currently TwoAddressInstructionPass chooses vreg9 to be tied with vreg4. However, it does not see that there is a copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, vreg2 must be tied with vreg4 instead of vreg9. This code pattern commonly appears when a loop contains a reduction operation.
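
One way such a check could look is sketched below in plain C. This is only an illustration of the idea, not the code in the attached patch: the copy_src table, the functions copy_chain_reaches and choose_tied_operand, and the max_len bound are all hypothetical names, and real LLVM code would walk MachineInstr definitions instead of an array.

```c
#include <stdbool.h>

#define NUM_VREGS 16
#define NOT_A_COPY (-1)

/* copy_src[v] holds the source vreg of the COPY that defines v,
   or NOT_A_COPY when v is defined by some other instruction.
   (Hypothetical table standing in for the machine IR.) */

/* Walk backwards from `from` through COPY definitions and report whether
   `to` is reached within max_len hops. The bound also protects against
   cycles in the table. */
bool copy_chain_reaches(const int *copy_src, int from, int to, int max_len) {
    int cur = from;
    for (int hops = 0; hops < max_len; hops++) {
        int src = copy_src[cur];
        if (src == NOT_A_COPY)
            return false;
        if (src == to)
            return true;
        cur = src;
    }
    return false;
}

/* For `dst = OP op1, op2` with a commutable OP, pick the source operand
   to tie with dst. op1 is the default; op2 is chosen only when op2, and
   not op1, is fed back from dst through a chain of copies. */
int choose_tied_operand(const int *copy_src, int dst, int op1, int op2,
                        int max_len) {
    bool op1_fed = copy_chain_reaches(copy_src, op1, dst, max_len);
    bool op2_fed = copy_chain_reaches(copy_src, op2, dst, max_len);
    if (op2_fed && !op1_fed)
        return op2;
    return op1;
}
```

For the loop above, setting copy_src[11] = 4 and copy_src[2] = 11 (all other entries NOT_A_COPY) and calling choose_tied_operand(copy_src, 4, 9, 2, 3) selects vreg2, because vreg2 is reached from vreg4 through two copies while vreg9 is defined by SAR32r1.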

The patch fixes the problem and improves -O2 performance of Google internal benchmarks by 0.74% on average (the biggest improvement for a single benchmark is 5%).

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D7806.20453.patch
Type: text/x-patch
Size: 4277 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150221/afbfd575/attachment.bin>
