[PATCH] Remove redundent register mov by improving TwoAddressInstructionPass
Wei Mi
wmi at google.com
Fri Feb 20 22:18:58 PST 2015
Hi chandlerc, bob.wilson,
Hi,
We found a problem in TwoAddressInstructionPass which could generate redundent register mov insns in loop, and proposed a patch to fix it.
Here is the small testcase:
1.c:
--------------------------
int M, total;
void foo() {
int i;
for (i = 0; i < M; i++) {
total = total + i / 2;
}
}
--------------------------
~/workarea/llvm-r230041/build/bin/clang -O2 -fno-vectorize -fno-unroll-loops -S 1.c
This is the kernel loop in 1.s:
--------------------------
.LBB0_2: # %for.body
# =>This Inner Loop Header: Depth=1
movl %edx, %esi
movl %ecx, %edx
shrl $31, %edx
addl %ecx, %edx
sarl %edx
addl %esi, %edx
incl %ecx
cmpl %eax, %ecx
jl .LBB0_2
--------------------------
The first mov insn "movl %edx, %esi" could be removed if we change "addl %esi, %edx" to "addl %edx, %esi".
The IR before TwoAddressInstructionPass is:
BB#2: derived from LLVM BB %for.body
Predecessors according to CFG: BB#1 BB#2
%vreg3<def> = COPY %vreg12<kill>; GR32:%vreg3,%vreg12
%vreg2<def> = COPY %vreg11<kill>; GR32:%vreg2,%vreg11
%vreg7<def,tied1> = SHR32ri %vreg3<tied0>, 31, %EFLAGS<imp-def,dead>; GR32:%vreg7,%vreg3
%vreg8<def,tied1> = ADD32rr %vreg3<tied0>, %vreg7<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg3,%vreg7
%vreg9<def,tied1> = SAR32r1 %vreg8<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg9,%vreg8
%vreg4<def,tied1> = ADD32rr %vreg9<kill,tied0>, %vreg2<kill>, %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg9,%vreg2
%vreg5<def,tied1> = INC64_32r %vreg3<kill,tied0>, %EFLAGS<imp-def,dead>; GR32:%vreg5,%vreg3
CMP32rr %vreg5, %vreg0, %EFLAGS<imp-def>; GR32:%vreg5,%vreg0
%vreg11<def> = COPY %vreg4; GR32:%vreg11,%vreg4
%vreg12<def> = COPY %vreg5<kill>; GR32:%vreg12,%vreg5
JL_4 <BB#2>, %EFLAGS<imp-use,kill>
Now TwoAddressInstructionPass will choose vreg9 to be tied with vreg4. However, it doesn't see that there is copy from vreg4 to vreg11 and another copy from vreg11 to vreg2 inside the loop body. To remove those copies, it is necessary to choose vreg2 to be tied with vreg4 instead of vreg9. This code pattern commonly appears when there is reduction operation in a loop.
The patch fixed the problem and improved O2 performance of google internal benchmarks by 0.74% on average (The biggest improvement for a benchmark is 5%)
Wei.
REPOSITORY
rL LLVM
http://reviews.llvm.org/D7806
Files:
lib/CodeGen/TwoAddressInstructionPass.cpp
test/CodeGen/X86/twoaddr-coalesce-3.ll
EMAIL PREFERENCES
http://reviews.llvm.org/settings/panel/emailpreferences/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D7806.20453.patch
Type: text/x-patch
Size: 4277 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20150221/afbfd575/attachment.bin>
More information about the llvm-commits
mailing list