[llvm-dev] [InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass
Wei Mi via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 2 15:00:16 PDT 2017
Hi,
We recently found a testcase showing that simplifications in
instcombine sometimes change the instruction without reducing the
instruction cost, but causing problems in TwoAddressInstruction pass.
And it looks like the problem is generic and other simplification may
have the same issue. I want to get some ideas about what is the best
way to fix such kind of problem.
The testcase:
----------------------------- a.ll ----------------------------------
@a = global i64 0, align 8
@b = global i32 0, align 8
define i32 @_Z25InternalUncompressAllTagsv(i16* %arrayidx) local_unnamed_addr {
entry:
%t1 = load i16, i16* %arrayidx, align 2
%conv.i = zext i16 %t1 to i32
%and.i = and i32 %conv.i, 255
%and7.i = and i32 %conv.i, 1792
%t3 = zext i32 %and7.i to i64
%t4 = load i64, i64* @a, align 8
%add.i = add i64 %t4, %t3
%cmp1 = icmp eq i64 %add.i, 1
%cmp2 = icmp ult i32 %and.i, 101
%bool = and i1 %cmp1, %cmp2
br i1 %bool, label %if.then, label %if.else, !prof !0
if.then: ; preds = %entry
%r1 = trunc i64 %add.i to i32
br label %return
if.else: ; preds = %entry
%r2 = and i32 %and.i, 31
store i32 %and.i, i32* @b, align 8
br label %return
return: ; preds = %if.else, %if.then
%ret = phi i32 [ %r1, %if.then ], [ %r2, %if.else ]
ret i32 %ret
}
!0 = !{!"branch_weights", i32 2000, i32 1}
-------------------------------------------------------------------------
For the snippet:
%and.i = and i32 %conv.i, 255
...
%r2 = and i32 %and.i, 31
Look at %r2 in block %if.else, it is computed by two "and" operations.
Both InstCombiner::SimplifyAssociativeOrCommutative and
InstCombiner::SimplifyDemandedUseBits can replace %r2 = and i32
%and.i, 31 with %r2 = and i32 %conv.i, 31. Because %and.i has many
other uses and it won't become dead after the simplifications, those
simplifications won't simplify instruction cost, but they will change
the live range of variables:
Before instcombine, %conv.i is last used at %and7.i = and i32 %conv.i,
1792, so on architecture like x86, llvm won't generate extra mov in
TwoAddressInstruction pass for it. After instcombine, %conv.i's live
range is extended, both %and7.i and %conv.i are alive after the "and"
instruction, so TwoAddressInstruction pass will generate an extra mov
for %and7.i = and i32 %conv.i.
*** After instcombine: ***
~/workarea/llvm-r309240/rbuild1/bin/opt -instcombine -S < a.ll > b.ll; cat b.ll
@a = global i64 0, align 8
@b = global i32 0, align 8
define i32 @_Z25InternalUncompressAllTagsv(i16* %arrayidx) local_unnamed_addr {
entry:
%t1 = load i16, i16* %arrayidx, align 2
%conv.i = zext i16 %t1 to i32
%and.i = and i32 %conv.i, 255
%and7.i = and i32 %conv.i, 1792 ============> %conv.i
is alive since its live range is extended.
%t3 = zext i32 %and7.i to i64
We need a extra mov for this two address instruction on x86
%t4 = load i64, i64* @a, align 8
%add.i = add i64 %t4, %t3
%cmp1 = icmp eq i64 %add.i, 1
%cmp2 = icmp ult i32 %and.i, 101
%bool = and i1 %cmp1, %cmp2
br i1 %bool, label %if.then, label %if.else
if.then: ; preds = %entry
%r1 = trunc i64 %add.i to i32
br label %return
if.else: ; preds = %entry
%r2 = and i32 %conv.i, 31
===============> %and.i is replaced by %conv.i.
store i32 %and.i, i32* @b, align 8
br label %return
return: ; preds = %if.else, %if.then
%ret = phi i32 [ %r1, %if.then ], [ %r2, %if.else ]
ret i32 %ret
}
*** asm code without instcombine: ***
~/workarea/llvm-r309240/rbuild1/bin/llc < a.ll
# BB#0: # %entry
movzwl (%rdi), %ecx
movzbl %cl, %eax
andl $1792, %ecx # imm = 0x700
addq a(%rip), %rcx
cmpq $1, %rcx
jne .LBB0_3
*** asm code with instcombine: ***
~/workarea/llvm-r309240/rbuild1/bin/llc < b.ll
# BB#0: # %entry
movzwl (%rdi), %eax
movzbl %al, %ecx
movl %eax, %edx =====> One extra move
instruction in the hot path
andl $1792, %edx # imm = 0x700
addq a(%rip), %rdx
cmpq $1, %rdx
jne .LBB0_3
I am not sure if it is possible to tackle the problem in
simplification or instcombine directly, so I explored the possibility
to do the reversal transformation in CodeGen, but I found that not
quite easy either because of two things:
* Seems to me Peephole pass or TwoAddrInstruction pass are the proper
places to do the reversal transformation. We need Live Interval
information to justify the transformation, however currently live
Interval information is not ready in both places.
* The pattern matching looks quite ad hoc on machine IR. I need to
figure out we can replace %vreg0 in "AND32ri8 %vreg0<tied0>, 31" with
%vreg1 by looking at the copy chain starting from %vreg9<def> = COPY
%vreg0 to %vreg1<def> = MOVZX32rr8 %vreg9 first, and at the same time,
after replacing vreg0 with %vreg1, vreg0 becomes dead at the other
AND32ri and we can save an instruction there. In addition, replace
%vreg0 with %vreg1 may increase an extra move before "AND32ri8
%vreg0<tied0>, 31", so we still need to check "AND32ri8 %vreg0<tied0>,
31" is colder than "AND32ri %vreg0<tied0>, 1792".
All these efforts are just handling a specific pattern, and if the
pattern changes a little bit, they won't work.
BB_i:
%vreg9<def> = COPY %vreg0:sub_8bit; GR8:%vreg9 GR32:%vreg0
%vreg1<def> = MOVZX32rr8 %vreg9<kill>; GR32:%vreg1 GR8:%vreg9
%vreg10<def,tied1> = AND32ri %vreg0<tied0>, 1792,
%EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
...
BB_j:
%vreg4<def,tied1> = AND32ri8 %vreg0<tied0>, 31,
%EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg0
MOV32mr %RIP, 1, %noreg, <ga:@b>, %noreg, %vreg1;
mem:ST4[@b](align=8) GR32:%vreg1
Any suggestions are welcomed.
Thanks,
Wei.
More information about the llvm-dev
mailing list