[llvm-dev] [InstCombine] Simplification sometimes only transforms but doesn't simplify instruction, causing side effect in other pass
Xinliang David Li via llvm-dev
llvm-dev at lists.llvm.org
Wed Aug 2 16:36:23 PDT 2017
On Wed, Aug 2, 2017 at 3:36 PM, Matthias Braun <mbraun at apple.com> wrote:
> So to write this in a more condensed form, you have:
>
> %v0 = ...
> %v1 = and %v0, 255
> %v2 = and %v1, 31
> use %v1
> use %v2
>
> and transform this to
> %v0 = ...
> %v1 = and %v0, 255
> %v2 = and %v0, 31
> ...
>
> This is a classical problem with instruction combining. When you replace a
> non-root node of a pattern (in this case %v2 would be the root of the pattern
> we match, but we actually replace %v1 with %v0 as part of the replacement) and
> that non-root node has multiple users, it becomes very hard to predict
> whether the transformation pays off in the end.
>
> In my experience it is best to restrict this sort of transformation to
> situations where you can prove there aren't multiple users of the
> intermediate/non-root node that we replace. (Or, if you want to get fancier,
> that all users will match the same pattern.)
>
I agree. The propagation happens in the hope that %v1 has only one use, so
that '%v1 = ...' can later be eliminated, or that %v1 has only two uses, so
that the propagation can shrink the live range of %v1 and save one move
instruction for the other use of %v1. In reality this is like flipping a
coin and is likely to hurt performance on x86.
Avoiding this on the fly and instead doing it in a separate pass might be the
way to go.
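For illustration, a rough sketch of the one-use restriction Matthias
describes, using the m_OneUse matcher from llvm/IR/PatternMatch.h (the fold
shown and the helper name are made up for this example; it is not the
in-tree InstCombine code):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/PatternMatch.h"

using namespace llvm;
using namespace llvm::PatternMatch;

// Fold (X & C1) & C2 --> X & (C1 & C2), but only when the inner 'and' has
// a single use, so the intermediate value really dies and we do not merely
// extend the live range of X.  Returns the replacement value or nullptr.
static Value *foldNestedAndOneUse(Instruction &I, IRBuilder<> &Builder) {
  Value *X;
  ConstantInt *C1, *C2;
  if (!match(&I, m_And(m_OneUse(m_And(m_Value(X), m_ConstantInt(C1))),
                       m_ConstantInt(C2))))
    return nullptr;  // The inner 'and' has other users: bail out.
  return Builder.CreateAnd(X, C1->getValue() & C2->getValue());
}

With this guard, the case in the testcase below (where the inner 'and' has
several other users) would simply be left alone.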
David
>
> - Matthias
>
> > On Aug 2, 2017, at 3:00 PM, Wei Mi via llvm-dev <llvm-dev at lists.llvm.org>
> wrote:
> >
> > Hi,
> >
> > We recently found a testcase showing that simplifications in
> > instcombine sometimes change an instruction without reducing the
> > instruction cost, causing problems in the TwoAddressInstruction pass.
> > The problem looks generic, and other simplifications may have the same
> > issue. I want to get some ideas about the best way to fix this kind of
> > problem.
> >
> > The testcase:
> > ----------------------------- a.ll ----------------------------------
> > @a = global i64 0, align 8
> > @b = global i32 0, align 8
> >
> > define i32 @_Z25InternalUncompressAllTagsv(i16* %arrayidx) local_unnamed_addr {
> > entry:
> > %t1 = load i16, i16* %arrayidx, align 2
> > %conv.i = zext i16 %t1 to i32
> > %and.i = and i32 %conv.i, 255
> > %and7.i = and i32 %conv.i, 1792
> > %t3 = zext i32 %and7.i to i64
> > %t4 = load i64, i64* @a, align 8
> > %add.i = add i64 %t4, %t3
> > %cmp1 = icmp eq i64 %add.i, 1
> > %cmp2 = icmp ult i32 %and.i, 101
> > %bool = and i1 %cmp1, %cmp2
> > br i1 %bool, label %if.then, label %if.else, !prof !0
> >
> > if.then: ; preds = %entry
> > %r1 = trunc i64 %add.i to i32
> > br label %return
> >
> > if.else: ; preds = %entry
> > %r2 = and i32 %and.i, 31
> > store i32 %and.i, i32* @b, align 8
> > br label %return
> >
> > return:                                          ; preds = %if.else, %if.then
> > %ret = phi i32 [ %r1, %if.then ], [ %r2, %if.else ]
> > ret i32 %ret
> > }
> >
> > !0 = !{!"branch_weights", i32 2000, i32 1}
> > ---------------------------------------------------------------------------
> >
> > For the snippet:
> > %and.i = and i32 %conv.i, 255
> > ...
> > %r2 = and i32 %and.i, 31
> >
> > Look at %r2 in block %if.else: it is computed by two "and" operations.
> > Both InstCombiner::SimplifyAssociativeOrCommutative and
> > InstCombiner::SimplifyDemandedUseBits can replace %r2 = and i32
> > %and.i, 31 with %r2 = and i32 %conv.i, 31. Because %and.i has many
> > other uses and won't become dead after the simplifications, those
> > simplifications don't reduce the instruction cost, but they do change
> > the live ranges of variables:
> > Before instcombine, %conv.i is last used at %and7.i = and i32 %conv.i,
> > 1792, so on architectures like x86, llvm won't generate an extra mov
> > for it in the TwoAddressInstruction pass. After instcombine, %conv.i's
> > live range is extended: both %and7.i and %conv.i are live after that
> > "and" instruction, so the TwoAddressInstruction pass will generate an
> > extra mov for %and7.i = and i32 %conv.i, 1792.
> >
> > *** After instcombine: ***
> > ~/workarea/llvm-r309240/rbuild1/bin/opt -instcombine -S < a.ll > b.ll; cat b.ll
> > @a = global i64 0, align 8
> > @b = global i32 0, align 8
> >
> > define i32 @_Z25InternalUncompressAllTagsv(i16* %arrayidx) local_unnamed_addr {
> > entry:
> > %t1 = load i16, i16* %arrayidx, align 2
> > %conv.i = zext i16 %t1 to i32
> > %and.i = and i32 %conv.i, 255
> > %and7.i = and i32 %conv.i, 1792   ============> %conv.i is alive since its live range is extended;
> >                                                 we need an extra mov for this two-address instruction on x86.
> > %t3 = zext i32 %and7.i to i64
> > %t4 = load i64, i64* @a, align 8
> > %add.i = add i64 %t4, %t3
> > %cmp1 = icmp eq i64 %add.i, 1
> > %cmp2 = icmp ult i32 %and.i, 101
> > %bool = and i1 %cmp1, %cmp2
> > br i1 %bool, label %if.then, label %if.else
> >
> > if.then: ; preds = %entry
> > %r1 = trunc i64 %add.i to i32
> > br label %return
> >
> > if.else: ; preds = %entry
> > %r2 = and i32 %conv.i, 31          ===============> %and.i is replaced by %conv.i.
> > store i32 %and.i, i32* @b, align 8
> > br label %return
> >
> > return:                                          ; preds = %if.else, %if.then
> > %ret = phi i32 [ %r1, %if.then ], [ %r2, %if.else ]
> > ret i32 %ret
> > }
> >
> > *** asm code without instcombine: ***
> > ~/workarea/llvm-r309240/rbuild1/bin/llc < a.ll
> > # BB#0: # %entry
> > movzwl (%rdi), %ecx
> > movzbl %cl, %eax
> > andl $1792, %ecx # imm = 0x700
> > addq a(%rip), %rcx
> > cmpq $1, %rcx
> > jne .LBB0_3
> >
> > *** asm code with instcombine: ***
> > ~/workarea/llvm-r309240/rbuild1/bin/llc < b.ll
> > # BB#0: # %entry
> > movzwl (%rdi), %eax
> > movzbl %al, %ecx
> > movl %eax, %edx                =====> one extra move instruction in the hot path
> > andl $1792, %edx # imm = 0x700
> > addq a(%rip), %rdx
> > cmpq $1, %rdx
> > jne .LBB0_3
> >
> > I am not sure whether it is possible to tackle the problem in the
> > simplifications or in instcombine directly, so I explored the possibility
> > of doing the reverse transformation in CodeGen, but I found that is not
> > quite easy either, for two reasons:
> > * It seems to me the Peephole pass or the TwoAddressInstruction pass are
> > the proper places to do the reverse transformation. We need LiveInterval
> > information to justify the transformation; however, LiveInterval
> > information is currently not available in either place (a rough sketch of
> > the kind of query needed is shown after the machine IR below).
> >
> > * The pattern matching looks quite ad hoc on machine IR. I first need to
> > figure out that we can replace %vreg0 in "AND32ri8 %vreg0<tied0>, 31" with
> > %vreg1, by looking at the copy chain from %vreg9<def> = COPY %vreg0 to
> > %vreg1<def> = MOVZX32rr8 %vreg9, and also that, after replacing %vreg0
> > with %vreg1, %vreg0 becomes dead at the other AND32ri, so we can save an
> > instruction there. In addition, replacing %vreg0 with %vreg1 may introduce
> > an extra move before "AND32ri8 %vreg0<tied0>, 31", so we still need to
> > check that "AND32ri8 %vreg0<tied0>, 31" is colder than "AND32ri
> > %vreg0<tied0>, 1792".
> > All these efforts handle just one specific pattern, and if the pattern
> > changes a little bit, they won't work.
> > BB_i:
> > %vreg9<def> = COPY %vreg0:sub_8bit; GR8:%vreg9 GR32:%vreg0
> > %vreg1<def> = MOVZX32rr8 %vreg9<kill>; GR32:%vreg1 GR8:%vreg9
> > %vreg10<def,tied1> = AND32ri %vreg0<tied0>, 1792,
> > %EFLAGS<imp-def,dead>; GR32:%vreg10,%vreg0
> > ...
> > BB_j:
> > %vreg4<def,tied1> = AND32ri8 %vreg0<tied0>, 31,
> > %EFLAGS<imp-def,dead>; GR32:%vreg4,%vreg0
> > MOV32mr %RIP, 1, %noreg, <ga:@b>, %noreg, %vreg1;
> > mem:ST4[@b](align=8) GR32:%vreg1
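> >
> > As a rough illustration, the kind of LiveIntervals query such a check
> > would need (just a sketch; the helper name is made up, and only the
> > LiveIntervals/SlotIndex calls are existing API) could look like:
> >
> > #include "llvm/CodeGen/LiveIntervalAnalysis.h"
> > #include "llvm/CodeGen/SlotIndexes.h"
> > using namespace llvm;
> >
> > // Returns true if OldReg is not live past MI, i.e. MI contains the last
> > // use of OldReg, so rewriting that use shrinks OldReg's live range.
> > static bool isLastUse(const MachineInstr &MI, unsigned OldReg,
> >                       const LiveIntervals &LIS) {
> >   SlotIndex Idx = LIS.getInstructionIndex(MI);
> >   return !LIS.getInterval(OldReg).liveAt(Idx.getDeadSlot());
> > }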
> >
> > Any suggestions are welcome.
> >
> > Thanks,
> > Wei.