[LLVMbugs] [Bug 22074] New: Redundant copy in reduction loop

Wed Dec 31 06:25:32 PST 2014

http://llvm.org/bugs/show_bug.cgi?id=22074

            Bug ID: 22074
           Summary: Redundant copy in reduction loop
           Product: libraries
           Version: trunk
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Common Code Generator Code
          Assignee: unassignedbugs at nondot.org
          Reporter: michael.m.kuperstein at intel.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

The code below performs a trivial reduction on an array (basically, calculating
a checksum):

target triple = "i386-linux"
define zeroext i16 @loop(i16* readonly %p, i16* readnone %q) #0 {
entry:
  %cmp4 = icmp eq i16* %p, %q
  br i1 %cmp4, label %while.end, label %while.body

while.body:
  %v.06 = phi i32 [ %add, %while.body ], [ 0, %entry ]
  %p.addr.05 = phi i16* [ %incdec.ptr, %while.body ], [ %p, %entry ]
  %incdec.ptr = getelementptr inbounds i16* %p.addr.05, i32 1
  %0 = load i16* %p.addr.05, align 2
  %conv = zext i16 %0 to i32
  %add = add i32 %conv, %v.06
  %cmp = icmp eq i16* %incdec.ptr, %q
  br i1 %cmp, label %while.cond.while.end_crit_edge, label %while.body

while.cond.while.end_crit_edge:
  %phitmp = trunc i32 %add to i16
  br label %while.end

while.end:
  %v.0.lcssa = phi i16 [ %phitmp, %while.cond.while.end_crit_edge ], [ 0, %entry ]
  ret i16 %v.0.lcssa
}
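
For readers who prefer source over IR, the function above roughly corresponds
to the C below. This is a reconstruction from the IR, not the original test
case: the parameter types, the 32-bit accumulator, and the loop form are
assumptions read off the zext/add/trunc sequence.

```c
#include <stdint.h>

/* Hypothetical C source reconstructed from the IR: sum the 16-bit
   elements in [p, q) into a 32-bit accumulator, then truncate the
   result to 16 bits on return. */
uint16_t loop(const uint16_t *p, const uint16_t *q) {
    uint32_t v = 0;
    while (p != q)
        v += *p++;          /* zext i16 -> i32, then add i32 */
    return (uint16_t)v;     /* trunc i32 -> i16 */
}
```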

The code we currently get for the loop body is:
.LBB0_1:
        movl    %eax, %esi
        movzwl  (%edx), %eax
        addl    $2, %edx
        addl    %esi, %eax
        cmpl    %edx, %ecx
        jne     .LBB0_1
.LBB0_2:

Instead of the expected:
.LBB0_1:
        movzwl  (%edx), %esi
        addl    $2, %edx
        addl    %esi, %eax
        cmpl    %edx, %ecx
        jne     .LBB0_1
.LBB0_2:

The problem is that instead of adding the loaded value into the accumulator, we
do the reverse, forcing us to copy the accumulator between iterations.
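
As a rough illustration, the two loop bodies can be modelled in C with local
variables standing in for registers. This is only a toy sketch: the function
names and the `copies` counter are made up for the example, and the variable
names merely mirror the registers in the listings above.

```c
#include <stdint.h>

/* Toy model of the loop we currently get: the add is tied to the freshly
   loaded value, so the accumulator has to be saved into a scratch
   variable (standing in for %esi) at the top of every iteration. */
uint32_t sum_with_copy(const uint16_t *p, const uint16_t *q, unsigned *copies) {
    uint32_t eax = 0, esi;
    while (p != q) {
        esi = eax;      /* movl   %eax, %esi   (the redundant copy) */
        ++*copies;      /* count the copy, for illustration only   */
        eax = *p;       /* movzwl (%edx), %eax */
        ++p;            /* addl   $2, %edx     */
        eax += esi;     /* addl   %esi, %eax   */
    }
    return eax;
}

/* Toy model of the expected loop: the add is tied to the accumulator,
   so the loaded value goes into the scratch variable and no copy of
   the accumulator is needed. */
uint32_t sum_commuted(const uint16_t *p, const uint16_t *q) {
    uint32_t eax = 0, esi;
    while (p != q) {
        esi = *p;       /* movzwl (%edx), %esi */
        ++p;            /* addl   $2, %edx     */
        eax += esi;     /* addl   %esi, %eax   */
    }
    return eax;
}
```

Both functions compute the same sum; the first one simply performs one extra
register-to-register move per iteration.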

There are three different things here that could go right instead of going
wrong:

1) The two-address instruction pass tries to guess when it is worth commuting
an instruction while doing the 3-addr -> 2-addr transformation. The heuristic
there is rather limited, and it doesn't fire in this case.

2) The register coalescer also tries to commute two-address instructions when
it runs into situations it considers profitable. Specifically, if it gets this:

BB#2:
  %vreg3<def> = MOVZX32rm16 %vreg15, 1, %noreg, 0, %noreg; mem:LD2[%p.addr.05] GR32:%vreg3,%vreg15
  %vreg15<def,tied1> = ADD32ri8 %vreg15<tied0>, 2, %EFLAGS<imp-def,dead>; GR32:%vreg15
  %vreg3<def,tied1> = ADD32rr %vreg3<tied0>, %vreg14, %EFLAGS<imp-def,dead>; GR32:%vreg3,%vreg14
  CMP32rr %vreg7, %vreg15, %EFLAGS<imp-def>; GR32:%vreg7,%vreg15
  %vreg14<def> = COPY %vreg3; GR32:%vreg14,%vreg3
  JNE_4 <BB#2>, %EFLAGS<imp-use,kill>
  JMP_4 <BB#3>

In this case it can't coalesce vreg14 and vreg3 because of the def of vreg3 at
the beginning of the loop, so it commutes the ADD32rr that defines the copied
register, which makes the copy a nop.

However, what we actually get is:

BB#0:
[...]
   %vreg14<def> = MOV32r0 %EFLAGS<imp-def,dead>; GR32:%vreg14
[...]
BB#2:
  %vreg0<def> = COPY %vreg14; GR32:%vreg0,%vreg14
  %vreg14<def> = MOVZX32rm16 %vreg15, 1, %noreg, 0, %noreg; mem:LD2[%p.addr.05] GR32:%vreg14,%vreg15
  %vreg15<def,tied1> = ADD32ri8 %vreg15<tied0>, 2, %EFLAGS<imp-def,dead>; GR32:%vreg15
  %vreg14<def,tied1> = ADD32rr %vreg14<tied0>, %vreg0, %EFLAGS<imp-def,dead>; GR32:%vreg14,%vreg0
  CMP32rr %vreg7, %vreg15, %EFLAGS<imp-def>; GR32:%vreg7,%vreg15
  JNE_4 <BB#2>, %EFLAGS<imp-use,kill>
  JMP_4 <BB#3>

The copy we have is a remnant of PHI elimination, and it is reached by two defs
of vreg14 (the MOV32r0 in BB#0 and the ADD32rr in the loop). The coalescer's
current commuting code doesn't catch this pattern, so we don't commute.

3) The reason we see the "wrong" copy has to do with the order in which the
register coalescer tries to eliminate copies. When eliminating the copies
bottom-up (-join-globalcopies=false) we end up with the copy that the coalescer
will try to commute. However, with join-globalcopies enabled, we try to
eliminate the "outgoing" copy of %vreg14 first, and get stuck with the one we
don't know how to handle.

Any suggestions on where this should be fixed - (1), (2) or (3)? 
I'm a bit at a loss...
