[LLVMbugs] [Bug 21371] New: Improve instruction selection for addition reductions

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Fri Oct 24 06:54:47 PDT 2014


http://llvm.org/bugs/show_bug.cgi?id=21371

            Bug ID: 21371
           Summary: Improve instruction selection for addition reductions
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: AArch64
          Assignee: unassignedbugs at nondot.org
          Reporter: james.molloy at arm.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

When doing a simple "add" reduction, the code generated to do the final
"reduce" step is sub-optimal. The cleanup code needs to add all elements of a
vector together and return the result.

The following code:

int arr[5000];

int f() {
  int a = 0;
  for (int i = 0; i < 5000; ++i)
    a += arr[i];
  return a;
}

Generates the following reduction cleanup (this is outside the loop):

// BB#2:                                // %middle.block
    add    v0.4s, v1.4s, v0.4s         // add two reduction vectors together
(UF=2)
    ext    v1.16b, v0.16b, v0.16b, #8  // v1 = high half of $v0 (high half of
v1 and v0 unused after this point)
    add    v0.4s, v0.4s, v1.4s         // v0 = high half of $v0 + low half of
$v0
    dup    v1.4s, v0.s[1]              // v1[0] = 1'st element of v0
    add    v0.4s, v0.4s, v1.4s         // v0[0] = zero'th element of v0 + 1'st
element of v0
    fmov    w0, s0                      // Swap back to GPRs.
    ret

Note that the first ADD is adding two reduction vectors together, as the unroll
factor was 2. I've used the notation "$v0" to refer to the value of "v0" at the
beginning of the sequence.

This could be simplified to:
// BB#2:                                // %middle.block
    add    v0.4s, v0.4s, v1.4s         // Add two reduction vectors together
(UF=2)
    addv    s0, v0.4s                   // Cross-lane reduce.
        fmov    w0, s0                      // Swap back to GPRs
    ret

This reduces 4 instructions to 1, and would have a bigger impact when reducing
sub-integer vectors (vectors with more than 4 elements).
Other instructions that could be useful for reductions:
  * ADDV: integer add reduce across vector.
  * FMAXNMV: Floating point maxnum across vector.
  * FMAXV: Floating point maximum across vector.
  * FMINNMV: Floating point minnum across vector.
  * FMINV: Floating point minimum across vector.

-- 
You are receiving this mail because:
You are on the CC list for the bug.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-bugs/attachments/20141024/819c009a/attachment.html>


More information about the llvm-bugs mailing list