[llvm-bugs] [Bug 30787] New: Failure to beneficially vectorize 'copyable' elements in integer binary ops

Tue Oct 25 11:13:00 PDT 2016

https://llvm.org/bugs/show_bug.cgi?id=30787

            Bug ID: 30787
           Summary: Failure to beneficially vectorize 'copyable' elements
                    in integer binary ops
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: normal
          Priority: P
         Component: new bugs
          Assignee: unassignedbugs at nondot.org
          Reporter: llvm-dev at redking.me.uk
                CC: a.bataev at hotmail.com, andrew.v.tischenko at gmail.com,
                    llvm-bugs at lists.llvm.org, mkuper at google.com,
                    spatel+llvm at rotateright.com
    Classification: Unclassified

We successfully vectorize:

// clang -O3 -march=btver2

void add0(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

add0(int*, int const*):
        vmovdqu xmm0, xmmword ptr [rsi]
        vpaddd  xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
        vmovdqu xmmword ptr [rdi], xmm0
        ret

But we fail to do so if one or more elements simplify to a plain copy (as the
first element's + 0 does below). If the cost model says it would still be
beneficial, we should vectorize such cases:

void add1(int * __restrict dst, const int * __restrict src) {
  *dst++ = *src++ + 0;
  *dst++ = *src++ + 1;
  *dst++ = *src++ + 2;
  *dst++ = *src++ + 3;
}

add1(int*, int const*):
        mov     ecx, dword ptr [rsi + 4]
        mov     eax, dword ptr [rsi]
        mov     edx, dword ptr [rsi + 8]
        inc     ecx
        mov     dword ptr [rdi], eax
        add     edx, 2
        mov     dword ptr [rdi + 4], ecx
        mov     ecx, dword ptr [rsi + 12]
        mov     dword ptr [rdi + 8], edx
        add     ecx, 3
        mov     dword ptr [rdi + 12], ecx
        ret
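
As a rough sketch of the form being asked for, the copied element can be
treated as an add of zero so that all four lanes share one opcode and one
constant vector. The helper name add1_lanes and the constant table k are
hypothetical, introduced only for illustration; this models the desired
result and is not how the SLP vectorizer is implemented:

```c
/* Hypothetical model of add1 with the copy in lane 0 rewritten as
   src[0] + 0, so every lane is the same ADD with a per-lane constant.
   This is the shape a single vector load, vpaddd and store could cover. */
void add1_lanes(int *restrict dst, const int *restrict src) {
  static const int k[4] = {0, 1, 2, 3}; /* lane 0 uses the identity 0 */
  for (int i = 0; i < 4; ++i)
    dst[i] = src[i] + k[i];
}
```

Whether this is actually profitable on a given target is still a cost model
decision, as noted above.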

The same applies to SUB/MUL/SHL/LSHR/ASHR (DIV/REM?), and possibly to
-ffast-math float FADD/FSUB/FMUL/FDIV operations as well.
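
For illustration, the analogous patterns for a few of the other integer ops
might look like the following. The function names sub1, mul1 and shl1 are
hypothetical and are not taken from the linked examples; in each one the
first element uses the right-hand identity of the operation (0 for SUB and
SHL, 1 for MUL), so it folds to a plain copy just as in add1:

```c
/* Hypothetical test cases: lane 0 folds to a copy in each function. */
void sub1(int *restrict dst, const int *restrict src) {
  dst[0] = src[0] - 0; /* copy */
  dst[1] = src[1] - 1;
  dst[2] = src[2] - 2;
  dst[3] = src[3] - 3;
}

void mul1(int *restrict dst, const int *restrict src) {
  dst[0] = src[0] * 1; /* copy */
  dst[1] = src[1] * 2;
  dst[2] = src[2] * 3;
  dst[3] = src[3] * 4;
}

/* unsigned avoids undefined behaviour when shifting left */
void shl1(unsigned *restrict dst, const unsigned *restrict src) {
  dst[0] = src[0] << 0; /* copy */
  dst[1] = src[1] << 1;
  dst[2] = src[2] << 2;
  dst[3] = src[3] << 3;
}
```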

Further examples can be found here: https://godbolt.org/g/ueIYiF
