[llvm-bugs] [Bug 52358] New: Missed vectorization in SPEC benchmark - emulated gather capability?

Fri Oct 29 19:18:09 PDT 2021

https://bugs.llvm.org/show_bug.cgi?id=52358

            Bug ID: 52358
           Summary: Missed vectorization in SPEC benchmark - emulated
                    gather capability?
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: david.bolvansky at gmail.com
                CC: llvm-bugs at lists.llvm.org

Extracted from 450.soplex:

typedef double Real;

struct Element {
    Real val;
    int idx;
};

struct Vector {
    int dimen;
    Real* val;
};

Real foo(Element* e, int n, const Vector& w) {
    Real x = 0;

    while (n--) {
        x += e->val * w.val[e->idx];
        e++;
    }
    return x;
}

Flags: -Ofast -mavx2

Godbolt: https://godbolt.org/z/7f9nsf98f

LLVM only unrolls this loop by four and leaves every operation scalar:
.LBB0_7:                                # =>This Inner Loop Header: Depth=1
        movsxd  rcx, dword ptr [rdi + 8]
        vmovsd  xmm1, qword ptr [rax + 8*rcx]   # xmm1 = mem[0],zero
        vmulsd  xmm1, xmm1, qword ptr [rdi]
        vaddsd  xmm0, xmm1, xmm0
        movsxd  rcx, dword ptr [rdi + 24]
        vmovsd  xmm1, qword ptr [rax + 8*rcx]   # xmm1 = mem[0],zero
        vmulsd  xmm1, xmm1, qword ptr [rdi + 16]
        movsxd  rcx, dword ptr [rdi + 40]
        vmovsd  xmm2, qword ptr [rax + 8*rcx]   # xmm2 = mem[0],zero
        vmulsd  xmm2, xmm2, qword ptr [rdi + 32]
        vaddsd  xmm1, xmm1, xmm2
        movsxd  rcx, dword ptr [rdi + 56]
        vmovsd  xmm2, qword ptr [rax + 8*rcx]   # xmm2 = mem[0],zero
        vmulsd  xmm2, xmm2, qword ptr [rdi + 48]
        vaddsd  xmm0, xmm0, xmm1
        add     rdi, 64
        vaddsd  xmm0, xmm2, xmm0
        add     esi, -4
        jne     .LBB0_7

Recent GCC can vectorize this loop since the following commit:

    Add emulated gather capability to the vectorizer

    This adds a gather vectorization capability to the vectorizer
    without target support by decomposing the offset vector, doing
    scalar loads and then building a vector from the result.  This
    is aimed mainly at cases where vectorizing the rest of the loop
    offsets the cost of vectorizing the gather.
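
For illustration, here is a hand-written sketch of what an emulated gather
could look like at the source level for the loop above. It is not what GCC
or LLVM actually emit. The names foo_scalar and foo_emulated_gather and the
vector width VF = 4 are assumptions made for this example. Each index is
extracted on its own, the indexed loads stay scalar, and the loaded values
are packed into a lane array so that the multiply and the accumulation can
run lane-wise. The per-lane partial sums reassociate the reduction, which is
only legal under fast-math style flags such as -Ofast.

```cpp
#include <array>
#include <cassert>

typedef double Real;

struct Element {
    Real val;
    int idx;
};

struct Vector {
    int dimen;
    Real* val;
};

// Reference: the scalar loop from the report (n must be non-negative).
Real foo_scalar(const Element* e, int n, const Vector& w) {
    assert(n >= 0);
    Real x = 0;
    while (n--) {
        x += e->val * w.val[e->idx];
        e++;
    }
    return x;
}

// Sketch of an emulated gather with an assumed vector width of 4.
Real foo_emulated_gather(const Element* e, int n, const Vector& w) {
    assert(n >= 0);
    constexpr int VF = 4;            // assumed: 4 doubles per 256-bit vector
    std::array<Real, VF> acc{};      // per-lane partial sums, zero-initialized
    int i = 0;
    for (; i + VF <= n; i += VF) {
        std::array<int, VF> idx;
        std::array<Real, VF> ev;
        std::array<Real, VF> gathered;
        // Strided loads of the interleaved {val, idx} pairs.
        for (int l = 0; l < VF; ++l) {
            idx[l] = e[i + l].idx;
            ev[l] = e[i + l].val;
        }
        // Emulated gather: one scalar load per lane, packed into a vector.
        for (int l = 0; l < VF; ++l)
            gathered[l] = w.val[idx[l]];
        // Lane-wise multiply and accumulate (the part that vectorizes).
        for (int l = 0; l < VF; ++l)
            acc[l] += ev[l] * gathered[l];
    }
    // Horizontal reduction of the partial sums.
    Real x = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    // Scalar epilogue for the remaining n % VF elements.
    for (; i < n; ++i)
        x += e[i].val * w.val[e[i].idx];
    return x;
}
```

Because the summation order differs, the two functions may disagree in the
last bits for general inputs; they agree exactly when every product and
partial sum is exactly representable.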
