[llvm-bugs] [Bug 44781] New: Unnecessary use of SIMD

Tue Feb 4 12:51:11 PST 2020

https://bugs.llvm.org/show_bug.cgi?id=44781

            Bug ID: 44781
           Summary: Unnecessary use of SIMD
           Product: clang
           Version: unspecified
          Hardware: PC
                OS: All
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: -New Bugs
          Assignee: unassignedclangbugs at nondot.org
          Reporter: kobalicek.petr at gmail.com
                CC: htmldeveloper at gmail.com, llvm-bugs at lists.llvm.org,
                    neeilans at live.com, richard-llvm at metafoo.co.uk

Clang sometimes tries so hard to use SIMD that it generates worse code than
it would without it. A very simple example:

struct Box {
    int x0, y0, x1, y1;
};

bool check(const Box& box) noexcept {
  return ((box.x0 | box.y0 | box.x1 | box.y1) & 0xFFu) == 0;
}

Clang 8+ compiles this code to:

check(Box const&):
        vmovdqu xmm0, xmmword ptr [rdi]
        vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
        vpor    xmm0, xmm0, xmm1
        vpshufd xmm1, xmm0, 229         # xmm1 = xmm0[1,1,2,3]
        vpor    xmm0, xmm0, xmm1
        vpextrb eax, xmm0, 0
        test    al, al
        sete    al
        ret

Whereas GCC (and Clang 7 and earlier) is fine with plain scalar code:

check(Box const&):
        mov     eax, DWORD PTR [rdi]
        or      eax, DWORD PTR [rdi+4]
        or      eax, DWORD PTR [rdi+8]
        or      eax, DWORD PTR [rdi+12]
        test    al, al
        sete    al
        ret

The SIMD code Clang produces is bigger than the scalar version (nine
instructions versus seven), and I think it would also be slower, as it
contains two shuffles and an extract.
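
For anyone who wants to reproduce this, here is a minimal self-contained
sketch of the test case together with a straightforward reference form that
can be used to confirm both code paths compute the same result. The names
check_reference and forms_agree are hypothetical helpers and not part of the
original report. The compiler options are also an assumption, since the
report does not state them; the VEX-encoded instructions in the Clang output
suggest something like -O2 -mavx.

```cpp
#include <cassert>

struct Box {
    int x0, y0, x1, y1;
};

// The function from the report: true when the low byte of every
// coordinate is zero.
bool check(const Box& box) noexcept {
  return ((box.x0 | box.y0 | box.x1 | box.y1) & 0xFFu) == 0;
}

// Hypothetical reference form (not from the report): tests the low byte
// of each field separately, so it is easy to verify by inspection.
bool check_reference(const Box& box) noexcept {
  return (box.x0 & 0xFF) == 0 && (box.y0 & 0xFF) == 0 &&
         (box.x1 & 0xFF) == 0 && (box.y1 & 0xFF) == 0;
}

// Hypothetical helper: returns true when both forms agree on a few
// sample boxes, including negative values and a set low bit in each field.
bool forms_agree() noexcept {
  const Box samples[] = {
    {0, 0, 0, 0},
    {0x100, 0x200, 0x300, 0x400},
    {-256, 0, 0, 0},
    {1, 0, 0, 0},
    {0, 1, 0, 0},
    {0, 0, 1, 0},
    {0, 0, 0, 1},
    {-1, -1, -1, -1},
  };
  for (const Box& b : samples) {
    if (check(b) != check_reference(b)) return false;
  }
  return true;
}
```

Compiling only check() with optimizations enabled and inspecting the
generated assembly should show whether a given compiler version picks the
vector or the scalar sequence.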
