[llvm-bugs] [Bug 42550] New: Byte+shift loads are not autovectorized to movdqu on SSE4.1
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Jul 8 18:41:07 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=42550
Bug ID: 42550
Summary: Byte+shift loads are not autovectorized to movdqu on
SSE4.1
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: husseydevin at gmail.com
CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
llvm-dev at redking.me.uk, spatel+llvm at rotateright.com
Take the following code:
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef BYTE_SHIFT
static uint32_t read32(uint8_t const *data, size_t offset)
{
    return (uint32_t) data[offset + 0]
         | ((uint32_t) data[offset + 1] << 8)
         | ((uint32_t) data[offset + 2] << 16)
         | ((uint32_t) data[offset + 3] << 24);
}
#else
static uint32_t read32(uint8_t const *data, size_t offset)
{
    uint32_t ret;
    memcpy(&ret, data + offset, sizeof(ret));
    return ret;
}
#endif
Both are valid ways to perform an unaligned 32-bit load, and when SSE is disabled they compile to the same code.
However, when SSE4.1 is enabled and loops using these loads are autovectorized, the memcpy version is lowered to a single movdqu, while the byte+shift version is expanded literally into a sequence of pinsrb, pslld, and pmovzxbd instructions.
Demo: https://godbolt.org/z/jCAm2o
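For reference, a reduction loop along these lines reproduces the difference (this is only a sketch of the kind of code behind the Godbolt link; the exact test there may differ). It reuses the read32 helper defined above and is meant to be compiled with -O3 -msse4.1:

/* Sums 32-bit words loaded via read32. When autovectorized, the memcpy
   variant of read32 turns into movdqu loads, while the byte+shift variant
   is rebuilt element by element with pinsrb/pslld/pmovzxbd. */
static uint32_t sum32(uint8_t const *data, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i + 4 <= len; i += 4)
        acc += read32(data, i);
    return acc;
}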
These types of loads should be converted to movdqu as well.