[llvm-bugs] [Bug 42550] New: Byte+shift loads are not autovectorized to movdqu on SSE4.1
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Jul 8 18:41:07 PDT 2019
https://bugs.llvm.org/show_bug.cgi?id=42550
Bug ID: 42550
Summary: Byte+shift loads are not autovectorized to movdqu on
SSE4.1
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: husseydevin at gmail.com
CC: craig.topper at gmail.com, llvm-bugs at lists.llvm.org,
llvm-dev at redking.me.uk, spatel+llvm at rotateright.com
Take the following code:
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#ifdef BYTE_SHIFT
static uint32_t read32(uint8_t const *data, size_t offset)
{
    return (uint32_t) data[offset + 0]
         | ((uint32_t) data[offset + 1] << 8)
         | ((uint32_t) data[offset + 2] << 16)
         | ((uint32_t) data[offset + 3] << 24);
}
#else
static uint32_t read32(uint8_t const *data, size_t offset)
{
    uint32_t ret;
    memcpy(&ret, data + offset, sizeof(ret));
    return ret;
}
#endif
Both are valid ways to perform an unaligned 32-bit load, and when SSE is disabled they compile to the same code.
However, when SSE4.1 is enabled and loops using these loads are autovectorized, the memcpy version is lowered to a single movdqu, while the byte+shift version is expanded literally into a sequence of pinsrb, pslld, and pmovzxbd instructions.
Demo: https://godbolt.org/z/jCAm2o
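For reference, a reduction loop along these lines reproduces the difference (this is only a sketch of the kind of code behind the Godbolt link; the exact test there may differ). It reuses the read32 helper defined above and is meant to be compiled with -O3 -msse4.1:

/* Sums 32-bit words loaded via read32. When autovectorized, the memcpy
   variant of read32 turns into movdqu loads, while the byte+shift variant
   is rebuilt element by element with pinsrb/pslld/pmovzxbd. */
static uint32_t sum32(uint8_t const *data, size_t len)
{
    uint32_t acc = 0;
    for (size_t i = 0; i + 4 <= len; i += 4)
        acc += read32(data, i);
    return acc;
}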
These types of loads should be converted to movdqu as well.