[llvm-bugs] [Bug 35047] New: load merging for (data[0]<<0) | (data[1]<<8) | ... endian agnostic load goes berserk with AVX2 variable-shift
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Oct 23 17:10:56 PDT 2017
https://bugs.llvm.org/show_bug.cgi?id=35047
Bug ID: 35047
Summary: load merging for (data[0]<<0) | (data[1]<<8) | ... endian agnostic load goes berserk with AVX2 variable-shift
Product: new-bugs
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Keywords: performance
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: peter at cordes.ca
CC: llvm-bugs at lists.llvm.org
unsigned load_le32(unsigned char *data) {
    unsigned le32 = (data[0]<<0) | (data[1]<<8) | (data[2]<<16) | (data[3]<<24);
    return le32;
}
// https://godbolt.org/g/X8i1pr
clang 6.0.0 (trunk 316311) -O3 -march=haswell -mno-avx
movl (%rdi), %eax
retq
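For reference, a minimal sketch of why a single movl is the expected result here: on a little-endian host, the shift/OR expression and a plain 4-byte load produce the same value. The names load_le32_shifts, load_native32 and host_is_little_endian are hypothetical and not part of the report. The uint32_t casts are an addition that keeps data[3]<<24 from shifting into the sign bit of int.

```c
#include <stdint.h>
#include <string.h>

/* Shift/OR form from the report, with casts so that no shift
 * overflows a signed int. The result does not depend on host byte order. */
static uint32_t load_le32_shifts(const unsigned char *data) {
    return ((uint32_t)data[0] << 0)  | ((uint32_t)data[1] << 8) |
           ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
}

/* Hypothetical helper: a plain 4-byte load in host byte order.
 * This models what the single movl does; it equals the function
 * above only on a little-endian host. */
static uint32_t load_native32(const unsigned char *data) {
    uint32_t v;
    memcpy(&v, data, sizeof v);
    return v;
}

/* Hypothetical helper: returns 1 if the lowest-addressed byte of an
 * integer holds its least significant bits. */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

On x86-64, host_is_little_endian() returns 1, so both loaders agree for any input, which is the equivalence the compiler relies on when it merges the four byte loads.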
-O3 -march=haswell (with AVX2)
.LCPI0_0:
        .quad   16                      # 0x10
        .quad   24                      # 0x18
load_le32:                              # @load_le32
        movzbl  (%rdi), %eax
        movzbl  1(%rdi), %ecx
        shll    $8, %ecx
        vpmovzxbq 2(%rdi), %xmm0        # xmm0 = mem[0],zero,zero,zero,zero,zero,zero,zero,mem[1],zero,zero,zero,zero,zero,zero,zero
        orl     %eax, %ecx
        vpsllvq .LCPI0_0(%rip), %xmm0, %xmm0
        vmovd   %xmm0, %edx
        vpextrd $2, %xmm0, %eax
        orl     %edx, %eax
        orl     %ecx, %eax
        retq
So if vpsllvq is available, clang uses it and doesn't notice that it could have
coalesced the four byte loads into a single load. -fno-vectorize doesn't block
this. (And even if the shift counts didn't line up this way, so that merging
wasn't possible, this is quite poor vectorization. VPMOVZXBD would have worked
for all four bytes, then do 4 shifts, and then a horizontal reduction with OR,
using the same pattern as a horizontal sum, e.g. vpunpckhqdq / vpor / vmovq /
rorx $32, %rax, %rdx / or %edx, %eax)
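The lane-wise strategy described above can be sketched as a scalar model in C. This is an illustration only: it does not use intrinsics, the function name load_le32_lanes is hypothetical, and the mapping of each step to an instruction in the comments is an assumption based on the sequence named in the text.

```c
#include <stdint.h>

/* Scalar model of: zero-extend 4 bytes to 4 dword lanes, shift each
 * lane by its own count, then OR-reduce the lanes horizontally. */
static uint32_t load_le32_lanes(const unsigned char *data) {
    static const unsigned shift[4] = {0, 8, 16, 24};
    uint32_t lane[4];
    int i;

    /* Zero-extend each byte into its own 32-bit lane (the VPMOVZXBD step). */
    for (i = 0; i < 4; i++)
        lane[i] = data[i];

    /* Per-lane shifts (the "4 shifts" step). */
    for (i = 0; i < 4; i++)
        lane[i] <<= shift[i];

    /* Horizontal OR, same shape as a horizontal sum:
     * fold the high half onto the low half (vpunpckhqdq / vpor) ... */
    lane[0] |= lane[2];
    lane[1] |= lane[3];

    /* ... then fold the remaining pair (vmovq / rorx $32 / or). */
    return lane[0] | lane[1];
}
```

Because each shifted lane occupies a disjoint byte of the result, the OR reduction gives the same value as the original shift/OR expression.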
(And BTW, for Haswell and later, movb 1(%rdi), %al merges into RAX without
stalling at all. It's a single micro-fused load+merge uop, so it's better than
a separate movzx load + OR instruction. See
https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to)
clang 4.0.1 doesn't merge the loads.