[llvm-bugs] [Bug 32085] New: Extra broadcasts in doubly-unrolled avx2 memcpy loop
via llvm-bugs
llvm-bugs at lists.llvm.org
Mon Feb 27 22:18:30 PST 2017
http://bugs.llvm.org/show_bug.cgi?id=32085
Bug ID: 32085
Summary: Extra broadcasts in doubly-unrolled avx2 memcpy loop
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: enhancement
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: justin.lebar at gmail.com
CC: llvm-bugs at lists.llvm.org
At clang head
$ echo '#include <cstring>
void* go5(int val) {
  int* arr = new int[8 * 128];
  for (int i = 0; i < 8; i++) {
    for (int j = 0; j < 128; j++) {
      memcpy(&arr[i * 128 + j], &val, sizeof(int));
    }
  }
  return arr;
}' | clang++ -O2 -x c++ -g0 --std=c++11 -mavx2 - -o - -S -mllvm --x86-asm-syntax=intel
Output: https://gist.github.com/da5e8e50ba43cf1600ac652b35fd6746
LLVM unrolls both loops, but at the beginning of each iteration of the outer
loop, we re-broadcast val into the same ymm register:
  vmovd xmm0, ebx
  vbroadcastss ymm0, xmm0
  vmovups ymmword ptr [rax], ymm0
  vmovups ymmword ptr [rax + 32], ymm0
  [...]
  vmovd xmm0, ebx
  vbroadcastss ymm0, xmm0
  vmovups ymmword ptr [rax + 512], ymm0
  vmovups ymmword ptr [rax + 544], ymm0
  [...]
We shouldn't need to do this: val is loop-invariant, so a single broadcast
before the first store should be enough.
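For reference, the nested loop writes each of the 8 * 128 ints exactly once
with the same value, so it should be equivalent to a single flat fill. A
minimal sketch of that equivalence follows; the names fill_nested, fill_flat,
and same_result are hypothetical and not part of the original reproducer, and
this only checks the result, not the code clang generates for either form.

```cpp
#include <algorithm>
#include <cstring>
#include <memory>

constexpr int kRows = 8;    // outer trip count from the reproducer
constexpr int kCols = 128;  // inner trip count from the reproducer

// Same shape as the loop in the report: one 4-byte memcpy per element.
void fill_nested(int* arr, int val) {
  for (int i = 0; i < kRows; i++) {
    for (int j = 0; j < kCols; j++) {
      std::memcpy(&arr[i * kCols + j], &val, sizeof(int));
    }
  }
}

// Hypothetical flattened form: a single fill over the whole buffer.
void fill_flat(int* arr, int val) {
  std::fill_n(arr, kRows * kCols, val);
}

// Returns true when both forms leave identical contents in their buffers.
bool same_result(int val) {
  std::unique_ptr<int[]> a(new int[kRows * kCols]);
  std::unique_ptr<int[]> b(new int[kRows * kCols]);
  fill_nested(a.get(), val);
  fill_flat(b.get(), val);
  return std::equal(a.get(), a.get() + kRows * kCols, b.get());
}
```

Since val never changes between stores, nothing in the source requires the
value to be re-materialized per outer iteration.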