[llvm-bugs] [Bug 32085] New: Extra broadcasts in doubly-unrolled avx2 memcpy loop

via llvm-bugs llvm-bugs at lists.llvm.org
Mon Feb 27 22:18:30 PST 2017


http://bugs.llvm.org/show_bug.cgi?id=32085

            Bug ID: 32085
           Summary: Extra broadcasts in doubly-unrolled avx2 memcpy loop
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: justin.lebar at gmail.com
                CC: llvm-bugs at lists.llvm.org

At clang head:

$ echo '#include <cstring>
void* go5(int val) {
  int* arr = new int[8 * 128];
  for (int i = 0; i < 8; i++) {
    for (int j = 0; j < 128; j++) {
      memcpy(&arr[i * 128 + j], &val, sizeof(int));
    }
  }
  return arr;
}' | clang++ -O2 -x c++ -g0 --std=c++11 -mavx2 - -o - -S -mllvm --x86-asm-syntax=intel

Output: https://gist.github.com/da5e8e50ba43cf1600ac652b35fd6746

LLVM unrolls both loops, but at the beginning of each unrolled iteration of the
outer loop, we re-broadcast val into our ymm register, even though val never
changes:

        vmovd   xmm0, ebx
        vbroadcastss    ymm0, xmm0
        vmovups ymmword ptr [rax], ymm0
        vmovups ymmword ptr [rax + 32], ymm0
        [...]
        vmovd   xmm0, ebx
        vbroadcastss    ymm0, xmm0
        vmovups ymmword ptr [rax + 512], ymm0
        vmovups ymmword ptr [rax + 544], ymm0
        [...]

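We shouldn't need to do this. val is loop-invariant, so a single
vmovd/vbroadcastss pair before the first store should be enough.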
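
As a rough source-level illustration of the hoisting being asked for, the sketch
below builds the 8-lane splat once and reuses it for every 32-byte store. The
names Vec8, splat, and go5_hoisted are hypothetical and not part of the original
report; the std::array only stands in for a ymm register, and the function
returns int* rather than void* purely to make it easy to check. It says nothing
about how the backend should implement the fix.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a 256-bit ymm register holding 8 ints.
using Vec8 = std::array<int, 8>;

// Models vmovd + vbroadcastss: copy val into every lane.
Vec8 splat(int val) {
  Vec8 v;
  v.fill(val);
  return v;
}

// Same result as go5 from the report, but the broadcast happens once,
// before the first store, instead of once per outer-loop iteration.
int* go5_hoisted(int val) {
  const std::size_t total = 8 * 128;
  int* arr = new int[total];
  const Vec8 v = splat(val);  // single broadcast, loop-invariant
  for (std::size_t i = 0; i < total; i += v.size()) {
    // Models one vmovups ymmword store of 32 bytes.
    std::memcpy(&arr[i], v.data(), sizeof(int) * v.size());
  }
  return arr;
}
```

The caller owns the returned buffer and should release it with delete[].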
