[LLVMbugs] [Bug 22563] New: Incorrect code generation with arrays of __m256 variables
bugzilla-daemon at llvm.org
Thu Feb 12 04:32:07 PST 2015
http://llvm.org/bugs/show_bug.cgi?id=22563
Bug ID: 22563
Summary: Incorrect code generation with arrays of __m256 variables
Product: new-bugs
Version: 3.6
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: jasonr at 3db-labs.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Created attachment 13854
--> http://llvm.org/bugs/attachment.cgi?id=13854&action=edit
C++ source file that demonstrates the issue
NOTE: This was originally posted on Stack Overflow[1]. After getting some
concurrence that this is likely a clang/LLVM bug, I posted it here.
I'm encountering what appears to be a bug causing incorrect code generation
with clang 3.4, 3.5, and 3.6. The source that actually triggered the problem is
quite complicated, but I've been able to reduce it to a self-contained example
that is attached to this report.
A summary of the code: I have a simple type called `simd_pack` that contains
one member, an array of one `__m256i` value. In my application, there are
operators and functions that take these types, but the problem can be
illustrated by the attached example. Specifically, `test_broken()` should read
from the `in1` array and then just copy its value over to the `out` array.
Therefore, the call to `memcmp()` in `main()` should return zero. I compile the
attached file using the following:
clang++-3.6 bug_test.cc -o bug_test -mavx -O3
I find that on optimization levels `-O0` and `-O1`, the test passes, while on
levels `-O2` and `-O3`, the test fails. I've tried compiling the same file with
gcc 4.4, 4.6, 4.7, and 4.8, as well as Intel C++ 13.0, and the test passes on
all optimization levels.
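The attachment itself is not reproduced in this message. As a rough, portable
sketch of the shape of the test described above (not the attached file), the
copy and the final `memcmp()` check might look like the following. The names
`block256`, `load_unaligned`, `store_unaligned`, `test_copy`, and
`copy_roundtrips` are hypothetical, and `block256` is a plain 32-byte stand-in
for `__m256i`, so this version does not use AVX and will not reproduce the bug.

```cpp
#include <cstddef>
#include <cstring>

// Stand-in for __m256i: 32 bytes, 32-byte aligned (an assumption, not the real type).
struct alignas(32) block256 {
    unsigned char bytes[32];
};

// Mirrors the described simd_pack: one member, an array of one 256-bit value.
struct simd_pack {
    block256 data[1];
};

// Unaligned load, modelled with memcpy (hypothetical helper).
inline simd_pack load_unaligned(const signed char* p) {
    simd_pack v;
    std::memcpy(v.data[0].bytes, p, sizeof(block256));
    return v;
}

// Unaligned store, modelled with memcpy (hypothetical helper).
inline void store_unaligned(signed char* p, const simd_pack& v) {
    std::memcpy(p, v.data[0].bytes, sizeof(block256));
}

// Copies in1 to out in 32-byte blocks; a tail shorter than 32 bytes is left untouched.
void test_copy(signed char* out, const signed char* in1, std::size_t n) {
    for (std::size_t i = 0; i + sizeof(block256) <= n; i += sizeof(block256)) {
        store_unaligned(out + i, load_unaligned(in1 + i));
    }
}

// Returns true when out matches in1 after the copy, i.e. memcmp() returns zero.
bool copy_roundtrips() {
    signed char in1[64];
    signed char out[64];
    for (int i = 0; i < 64; ++i) {
        in1[i] = static_cast<signed char>(i - 32);
        out[i] = 0;
    }
    test_copy(out, in1, sizeof in1);
    return std::memcmp(in1, out, sizeof in1) == 0;
}
```

A correct compiler must make the equivalent check pass at every optimization
level, which is what gcc and Intel C++ do with the real test.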
Taking a closer look at the generated code, here's the assembly generated on
optimization level `-O3`:
0000000000400a40 <test_broken(signed char*, signed char*, unsigned long)>:
  400a40: 55                    push    %rbp
  400a41: 48 89 e5              mov     %rsp,%rbp
  400a44: 48 81 e4 e0 ff ff ff  and     $0xffffffffffffffe0,%rsp
  400a4b: 48 83 ec 40           sub     $0x40,%rsp
  400a4f: 48 83 fa 20           cmp     $0x20,%rdx
  400a53: 72 2f                 jb      400a84 <test_broken(signed char*, signed char*, unsigned long)+0x44>
  400a55: 31 c0                 xor     %eax,%eax
  400a57: 66 0f 1f 84 00 00 00  nopw    0x0(%rax,%rax,1)
  400a5e: 00 00
  400a60: c5 fc 10 04 06        vmovups (%rsi,%rax,1),%ymm0
  400a65: c5 f8 29 04 24        vmovaps %xmm0,(%rsp)
  400a6a: c5 fc 28 04 24        vmovaps (%rsp),%ymm0
  400a6f: c5 fc 11 04 07        vmovups %ymm0,(%rdi,%rax,1)
  400a74: 48 8d 48 20           lea     0x20(%rax),%rcx
  400a78: 48 83 c0 3f           add     $0x3f,%rax
  400a7c: 48 39 d0              cmp     %rdx,%rax
  400a7f: 48 89 c8              mov     %rcx,%rax
  400a82: 72 dc                 jb      400a60 <test_broken(signed char*, signed char*, unsigned long)+0x20>
  400a84: 48 89 ec              mov     %rbp,%rsp
  400a87: 5d                    pop     %rbp
  400a88: c5 f8 77              vzeroupper
  400a8b: c3                    retq
  400a8c: 0f 1f 40 00           nopl    0x0(%rax)
I'll reproduce the key part for emphasis:
  400a60: c5 fc 10 04 06        vmovups (%rsi,%rax,1),%ymm0
  400a65: c5 f8 29 04 24        vmovaps %xmm0,(%rsp)
  400a6a: c5 fc 28 04 24        vmovaps (%rsp),%ymm0
  400a6f: c5 fc 11 04 07        vmovups %ymm0,(%rdi,%rax,1)
The generated code is strange. It first loads 256 bits into `ymm0` using the
unaligned move that I asked for, then it stores `xmm0` (which only contains the
lower 128 bits of the data that was read) to the stack, then immediately reloads
all 256 bits into `ymm0` from that stack slot, of which only the lower 128 bits
were just written. The effect is that `ymm0`'s upper 128 bits (which get written
to the output buffer) are garbage, causing the test to fail.
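To make the effect concrete, here is a small portable model of those four
instructions. It is only a sketch: the register and the stack slot are modelled
as byte arrays, the function name `mismatched_bytes_after_spill` is
hypothetical, and the 0xAA fill is an arbitrary stand-in for whatever stale
data happens to be on the stack.

```cpp
#include <cstddef>
#include <cstring>

// Models the load / 128-bit spill / 256-bit reload / store sequence and
// returns how many of the 32 output bytes differ from the input.
std::size_t mismatched_bytes_after_spill() {
    unsigned char in[32];
    unsigned char out[32];
    unsigned char stack_slot[32];
    unsigned char ymm0[32];

    for (std::size_t i = 0; i < 32; ++i) {
        in[i] = static_cast<unsigned char>(i + 1);  // values 1..32, never 0xAA
    }
    // Stale stack contents (arbitrary stand-in value).
    std::memset(stack_slot, 0xAA, sizeof stack_slot);

    std::memcpy(ymm0, in, 32);          // vmovups (%rsi,%rax,1),%ymm0
    std::memcpy(stack_slot, ymm0, 16);  // vmovaps %xmm0,(%rsp): only 16 bytes stored
    std::memcpy(ymm0, stack_slot, 32);  // vmovaps (%rsp),%ymm0: all 32 bytes reloaded
    std::memcpy(out, ymm0, 32);         // vmovups %ymm0,(%rdi,%rax,1)

    std::size_t mismatches = 0;
    for (std::size_t i = 0; i < 32; ++i) {
        if (in[i] != out[i]) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

In this model the lower 16 bytes of each block survive and the upper 16 bytes
are replaced by the stale stack contents, so `memcmp()` on the real buffers
would report a difference.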
Are there any particular optimization steps that could be disabled to work
around this issue, or a different way to express my intent in code that might
not trigger it? I apologize for the lack of a reduced bitcode test case as
explained here[2], but I'm not familiar enough with the toolchain to drive the
tools properly.
[1]: http://stackoverflow.com/questions/28462707/is-this-incorrect-code-generation-with-arrays-of-m256-values-a-clang-bug
[2]: http://llvm.org/docs/HowToSubmitABug.html