[LLVMbugs] [Bug 19370] New: [X86] Non-temporal store from _mm_stream_ps is not mapped to movntps in some cases
bugzilla-daemon at llvm.org
Tue Apr 8 10:13:53 PDT 2014
http://llvm.org/bugs/show_bug.cgi?id=19370
Bug ID: 19370
Summary: [X86] Non-temporal store from _mm_stream_ps is not mapped to movntps in some cases
Product: new-bugs
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: dario.domizioli at gmail.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Created attachment 12354
--> http://llvm.org/bugs/attachment.cgi?id=12354&action=edit
Test case and assembly output
We have encountered a situation where non-temporal stores (coming from the
_mm_stream_ps intrinsic) are not compiled to movntps; a standard movaps is
emitted instead. In this example, this happens when the value being stored is
a constant.
I am attaching a tar.gz file containing:
- simplified.cpp
- simplified.ll
- simplified.s
which are, respectively:
- the source code
- the output of "clang -O2 -S -emit-llvm simplified.cpp -o simplified.ll"
- the output of "llc simplified.ll -o simplified.s"
The test case is reduced from a larger file, and I have removed things like
memory fences after the operations (they don't seem to affect the situation).
This is the smallest source I could reduce it to. I am using Ubuntu Linux; the
triple is x86_64-unknown-linux-gnu.
There are two functions in the code. The first one uses a non-temporal store to
copy a value from one global to another. The second one uses a non-temporal
store to write a splat of zeroes (created with _mm_set1_ps) into the destination.
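For readers without the attachment, the two functions probably look roughly like the sketch below. This is a reconstruction from the IR, not the attached simplified.cpp: the global names src and dest come from the IR, but the function names stream_copy and stream_zero are hypothetical, and we have not checked that this exact source yields byte-identical IR.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_load_ps, _mm_set1_ps, _mm_stream_ps, _mm_sfence

// Globals matching @src and @dest in the IR: [4 x float], 32-byte aligned.
alignas(32) float src[4];
alignas(32) float dest[4];

// Case 1 (hypothetical name): copy src into dest with a non-temporal store.
// Reported codegen: vmovaps src(%rip), %xmm0 ; vmovntps %xmm0, dest(%rip)
void stream_copy() {
    __m128 v = _mm_load_ps(src);  // aligned 16-byte load
    _mm_stream_ps(dest, v);       // non-temporal store hint
}

// Case 2 (hypothetical name): store a splat of zeroes with a non-temporal store.
// Reported codegen: vxorps %xmm0, %xmm0, %xmm0 ; vmovaps %xmm0, dest(%rip)
// i.e. the non-temporal hint is lost for the constant value.
void stream_zero() {
    _mm_stream_ps(dest, _mm_set1_ps(0.0f));
}
```

Both functions are functionally correct either way; the bug only affects which store instruction is chosen, so it shows up in the assembly rather than in the results.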
The first is compiled to this IR:
%0 = load <4 x float>* bitcast ([4 x float]* @src to <4 x float>*), align 32, !tbaa !1
store <4 x float> %0, <4 x float>* bitcast ([4 x float]* @dest to <4 x float>*), align 32, !nontemporal !4
ret void
And it generates the following asm (stripped):
vmovaps src(%rip), %xmm0
vmovntps %xmm0, dest(%rip)
retq
The second one is compiled to this IR:
store <4 x float> zeroinitializer, <4 x float>* bitcast ([4 x float]* @dest to <4 x float>*), align 32, !nontemporal !4
ret void
And it generates the following asm (stripped):
vxorps %xmm0, %xmm0, %xmm0
vmovaps %xmm0, dest(%rip)
retq
Both stores in the IR carry the "!nontemporal" metadata, but instruction
selection (ISel) emits vmovntps for the first and a plain vmovaps for the second.
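If the constant value is what triggers the difference, a variant that hides the zero from the optimizer should follow the same path as the first function. The sketch below is an untested probe of that hypothesis, not a fix: the function stream_zero_opaque and the variable zero_source are hypothetical, and we have not checked which instruction llc selects for it.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: _mm_set1_ps, _mm_stream_ps, _mm_sfence

alignas(32) float dest[4];  // same shape as @dest in the IR

// Hypothetical: reading the zero through a volatile forces a real load,
// so the stored vector is no longer a compile-time constant in the IR.
volatile float zero_source = 0.0f;

void stream_zero_opaque() {
    __m128 v = _mm_set1_ps(zero_source);  // splat of a runtime value
    _mm_stream_ps(dest, v);               // non-temporal store hint
}
```

Comparing the llc output for this function with the output for the constant-splat case would show whether a constant source operand alone is enough to lose the non-temporal store.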