<html>
<head>
<base href="http://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - [X86] Non-temporal store from _mm_stream_ps is not mapped to movntps in some cases"
href="http://llvm.org/bugs/show_bug.cgi?id=19370">19370</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[X86] Non-temporal store from _mm_stream_ps is not mapped to movntps in some cases
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>dario.domizioli@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=12354" name="attach_12354" title="Test case and assembly output">attachment 12354</a> <a href="attachment.cgi?id=12354&action=edit" title="Test case and assembly output">[details]</a></span>
Test case and assembly output
We have encountered a case where a non-temporal store (coming from the
_mm_stream_ps intrinsic) is not compiled to movntps and instead produces a
regular movaps. In this example it happens when the value being stored is a
constant.
I am attaching a tar.gz file containing:
- simplified.cpp: the source code
- simplified.ll: the output of "clang -O2 -S -emit-llvm simplified.cpp -o simplified.ll"
- simplified.s: the output of "llc simplified.ll -o simplified.s"
The test case is reduced from a larger file, and I have removed things like
memory fences after the operations (they don't seem to affect the result).
This is the smallest source I could reduce it to. I am on Ubuntu Linux; the
target triple is x86_64-unknown-linux-gnu.
There are two functions in the code. The first uses a non-temporal store to
copy a value from one global to another. The second uses a non-temporal store
to write a splat of zeroes (created with _mm_set1_ps) into the destination.
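
For readers without the attachment, the two functions might look roughly like
the sketch below. This is a reconstruction inferred from the IR, not the
attached simplified.cpp: the function names are hypothetical, and the element
type and 32-byte alignment of src and dest are assumptions based on the
"[4 x float]" and "align 32" that appear in the IR.

```cpp
#include "xmmintrin.h"  // SSE intrinsics: _mm_load_ps, _mm_set1_ps, _mm_stream_ps

// Assumed globals: the names come from the IR; type and alignment are inferred.
alignas(32) float src[4] = {1.0f, 2.0f, 3.0f, 4.0f};
alignas(32) float dest[4];

// Hypothetical name. Non-temporal store of a value loaded from another global.
void copy_nontemporal() {
  _mm_stream_ps(dest, _mm_load_ps(src));
}

// Hypothetical name. Non-temporal store of a constant splat of zeroes.
void zero_nontemporal() {
  _mm_stream_ps(dest, _mm_set1_ps(0.0f));
}
```

Both functions go through the same _mm_stream_ps intrinsic; the only
difference is whether the stored value is loaded from memory or is a constant.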
The first is compiled to this IR:
%0 = load <4 x float>* bitcast ([4 x float]* @src to <4 x float>*), align 32, !tbaa !1
store <4 x float> %0, <4 x float>* bitcast ([4 x float]* @dest to <4 x float>*), align 32, !nontemporal !4
ret void
And it generates the following asm (stripped):
vmovaps src(%rip), %xmm0
vmovntps %xmm0, dest(%rip)
retq
The second one is compiled to this IR:
store <4 x float> zeroinitializer, <4 x float>* bitcast ([4 x float]* @dest to <4 x float>*), align 32, !nontemporal !4
ret void
And it generates the following asm (stripped):
vxorps %xmm0, %xmm0, %xmm0
vmovaps %xmm0, dest(%rip)
retq
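Both stores in the IR carry the "!nontemporal" metadata, but instruction
selection emits a non-temporal store (vmovntps) only in the first case. In the
second case, where the stored value is a constant zero vector, it emits a
regular vmovaps.</pre>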
</div>
</p>
</body>
</html>