<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [SelectionDAG] MergeConsecutiveStores loses non-temporal flag"
href="https://bugs.llvm.org/show_bug.cgi?id=42123">42123</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[SelectionDAG] MergeConsecutiveStores loses non-temporal flag
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Common Code Generator Code
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>llvm-dev@redking.me.uk
</td>
</tr>
<tr>
<th>CC</th>
<td>andrea.dibiagio@gmail.com, craig.topper@gmail.com, hfinkel@anl.gov, llvm-bugs@lists.llvm.org, spatel+llvm@rotateright.com
</td>
</tr></table>
<p>
<div>
<pre><a href="https://godbolt.org/z/zWf3xk">https://godbolt.org/z/zWf3xk</a>
Derived from the C++ source below (the IR is not a direct copy of what it
compiles to - the alignment gets messed up):
#include <x86intrin.h>
void memcpy256_2_128_aligned(__m256 *src, __m256 *dst) {
  auto x = _mm_load_ps((float*)src + 0);
  auto y = _mm_load_ps((float*)src + 4);
  _mm_stream_ps((float*)dst + 0, x);
  _mm_stream_ps((float*)dst + 4, y);
}
define void @memcpy256_2_128_aligned(<8 x float>* noalias nocapture readonly,
                                     <8 x float>* noalias nocapture) {
  %3 = bitcast <8 x float>* %0 to <4 x float>*
  %4 = load <4 x float>, <4 x float>* %3, align 32
  %5 = getelementptr inbounds <8 x float>, <8 x float>* %0, i64 0, i64 4
  %6 = bitcast float* %5 to <4 x float>*
  %7 = load <4 x float>, <4 x float>* %6, align 16
  %8 = bitcast <8 x float>* %1 to <4 x float>*
  store <4 x float> %4, <4 x float>* %8, align 32, !nontemporal !0
  %9 = getelementptr inbounds <8 x float>, <8 x float>* %1, i64 0, i64 4
  %10 = bitcast float* %9 to <4 x float>*
  store <4 x float> %7, <4 x float>* %10, align 16, !nontemporal !0
  ret void
}
!0 = !{i32 1}
llc -mcpu=btver2
memcpy256_2_128_aligned:                # @memcpy256_2_128_aligned
  vmovaps (%rdi), %ymm0
  vmovaps %ymm0, (%rsi)                 <-- SHOULD BE VMOVNTPS
  retq
Several things need to be addressed:
1 - retain the nontemporal flag for merged stores
2 - don't merge stores if only some have a nontemporal flag
3 - only merge nontemporal stores if they are naturally aligned - unaligned nt-stores
are problematic (see [<a class="bz_bug_link
bz_status_NEW "
title="NEW - [X86] Scalarize non-temporal vector stores with poor alignment"
href="show_bug.cgi?id=42026">Bug #42026</a>])</pre>
</div>
</p>
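<div>
<pre>
The three points above can be read as one merge-legality check. The following
is only a rough sketch of that policy, not SelectionDAG code: StoreInfo,
MergeDecision and decideMerge are hypothetical names invented for illustration,
and "naturally aligned" is assumed here to mean that the known alignment of the
first (lowest-address) store is at least the size of the merged store. Ordinary
stores are treated as always mergeable, which ignores the other legality checks
a real combine would perform.

```cpp
// Hypothetical, simplified model of a candidate store; not LLVM's real API.
struct StoreInfo {
  unsigned SizeInBytes;  // number of bytes written by this store
  unsigned AlignInBytes; // known alignment of the store address
  bool IsNonTemporal;    // true if the store carries !nontemporal
};

// Result of the check: whether to merge, and which flag the merged store gets.
struct MergeDecision {
  bool CanMerge;
  bool IsNonTemporal;
};

// Stores are assumed to be consecutive and sorted by address, so Stores[0]
// is the base address of the would-be merged store.
MergeDecision decideMerge(const StoreInfo *Stores, unsigned NumStores) {
  if (NumStores == 0)
    return {false, false};

  const bool NonTemporal = Stores[0].IsNonTemporal;
  unsigned MergedSize = 0;
  for (unsigned I = 0; I != NumStores; ++I) {
    // Point 2: refuse to merge if only some stores are nontemporal.
    if (Stores[I].IsNonTemporal != NonTemporal)
      return {false, false};
    MergedSize += Stores[I].SizeInBytes;
  }

  // Point 3: a merged nontemporal store must be naturally aligned
  // (assumption: base alignment covers the whole merged width).
  if (NonTemporal && !(Stores[0].AlignInBytes >= MergedSize))
    return {false, false};

  // Point 1: the merged store keeps the nontemporal flag of its inputs.
  return {true, NonTemporal};
}
```

For the reproducer above, the two 16-byte stores (align 32 and align 16, both
nontemporal) give a 32-byte merged store on a 32-byte-aligned base, so this
sketch would allow the merge and mark the result nontemporal.
</pre>
</div>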
</body>
</html>