<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - [Loop vectorizer] Run-time failures in test-suite with AVX512F"
href="https://llvm.org/bugs/show_bug.cgi?id=31671">31671</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[Loop vectorizer] Run-time failures in test-suite with AVX512F
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>elad2.cohen@intel.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=17849" name="attach_17849" title="test compiled with TOT clang -marh=knl -S -emit-llvm">attachment 17849</a> <a href="attachment.cgi?id=17849&action=edit" title="test compiled with TOT clang -marh=knl -S -emit-llvm">[details]</a></span>
test compiled with TOT clang -march=knl -S -emit-llvm
The tests:
test-suite MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl
test-suite MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt
fail at run time (wrong output) when built with AVX512 and executed on a KNL
target. Specifically, the function s351() returns a wrong result.
Initially it seemed that they started failing due to the combination of r290810
and r284779, but these are just changes to the cost model and to enabling
interleaved memory accesses by default for X86. Bisecting the erroneous
function while compiling with `force-vector-width` and
`enable-interleaved-mem-accesses` seems to lead to r282418.
I've attached the .ll files produced by compiling with -march=knl using TOT
clang vs. TOT+revert-r282418.
The C code that contains the diff in IR is:
int s351() {
...
TYPE alpha = c[0];
for (int nl = 0; nl < 8*ntimes; nl++) {
for (int i = 0; i < LEN; i += 5) {
a[i] += alpha * b[i];
a[i + 1] += alpha * b[i + 1];
a[i + 2] += alpha * b[i + 2];
a[i + 3] += alpha * b[i + 3];
a[i + 4] += alpha * b[i + 4];
}
dummy(a, b, c, d, e, aa, bb, cc, 0.);
}
..
check(1);
return 0;
}
Putting aside the fact that the loop wasn't re-rolled, the diff suggests that
the generated IR is doing something wrong (see the attached files for full
context):
TOT:
%10 = add &lt;16 x i64&gt; %broadcast.splat, &lt;i64 1, i64 6, i64 11, i64 16, i64 21,
i64 26, i64 31, i64 36, i64 41, i64 46, i64 51, i64 56, i64 61, i64 66, i64 71,
i64 76&gt;
...
%VectorGep86 = getelementptr inbounds %struct.GlobalData, %struct.GlobalData*
@global_data, i64 0, i32 0, <16 x i64> %10
call void @llvm.masked.scatter.v16f32(<16 x float> %15, <16 x float*>
%VectorGep86
TOT+revert:
%10 = or i64 %offset.idx, 1
%broadcast.splatinsert86 = insertelement <16 x i64> undef, i64 %10, i32 0
%broadcast.splat87 = shufflevector <16 x i64> %broadcast.splatinsert86, <16 x
i64> undef, <16 x i32> zeroinitializer
...
%VectorGep88 = getelementptr inbounds %struct.GlobalData, %struct.GlobalData*
@global_data, i64 0, i32 0, <16 x i64> %broadcast.splat87
call void @llvm.masked.scatter.v16f32(<16 x float> %14, <16 x float*>
%VectorGep88
In the latter, the scatter appears to write to a splat of i+1 instead of
&lt;i+1,i+6,i+11,...,i+76&gt;.</pre>
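<pre>To make the difference between the two index vectors concrete, here is a
minimal C sketch. It is not taken from the test-suite or from the vectorizer;
the helper names are hypothetical, and it only assumes what the report shows:
a vectorization factor of 16 and an induction step of 5. With those
assumptions, lane j of the store to a[i + k] should address element
i + k + 5*j, whereas a splat makes all 16 lanes address the same element.

```c
/* Hypothetical helpers for illustration only.
   VF = 16 and STEP = 5 are taken from the IR and the C loop in the report. */
enum { VF = 16, STEP = 5 };

/* Per-lane element index expected for the store to a[i + k]:
   lane j corresponds to the scalar iteration with index i + STEP * j. */
void expected_lane_indices(long long i, int k, long long out[VF]) {
    for (int lane = 0; lane < VF; ++lane)
        out[lane] = i + k + (long long)STEP * lane;
}

/* Indices produced by broadcasting (splatting) i + k to every lane. */
void splat_lane_indices(long long i, int k, long long out[VF]) {
    for (int lane = 0; lane < VF; ++lane)
        out[lane] = i + k;
}

/* Count how many distinct elements of a[] the index vector touches. */
int count_distinct(const long long idx[VF]) {
    int distinct = 0;
    for (int lane = 0; lane < VF; ++lane) {
        int seen = 0;
        for (int prev = 0; prev < lane; ++prev) {
            if (idx[prev] == idx[lane])
                seen = 1;
        }
        if (!seen)
            ++distinct;
    }
    return distinct;
}
```

For i = 0 and k = 1 the first helper yields 1, 6, 11, ..., 76 (16 distinct
elements), while the splat yields 1 in every lane, so a scatter through it
would update a single element per vector iteration.
</pre>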
</div>
</p>
</body>
</html>