<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Vectorized horizontal reduction returning wrong result starting at r294934"
href="https://bugs.llvm.org/show_bug.cgi?id=32036">32036</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Vectorized horizontal reduction returning wrong result starting at r294934
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>andrew.b.adams@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=18015" name="attach_18015" title="ll that reproduces">attachment 18015</a> <a href="attachment.cgi?id=18015&action=edit" title="ll that reproduces">[details]</a></span>
ll that reproduces
The attached .ll generates code that computes different values before and after
r294934.
The reduction it implements is roughly:
int val = 1;
for (int y = 0; y < 8; y++) {
for (int x = 0; x < 8; x++) {
val = val + input[y*8 + x] + 3;
}
}
The output value is exactly 168 = 7*8*3 less than it ought to be, so perhaps
the +3 is getting lost on all but one iteration of the inner loop?
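
As a rough check of that arithmetic, here is a minimal C sketch. The function
names are hypothetical and the second function is only a model of the
suspected behaviour (the +3 applied once per row), not the code in the
attached .ll:

```c
/* Reduction as described in this report: +3 for each of the 64 elements. */
int reduce_reference(const int *input) {
    int val = 1;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            val = val + input[y * 8 + x] + 3;
        }
    }
    return val;
}

/* Hypothetical model of the wrong result: +3 applied only once per row. */
int reduce_bias_once_per_row(const int *input) {
    int val = 1;
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            val = val + input[y * 8 + x];
        }
        val = val + 3;
    }
    return val;
}
```

For any input that does not overflow, the two results differ by
8*8*3 - 8*3 = 168, which matches the observed discrepancy.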
The inner loop before that commit is:
movl (%r14,%rsi), %edx
movl 4(%r14,%rsi), %ebp
cmpl %r8d, %edx
cmovgl %r8d, %edx
addl (%rbx,%rsi), %eax
addl 4(%rbx,%rsi), %eax
cmpl %edx, %ebp
cmovlel %ebp, %edx
addl 8(%rbx,%rsi), %eax
movl 8(%r14,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
addl 12(%rbx,%rsi), %eax
movl 12(%r14,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
addl 16(%rbx,%rsi), %eax
movl 16(%r14,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
addl 20(%rbx,%rsi), %eax
movl 20(%r14,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
addl 24(%rbx,%rsi), %eax
movl 24(%r14,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
movl 28(%rbx,%rsi), %ebp
movl 28(%r14,%rsi), %r9d
cmpl %edx, %r9d
movl %edx, %r8d
cmovlel %r9d, %r8d
leal 24(%rbp,%rax), %eax
addq $32, %rsi
cmpq $256, %rsi
The loop over x has been fully unrolled. This inner loop is also computing a
min reduction, so ignore the cmpl and cmov instructions. The relevant
instructions for the summation are:
addl (%rbx,%rsi), %eax
addl 4(%rbx,%rsi), %eax
addl 8(%rbx,%rsi), %eax
addl 12(%rbx,%rsi), %eax
addl 16(%rbx,%rsi), %eax
addl 20(%rbx,%rsi), %eax
addl 24(%rbx,%rsi), %eax
movl 28(%rbx,%rsi), %ebp
leal 24(%rbp,%rax), %eax
The first 7 values are added, and then the last value is loaded into ebp, and
then added using an leal, with the constant term (8*3 = 24) accounted for in
the leal. This is correct.
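
To make the equivalence concrete, the sketch below writes one row of the
summation both ways in C. The names row_step_loop and row_step_folded are
hypothetical, and the second function only mirrors the shape of the
instructions listed above:

```c
/* One outer-loop iteration written the way the source loop does it. */
int row_step_loop(int acc, const int *row) {
    for (int x = 0; x < 8; x++) {
        acc = acc + row[x] + 3;
    }
    return acc;
}

/* Same step in the shape of the code before r294934: seven plain adds,
   then one final add that also folds in the constant 8 * 3 = 24. */
int row_step_folded(int acc, const int *row) {
    acc += row[0];
    acc += row[1];
    acc += row[2];
    acc += row[3];
    acc += row[4];
    acc += row[5];
    acc += row[6];
    /* Corresponds to: leal 24(%rbp,%rax), %eax */
    return row[7] + acc + 24;
}
```

Both functions return the same value for the same acc and row, as long as
the sums do not overflow.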
The inner loop after that commit is:
movl (%rbx,%rsi), %edx
movl 4(%rbx,%rsi), %r13d
cmpl %r8d, %edx
cmovgl %r8d, %edx
vmovdqu (%r14,%rsi), %ymm0
cmpl %edx, %r13d
cmovlel %r13d, %edx
movl 8(%rbx,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
movl 12(%rbx,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
movl 16(%rbx,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
movl 20(%rbx,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
movl 24(%rbx,%rsi), %ebp
cmpl %edx, %ebp
cmovlel %ebp, %edx
movl 28(%rbx,%rsi), %ebp
cmpl %edx, %ebp
movl %edx, %r8d
cmovlel %ebp, %r8d
vextracti128 $1, %ymm0, %xmm1
vpaddd %ymm1, %ymm0, %ymm0
vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1]
vpaddd %ymm1, %ymm0, %ymm0
vphaddd %ymm0, %ymm0, %ymm0
vmovd %xmm0, %edx
leal 3(%rdx,%rax), %eax
addq $32, %rsi
cmpq $256, %rsi # imm = 0x100
The relevant instructions for the summation are:
vmovdqu (%r14,%rsi), %ymm0
vextracti128 $1, %ymm0, %xmm1
vpaddd %ymm1, %ymm0, %ymm0
vpshufd $78, %xmm0, %xmm1 # xmm1 = xmm0[2,3,0,1]
vpaddd %ymm1, %ymm0, %ymm0
vphaddd %ymm0, %ymm0, %ymm0
vmovd %xmm0, %edx
leal 3(%rdx,%rax), %eax
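
A scalar C model of these instructions, as I read them, is sketched below.
The function names and the arrays b and c are hypothetical stand-ins for the
vector lanes, and only the lanes that feed the final result are modelled:

```c
/* Scalar model of the row step after r294934. */
int row_step_vectorized_model(int acc, const int *row) {
    int b[4], c[4];
    /* vextracti128 + vpaddd: add the upper four lanes onto the lower four. */
    for (int i = 0; i < 4; i++) {
        b[i] = row[i] + row[i + 4];
    }
    /* vpshufd $78 + vpaddd: add the lanes permuted as [2,3,0,1]. */
    for (int i = 0; i < 4; i++) {
        c[i] = b[i] + b[(i + 2) % 4];
    }
    /* vphaddd + vmovd: lane 0 becomes the sum of adjacent lanes 0 and 1. */
    int total = c[0] + c[1];
    /* leal 3(%rdx,%rax), %eax: the constant added here is 3. */
    return acc + total + 3;
}

/* What the source loop computes for the same row. */
int row_step_expected(int acc, const int *row) {
    for (int x = 0; x < 8; x++) {
        acc = acc + row[x] + 3;
    }
    return acc;
}
```

In this model the horizontal sum itself is right, and the result falls short
of the expected value by 24 - 3 = 21 per row, or 8*21 = 168 over the eight
rows.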
All values are loaded at once into a vector, and then they are horizontally
reduced into a single value, to which 3 is added using leal. Ah-hah! That
should be 24, not 3.</pre>
</div>
</p>
</body>
</html>