<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Missed optimization : Loop unrolling causes inefficient use of `adc` as compared to loop rolling"
href="https://bugs.llvm.org/show_bug.cgi?id=44460">44460</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Missed optimization : Loop unrolling causes inefficient use of `adc` as compared to loop rolling
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>9.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>madhur4127@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Consider emulating 192-bit integer using a 128-bit integer and a 64-bit
integer. In the code sample this emulated integer is used to compute dot
product of two uint64_t vectors of length N.
// function to compute dot product of two vectors
using u128 = unsigned __int128;
const int N = 2048;
uint64_t a[N], b[N];
u128 sum = 0;
uint64_t overflow = 0;
for(int i=0;i<N;++i){
u128 prod = (u128) a[i] * (u128) b[i];
sum += prod;
// gcc branches, clang just uses: adc overflow, 0
overflow += sum<prod;
}
To check for overflow in 128-bit and subsequently propagate the carry to
`overflow`, `adc` can be used. This idiom works well when loops are rolled
(no-unroll).
clang++ -O3 -Wall -Wextra -march=broadwell -fno-unroll-loops
.LBB0_1: # =>This Inner Loop Header: Depth=1
mov rax, qword ptr [rsi + 8*rcx]
mul qword ptr [rdi + 8*rcx]
add r10, rax
adc r9, rdx
adc r11, 0 # This is efficient form
inc rcx
cmp rcx, 2048
jne .LBB0_1
mov qword ptr [r8], r11
mov rax, r10
mov rdx, r9
ret
------
But when loops are unrolled this efficient ASM degrades to `mov; setb; movzx;
add;` Instead of just `adc reg, 0`.
clang++ -O3 -Wall -Wextra -march=broadwell # fno-unroll-loops is absent
.LBB0_1: # =>This Inner Loop Header: Depth=1
mov rax, qword ptr [rsi + 8*rbx]
mov r10, qword ptr [rsi + 8*rbx + 8]
mul qword ptr [rdi + 8*rbx]
mov r11, rdx
mov r14, rax
add r14, r9
adc r11, rcx
setb bpl
mov rax, r10
mul qword ptr [rdi + 8*rbx + 8]
mov rcx, rax
mov r9, rdx
movzx ebp, bpl
add rcx, r14
adc r9, r11
adc rbp, r15
mov rax, qword ptr [rsi + 8*rbx + 16]
mul qword ptr [rdi + 8*rbx + 16]
mov r10, rdx
mov r11, rax
add r11, rcx
adc r10, r9
setb cl
mov rax, qword ptr [rsi + 8*rbx + 24]
mul qword ptr [rdi + 8*rbx + 24]
movzx r15d, cl
mov r9, rax
add r9, r11
mov rcx, rdx
adc rcx, r10
adc r15, rbp
add rbx, 4
cmp rbx, 2048
jne .LBB0_1
For complete source code, here is the godbolt link:
<a href="https://godbolt.org/z/tT7Z2H">https://godbolt.org/z/tT7Z2H</a>
Source of this discussion is the stackoverflow Q&A:
<a href="https://stackoverflow.com/q/59575408/8199790">https://stackoverflow.com/q/59575408/8199790</a></pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>