<html>
<head>
<base href="http://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - @llvm.uadd.with.overflow.i32 (a.k.a. __builtin_addc) intrinsic produces worse code than non-intrinsic version"
href="http://llvm.org/bugs/show_bug.cgi?id=20748">20748</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>@llvm.uadd.with.overflow.i32 (a.k.a. __builtin_addc) intrinsic produces worse code than non-intrinsic version
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Common Code Generator Code
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>oneill+llvmbugs@cs.hmc.edu
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=12934" name="attach_12934" title="Intrinsic vs. nonintrinsic add-with-carry">attachment 12934</a> <a href="attachment.cgi?id=12934&action=edit" title="Intrinsic vs. nonintrinsic add-with-carry">[details]</a></span>
Intrinsic vs. nonintrinsic add-with-carry
LLVM and Clang claim to provide intrinsics that efficiently support
multiprecision arithmetic, described here:
<a href="http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins">http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins</a>
but the code actually produced is poor; in fact, it is *worse* than the code
LLVM produces if we hand-code a function equivalent to the intrinsic.

For example, consider the attached code, which is based on the code at the
above URL. The version using the LLVM intrinsic produces:
_addc4: ## @addc4
.cfi_startproc
## BB#0: ## %entry
pushq %rbp
Ltmp3:
.cfi_def_cfa_offset 16
Ltmp4:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp5:
.cfi_def_cfa_register %rbp
movl (%rdi), %eax
addl (%rsi), %eax
sbbl %ecx, %ecx
andl $1, %ecx
movl %eax, (%rdx)
movl 4(%rdi), %eax
addl 4(%rsi), %eax
sbbb %r8b, %r8b
addl %ecx, %eax
sbbb %cl, %cl
orb %r8b, %cl
andb $1, %cl
movzbl %cl, %r8d
movl %eax, 4(%rdx)
movl 8(%rdi), %eax
addl 8(%rsi), %eax
sbbb %r9b, %r9b
addl %r8d, %eax
sbbb %cl, %cl
orb %r9b, %cl
andb $1, %cl
movzbl %cl, %ecx
movl %eax, 8(%rdx)
movl 12(%rsi), %eax
addl 12(%rdi), %eax
addl %ecx, %eax
movl %eax, 12(%rdx)
popq %rbp
retq
.cfi_endproc
with not an adc instruction in sight! (It *could* have been compiled down to
an add and three adc instructions.)
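Since the attachment isn't quoted inline, here is a minimal sketch of what
the two variants can look like. The function names, the four-limb width, and
the exact carry logic are assumptions modeled on the Clang document linked
above; the actual attachment may differ.

```c
#include <assert.h>

/* Hand-coded 4-limb add-with-carry (roughly the -DOVERRIDE_INTRINSIC path).
 * Unsigned overflow of x + y is detected with the comparison (x + y) < y. */
void addc4_manual(const unsigned a[4], const unsigned b[4], unsigned out[4]) {
    unsigned carry = 0;
    for (int i = 0; i < 4; ++i) {
        unsigned s = a[i] + carry;
        unsigned c = s < carry;   /* carry out of adding the carry-in */
        s += b[i];
        c |= s < b[i];            /* carry out of adding b[i] */
        out[i] = s;
        carry = c;
    }
}

#if defined(__clang__)
/* Intrinsic version, chaining __builtin_addc as the Clang docs suggest. */
void addc4_intrinsic(const unsigned a[4], const unsigned b[4], unsigned out[4]) {
    unsigned c0, c1, c2, c3;
    out[0] = __builtin_addc(a[0], b[0], 0u, &c0);
    out[1] = __builtin_addc(a[1], b[1], c0, &c1);
    out[2] = __builtin_addc(a[2], b[2], c1, &c2);
    out[3] = __builtin_addc(a[3], b[3], c2, &c3); /* final carry-out dropped */
}
#endif
```

Either formulation computes the same 128-bit sum; the difference is purely in
which form the backend manages to pattern-match into add/adc.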

In contrast, if we compile with -DOVERRIDE_INTRINSIC, we get:
_addc4: ## @addc4
.cfi_startproc
## BB#0: ## %entry
pushq %rbp
Ltmp3:
.cfi_def_cfa_offset 16
Ltmp4:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Ltmp5:
.cfi_def_cfa_register %rbp
movl (%rsi), %r8d
movl (%rdi), %ecx
leal (%r8,%rcx), %eax
movl %eax, (%rdx)
movl 4(%rdi), %r9d
movl 4(%rsi), %eax
addl %r9d, %eax
addl %r8d, %ecx
adcl $0, %eax
movl %eax, 4(%rdx)
movl 8(%rdi), %r8d
movl 8(%rsi), %ecx
addl %r8d, %ecx
cmpl %r9d, %eax
adcl $0, %ecx
movl %ecx, 8(%rdx)
movl 12(%rsi), %eax
addl 12(%rdi), %eax
cmpl %r8d, %ecx
adcl $0, %eax
movl %eax, 12(%rdx)
popq %rbp
retq
.cfi_endproc
which is still fairly poor code, since each adcl $0 could be folded into the
preceding add, but it is nevertheless far better than the intrinsic version.</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>