<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Flag -Oz produces larger binary than -Os"
href="https://bugs.llvm.org/show_bug.cgi?id=46801">46801</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Flag -Oz produces larger binary than -Os
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: ARM
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>p.waydan@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org, smithp352@googlemail.com, Ties.Stuij@arm.com
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=23764" name="attach_23764" title="[llvm-dev] [ARM] Should Use Load and Store with Register Offset">attachment 23764</a> <a href="attachment.cgi?id=23764&action=edit" title="[llvm-dev] [ARM] Should Use Load and Store with Register Offset">[details]</a></span>
[llvm-dev] [ARM] Should Use Load and Store with Register Offset
While trying different memcpy implementations, I found that compiling the
following code with -Oz produces a larger binary than compiling it with -Os.
typedef unsigned int size_t;
void* memcpy(void* dst, const void* src, size_t len) {
    char* save = (char*)dst;
    while(--len != (size_t)(-1))
        *((char*)(dst + len)) = *((char*)(src + len));
    return save;
}
The options passed to clang in both cases are -S --target=armv6m-none-eabi
-fomit-frame-pointer.
Output with -Os:
memcpy:
    push {r4, lr}
    cmp r2, #0
    beq .LBB1_3
    subs r3, r0, #1
    subs r1, r1, #1
.LBB1_2:
    ldrb r4, [r1, r2]
    strb r4, [r3, r2]
    subs r2, r2, #1
    bne .LBB1_2
.LBB1_3:
    pop {r4, pc}
Output with -Oz:
memcpy:
    push {r4, r5, r7, lr}
    subs r1, r1, #1
    movs r3, #0
    mvns r3, r3
.LBB1_1:
    cmp r2, #0
    beq .LBB1_3
    subs r4, r2, #1
    ldrb r5, [r1, r2]
    adds r2, r0, r2
    strb r5, [r2, r3]
    mov r2, r4
    b .LBB1_1
.LBB1_3:
    pop {r4, r5, r7, pc}
The above memcpy implementation copies bytes starting at the high address.
Interestingly, when using a similar implementation which copies bytes starting
at the low address, -Oz reduces code size compared to -Os.
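For illustration only, a low-address-first variant might look like the
sketch below. This is an assumption, not the exact code that was measured:
the name copy_forward and the plain unsigned int length parameter are
hypothetical and were chosen so the snippet compiles on its own.

```c
/* Hypothetical forward-copying variant (not the code measured in this
   report). Copies len bytes from src to dst, lowest address first. */
void *copy_forward(void *dst, const void *src, unsigned int len) {
    char *d = (char *)dst;
    const char *s = (const char *)src;
    /* Post-decrement: the loop body runs exactly len times. */
    while (len-- != 0)
        *d++ = *s++;
    return dst;
}
```

Building this with the same options under both -Os and -Oz should allow
the two code sizes to be compared directly.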
For reference: this code was compiled with clang and llvm built from source
(commit 16a4350f76d2bead7af32617dd557d2ec096d2c5)</pre>
</div>
</p>
</body>
</html>