<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - frontend optimization of constant aggregate initializers can pessimize final code"
href="https://bugs.llvm.org/show_bug.cgi?id=34972">34972</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>frontend optimization of constant aggregate initializers can pessimize final code
</td>
</tr>
<tr>
<th>Product</th>
<td>clang
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>All
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>-New Bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedclangbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>benjamin@benjamin.pe
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>InstCombine knows some good tricks for constant aggregates. It can eliminate
copies of read-only aggregates onto the stack as well as turn table lookups
into inline comparisons. Unfortunately, the way clang emits aggregate constants
can sometimes foil these optimizations.
To illustrate, here is a very silly function for checking whether a char
parameter is 3:
int equals_3(char c) {
char table[] = {0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0};
return c >= 0 && table[(unsigned char)c];
}
This generates the IR (under -O3):
; Function Attrs: nounwind readnone uwtable
define i32 @equals_3(i8 signext) local_unnamed_addr #0 {
%2 = alloca [128 x i8], align 16
%3 = getelementptr inbounds [128 x i8], [128 x i8]* %2, i64 0, i64 0
call void @llvm.lifetime.start.p0i8(i64 128, i8* nonnull %3) #2
call void @llvm.memset.p0i8.i64(i8* nonnull %3, i8 0, i64 128, i32 16, i1 false)
%4 = getelementptr inbounds [128 x i8], [128 x i8]* %2, i64 0, i64 3
store i8 1, i8* %4, align 1
%5 = icmp sgt i8 %0, -1
br i1 %5, label %6, label %12
; <label>:6: ; preds = %1
%7 = zext i8 %0 to i64
%8 = getelementptr inbounds [128 x i8], [128 x i8]* %2, i64 0, i64 %7
%9 = load i8, i8* %8, align 1, !tbaa !2
%10 = icmp ne i8 %9, 0
%11 = zext i1 %10 to i32
br label %12
; <label>:12: ; preds = %6, %1
%13 = phi i32 [ 0, %1 ], [ %11, %6 ]
call void @llvm.lifetime.end.p0i8(i64 128, i8* nonnull %3) #2
ret i32 %13
}
clang emits the table initialization as a memset of the stack array followed by
a single store. This obscures the constant initializer enough that InstCombine
no longer improves the code.
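The missed optimization can be sketched in C. This is only an illustration, not
what clang emits: equals_3_memset mirrors the memset-plus-store initialization
seen in the IR, equals_3_direct is the single comparison one would hope for, and
check_equivalence is a hypothetical helper. The parameter is spelled signed char
as an assumption, to match the signext i8 argument in the IR.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Mirrors the IR: zero the whole stack array, then store one byte. */
int equals_3_memset(signed char c) {
    char table[128];
    memset(table, 0, sizeof table);
    table[3] = 1;
    return c >= 0 && table[(unsigned char)c];
}

/* The single comparison that the table lookup is equivalent to. */
int equals_3_direct(signed char c) {
    return c == 3;
}

/* Hypothetical helper: asserts both forms agree for every input value. */
void check_equivalence(void) {
    for (int v = SCHAR_MIN; v <= SCHAR_MAX; ++v) {
        assert(equals_3_memset((signed char)v) ==
               equals_3_direct((signed char)v));
    }
}
```

Both functions return the same value for every input, so the table and the
memset are pure overhead that the optimizer could in principle remove.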
Amusingly, by inverting the table (and negating the lookup) we can evade the
memset "optimization" and get excellent final code:
int equals_3(char c) {
char table[] = {1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1};
return c >= 0 && !table[(unsigned char)c];
}
define i32 @equals_3(i8 signext) local_unnamed_addr #0 {
%2 = icmp eq i8 %0, 3
%3 = zext i1 %2 to i32
ret i32 %3
}
Making the table a const variable also generates good code because the frontend
emits the table directly as a static global (which incidentally can be
non-conforming; see <a class="bz_bug_link
bz_status_NEW "
title="NEW - non-conforming optimization -fmerge-all-constants is enabled by default"
href="show_bug.cgi?id=18538">bug 18538</a>).</pre>
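<pre>A minimal sketch of the const variant mentioned above. The names
equals_3_const and check_const_variant are hypothetical, the parameter is
spelled signed char as an assumption to match the signext i8 argument in the
IR, and the initializer is shortened for brevity (the remaining elements are
zero-initialized). Whether clang lowers this shortened form the same way as
the full 128-entry initializer has not been verified here.

```c
#include <assert.h>

/* Same lookup as the original example, but the table is const. */
int equals_3_const(signed char c) {
    /* Only index 3 is nonzero; elements 4..127 are implicitly zero. */
    const char table[128] = {0, 0, 0, 1};
    return c >= 0 && table[(unsigned char)c];
}

/* Hypothetical helper: spot-checks the expected results. */
void check_const_variant(void) {
    assert(equals_3_const(3) == 1);
    assert(equals_3_const(0) == 0);
    assert(equals_3_const(127) == 0);
    assert(equals_3_const(-1) == 0);
}
```

The only source-level difference from the problematic version is the const
qualifier on the table.</pre>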
</div>
</p>
</body>
</html>