<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Global -> Constant promotion for atomics fails on platforms using intrinsics"
href="https://bugs.llvm.org/show_bug.cgi?id=37061">37061</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Global -> Constant promotion for atomics fails on platforms using intrinsics
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Scalar Optimizations
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>alex@crichton.co
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>First discovered in <a href="https://github.com/rust-lang/rust/issues/49775">https://github.com/rust-lang/rust/issues/49775</a>: we've found
that LLVM's optimization passes promote `global` definitions that are never
written to into `constant` definitions. On some platforms, however, these
constants may actually be modified at runtime, so the promotion leads to a
page fault (a write to read-only memory).
The specific case we ran into is our Android configuration, where the target
flags disable atomic instruction generation, so atomics are instead lowered
to calls to the libgcc intrinsics. Namely, we have a module like:
@FOO = internal unnamed_addr global <{ [4 x i8] }> zeroinitializer, align 4
define void @main() {
  %a = load atomic i32, i32* bitcast (<{ [4 x i8] }>* @FOO to i32*) seq_cst, align 4
  ret void
}
When this is optimized with `/opt foo.ll -mtriple=arm-linux-androideabi
-mattr=+v5te,+strict-align -o - -S -O2`, the resulting IR contains:
@FOO = internal unnamed_addr constant <{ [4 x i8] }> zeroinitializer, align 4
The generated assembly, in turn, contains:
main:
.type FOO,%object @ @FOO
.section .rodata.cst4,"aM",%progbits,4
.p2align 2
FOO:
.zero 4
.size FOO, 4
My assembly isn't super strong, but `nm` confirms that `FOO` is indeed in
`.rodata` rather than in `.bss`, where it originally would be. Unfortunately,
the assembly also makes use of `__sync_val_compare_and_swap_4`, an intrinsic
in libgcc, which appears to dispatch to `__kuser_cmpxchg`.
In our tests, which run inside the Android emulator, it looks like the kernel
detects that the local "hardware" actually has atomic instructions, so
`__kuser_cmpxchg` uses `ldrex` and `strexeq`. The `strexeq` instruction,
however, causes a page fault because it can't store the value back into
`.rodata`.
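
To illustrate why a plain atomic load can end up writing to memory, here is a
minimal C sketch. It assumes the load is expressed as a compare-and-swap of 0
with 0, which is one way such a load could reach
`__sync_val_compare_and_swap_4`; the names `foo`, `load_via_cas` and
`read_foo` are hypothetical and are not taken from the module above or from
the linked issue.

```c
/* Hypothetical illustration: an atomic 32-bit load built from a
   compare-and-swap, as a target without native atomic loads might do. */
static unsigned int foo = 0; /* writable storage, not .rodata */

/* CAS(ptr, 0, 0): when *ptr is 0, a 0 is stored back; otherwise nothing is
   stored. In both cases the previous value of *ptr is returned, so the call
   behaves like a load that may also write. */
static unsigned int load_via_cas(unsigned int *ptr) {
    return __sync_val_compare_and_swap(ptr, 0u, 0u);
}

/* Reads foo through the CAS-based load. */
unsigned int read_foo(void) {
    return load_via_cas(&foo);
}
```

With `foo` in writable memory this is harmless. If the same location were
placed in read-only memory, the store half of the compare-and-swap would be
the access that faults.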
Some more information is at the end of the referenced issue, at
<a href="https://github.com/rust-lang/rust/issues/49775#issuecomment-379851925">https://github.com/rust-lang/rust/issues/49775#issuecomment-379851925</a>, but I was
wondering: is this something that we should be disabling locally, or is this an
LLVM misoptimization?</pre>
</div>
</p>
</body>
</html>