<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - missed opt: target-feature propagation"
href="https://bugs.llvm.org/show_bug.cgi?id=41138">41138</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>missed opt: target-feature propagation
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>All
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Scalar Optimizations
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>gonzalobg88@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>Minimal working example - this Rust code:
extern "C" {
#[target_feature(enable = "avx2")] pub fn foo();
}
pub unsafe fn bar() { foo() }
generates the following LLVM-IR (<a href="https://rust.godbolt.org/z/fzcraS">https://rust.godbolt.org/z/fzcraS</a>):
define void @bar() unnamed_addr #0 {
tail call void @foo()
ret void
}
declare void @foo() unnamed_addr #1
attributes #0 = { nounwind nonlazybind uwtable
"probe-stack"="__rust_probestack" "target-cpu"="x86-64" }
attributes #1 = { nounwind nonlazybind uwtable
"probe-stack"="__rust_probestack" "target-cpu"="x86-64"
"target-features"="+avx2" }
which `opt` does not optimize further (<a href="https://rust.godbolt.org/z/XgMyCJ">https://rust.godbolt.org/z/XgMyCJ</a>).
Note that `foo` has the attribute "target-features"="+avx2", but this attribute
is not propagated to `bar`. That can significantly affect code generation and
other optimizations (e.g., if `bar` contained loops, those could use AVX2
instructions).
Propagating `avx2` to `bar` in this case is sound, because if `foo` is called
on a platform without `avx2` support, the behavior is undefined. That is, we
can assume that `foo` will only be called on platforms where `avx2` is enabled.
In general, if a function is unconditionally called, we can propagate its
target-features to the caller. If a function is only conditionally called, more
complex analysis is required, e.g., for this code
extern "C" {
#[target_feature(enable = "avx2")] pub fn foo();
#[target_feature(enable = "avx2")] pub fn baz();
}
pub unsafe fn bar(x: i32) {
if x == 0 { foo() } else { baz() }
}
which produces this LLVM-IR (<a href="https://rust.godbolt.org/z/XT4Hpo">https://rust.godbolt.org/z/XT4Hpo</a>):
define void @bar(i32 %x) unnamed_addr #0 {
start:
%0 = icmp eq i32 %x, 0
br i1 %0, label %bb1, label %bb2
bb1: ; preds = %start
tail call void @foo()
br label %bb3
bb2: ; preds = %start
tail call void @baz()
br label %bb3
bb3: ; preds = %bb2, %bb1
ret void
}
declare void @foo() unnamed_addr #1
declare void @baz() unnamed_addr #1
attributes #0 = { nounwind nonlazybind uwtable
"probe-stack"="__rust_probestack" "target-cpu"="x86-64" }
attributes #1 = { nounwind nonlazybind uwtable
"probe-stack"="__rust_probestack" "target-cpu"="x86-64"
"target-features"="+avx2" }
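Here the propagation is also sound: both branches call a function that requires
`avx2`, so every path through `bar` executes such a call, and `bar` can also be
given the `+avx2` target-feature.</pre>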
</div>
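<div>
<pre>The rule described above (propagate a feature only if every path through the
caller calls a function that requires it) can be sketched as follows. This is
only an illustrative model, not an LLVM pass: the bit flags, the function name
`propagatable_features`, and the representation of a function as an explicit
list of paths are all assumptions made for the example. A real implementation
would presumably work on the CFG instead of enumerating paths.

```rust
// Hypothetical bit flags standing in for LLVM target features.
const AVX2: u32 = 0b01;
const FMA: u32 = 0b10;

/// Returns the features a caller could assume under the rule sketched above.
/// `paths[i]` lists the required-feature masks of the calls made on path `i`
/// from the caller's entry to its return (an assumed, simplified model).
fn propagatable_features(paths: &[&[u32]]) -> u32 {
    if paths.is_empty() {
        return 0; // nothing is known, so nothing is propagated
    }
    let mut common = u32::MAX; // identity element for intersection
    for path in paths {
        // A path requires the union of the features of every call on it.
        let mut required: u32 = 0;
        for callee in path.iter() {
            required |= *callee;
        }
        // Only features required on every path may be propagated.
        common &= required;
    }
    common
}

fn main() {
    // bar() { foo() }: a single path with one call requiring avx2.
    assert_eq!(propagatable_features(&[&[AVX2]]), AVX2);
    // bar(x) { if x == 0 { foo() } else { baz() } }: both paths require avx2.
    assert_eq!(propagatable_features(&[&[AVX2], &[AVX2]]), AVX2);
    // Hypothetical variant where the else branch calls nothing.
    assert_eq!(propagatable_features(&[&[AVX2], &[]]), 0);
    // Hypothetical variant where only one path additionally requires fma.
    assert_eq!(propagatable_features(&[&[AVX2, FMA], &[AVX2]]), AVX2);
}
```

In this model, the two examples from the report both yield `avx2`, while a
variant with a call-free branch yields nothing to propagate.</pre>
</div>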
</p>
</body>
</html>