<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - Use derefenceable info to replace branch with select"
href="https://bugs.llvm.org/show_bug.cgi?id=43003">43003</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>Use derefenceable info to replace branch with select
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>david.bolvansky@gmail.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>void foo(int arr[static 1024]) {
    for (int i = 0; i < 1024; ++i) {
        if (arr[i] < 0)
            arr[i] = 0;
    }
}

void foo2(int arr[static 1024]) {
    for (int i = 0; i < 1024; ++i) {
        arr[i] = arr[i] < 0 ? 0 : arr[i];
    }
}

void foo3(int arr[1024]) {
    for (int i = 0; i < 1024; ++i) {
        arr[i] = arr[i] < 0 ? 0 : arr[i];
    }
}

void foo4(int arr[1024]) {
    for (int i = 0; i < 1024; ++i) {
        if (arr[i] < 0)
            arr[i] = 0;
    }
}

Proposal: replace

    if (arr[i] < 0)
        arr[i] = 0;

with

    arr[i] = arr[i] < 0 ? 0 : arr[i];

if we know that 'i' is always within the dereferenceable range of 'arr'.
This relies on 'dereferenceable' also implying that the memory is writable
(that it is not read-only memory).

In 'foo', we know i32* nocapture dereferenceable(4096) %arr and that i is in the
range [0, 1024), so we should be able to replace the branch with a select.

I am not sure about the 'foo4' case. I think we can do nothing there, since 'arr'
could be read-only.
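
To make the proposed rewrite concrete, here is a minimal sketch in C of the two
forms side by side. The names clamp_branch, clamp_select and N are made up for
illustration and do not appear in the code above. Both leave the same final
contents in 'arr'; the difference is that the select form stores to every
element, which is why it is only valid if the whole range is known to be
writable.

```c
/* Hypothetical names, for illustration only. */
enum { N = 1024 };

/* Branch form (as in 'foo'): stores only to elements that are negative. */
void clamp_branch(int arr[static N]) {
    for (int i = 0; i < N; ++i) {
        if (arr[i] < 0)
            arr[i] = 0;
    }
}

/* Select form (as in 'foo2'): stores to every element unconditionally,
   so all N elements must be writable for this to be valid. */
void clamp_select(int arr[static N]) {
    for (int i = 0; i < N; ++i)
        arr[i] = arr[i] < 0 ? 0 : arr[i];
}
```

Running both on copies of the same writable input should give identical arrays.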

Motivation: with Clang -O3 -mavx2, this would replace the code currently
generated for 'foo':

foo:                                    # @foo
        xor     eax, eax
        vpxor   xmm0, xmm0, xmm0
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        vmovdqu ymm1, ymmword ptr [rdi + rax]
        vmovdqu ymm2, ymmword ptr [rdi + rax + 32]
        vmovdqu ymm3, ymmword ptr [rdi + rax + 64]
        vmovdqu ymm4, ymmword ptr [rdi + rax + 96]
        vpmaskmovd      ymmword ptr [rdi + rax], ymm1, ymm0
        vpmaskmovd      ymmword ptr [rdi + rax + 32], ymm2, ymm0
        vpmaskmovd      ymmword ptr [rdi + rax + 64], ymm3, ymm0
        vpmaskmovd      ymmword ptr [rdi + rax + 96], ymm4, ymm0
        vmovdqu ymm1, ymmword ptr [rdi + rax + 128]
        vmovdqu ymm2, ymmword ptr [rdi + rax + 160]
        vmovdqu ymm3, ymmword ptr [rdi + rax + 192]
        vmovdqu ymm4, ymmword ptr [rdi + rax + 224]
        vpmaskmovd      ymmword ptr [rdi + rax + 128], ymm1, ymm0
        vpmaskmovd      ymmword ptr [rdi + rax + 160], ymm2, ymm0
        vpmaskmovd      ymmword ptr [rdi + rax + 192], ymm3, ymm0
        vpmaskmovd      ymmword ptr [rdi + rax + 224], ymm4, ymm0
        add     rax, 256
        cmp     rax, 4096
        jne     .LBB0_1
        vzeroupper
        ret

with code as nice as what 'foo2' already gets:

foo2:                                   # @foo2
        xor     eax, eax
        vpxor   xmm0, xmm0, xmm0
.LBB1_1:                                # =>This Inner Loop Header: Depth=1
        vpmaxsd ymm1, ymm0, ymmword ptr [rdi + 4*rax]
        vpmaxsd ymm2, ymm0, ymmword ptr [rdi + 4*rax + 32]
        vpmaxsd ymm3, ymm0, ymmword ptr [rdi + 4*rax + 64]
        vpmaxsd ymm4, ymm0, ymmword ptr [rdi + 4*rax + 96]
        vmovdqu ymmword ptr [rdi + 4*rax], ymm1
        vmovdqu ymmword ptr [rdi + 4*rax + 32], ymm2
        vmovdqu ymmword ptr [rdi + 4*rax + 64], ymm3
        vmovdqu ymmword ptr [rdi + 4*rax + 96], ymm4
        add     rax, 32
        cmp     rax, 1024
        jne     .LBB1_1
        vzeroupper
        ret
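
One way to see why the 'foo2' form vectorizes so well: 'arr[i] < 0 ? 0 : arr[i]'
is just a signed maximum of arr[i] and 0, which is the operation vpmaxsd performs
on packed 32-bit integers. A small sketch, where max_int and clamp_max are
hypothetical names used only for illustration:

```c
/* Hypothetical helper: signed maximum of two ints. */
static int max_int(int a, int b) {
    return a > b ? a : b;
}

/* Same result as the select form in 'foo2', written as max(arr[i], 0). */
void clamp_max(int arr[static 1024]) {
    for (int i = 0; i < 1024; ++i)
        arr[i] = max_int(arr[i], 0);
}
```

Whether a given compiler version emits vpmaxsd for this exact spelling is not
something I have checked; the sketch only shows the arithmetic equivalence.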

I believe a simple case like:

void bar(int arr[static 1024]) {
    if (arr[80] < 0)
        arr[80] = 0;
}

could be handled by SimplifyCFG, which would turn the branch into a select.

Surprisingly, for:

void bar(int *arr) {
    if (arr[80] < 7)
        arr[80] = 7;
}

ICC generates cmovge. Does ICC ignore the fact that 'arr' could be read-only?
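
For reference, a C-level sketch of what I assume a cmov-based lowering of 'bar'
amounts to. I have not reproduced the exact ICC output here, and the names
bar_branch and bar_cmov_like are hypothetical. The point is that the store
becomes unconditional, so it also happens when the original code would not
have written to 'arr' at all.

```c
/* Branch form from above: no store happens when arr[80] >= 7. */
void bar_branch(int *arr) {
    if (arr[80] < 7)
        arr[80] = 7;
}

/* Assumed shape of a cmov-based lowering: one load, one select,
   and one store that executes on every call. */
void bar_cmov_like(int *arr) {
    int v = arr[80];
    v = v < 7 ? 7 : v;
    arr[80] = v;
}
```

On writable memory both versions leave the same value in arr[80]; they differ
only in whether a store is issued when arr[80] is already at least 7.
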
For the motivating loop case, I don't know which pass should do this; maybe IndVars?</pre>
</div>
</p>
</body>
</html>