<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW - [x86] Vector shift left generates sub-optimal code for shift by "select" from two constants or loop-invariant values"
href="https://bugs.llvm.org/show_bug.cgi?id=37428">37428</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>[x86] Vector shift left generates sub-optimal code for shift by "select" from two constants or loop-invariant values
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>6.0
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>enhancement
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Backend: X86
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>fabiang@radgametools.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvm-bugs@lists.llvm.org
</td>
</tr></table>
<p>
<div>
<pre>void variable_shift_left_loop(unsigned int *arr, const bool *control,
                              int count, int amt0, int amt1)
{
    for (int i = 0; i < count; ++i)
    {
        int amt = control[i] ? amt1 : amt0;
        arr[i] = arr[i] << amt;
    }
}

Compiled with Clang 6.0 targeting x86-64, "-O2 -msse4.1".

The generated code uses this instruction sequence to produce (1 << amt), which
is then used as the input to a multiply that performs the left shift (since
pre-AVX2 x86 does not have per-lane variable shift instructions):

    blendvps   %xmm0, %xmm3, %xmm6
    pslld      $23, %xmm6
    paddd      %xmm9, %xmm6      # xmm9 = [0x3f800000 repeated 4 times]
    cvttps2dq  %xmm6, %xmm0

amt0 and amt1 are loop-invariant, so it would be possible to compute
(1 << amt0) and (1 << amt1) once outside the loop, and then perform the vector
select between the two precomputed values, saving 3 instructions (the pslld,
paddd and cvttps2dq above) for every 4-vector of integers processed.
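
As a source-level illustration of the hoisting described above, here is a
minimal scalar sketch. It assumes both shift amounts are in the range 0..31
(the original loop needs this as well), and the name
variable_shift_left_loop_hoisted is made up for this example. It shows the
intended shape of the loop, not what any compiler currently emits.

```cpp
// Original loop from this report, unchanged apart from formatting.
void variable_shift_left_loop(unsigned int *arr, const bool *control,
                              int count, int amt0, int amt1)
{
    for (int i = 0; i < count; ++i)
    {
        int amt = control[i] ? amt1 : amt0;
        arr[i] = arr[i] << amt;
    }
}

// Hypothetical hand-hoisted variant: the two possible multipliers are
// computed once, and the loop body only selects and multiplies.
// Assumes 0 <= amt0, amt1 <= 31.
void variable_shift_left_loop_hoisted(unsigned int *arr, const bool *control,
                                      int count, int amt0, int amt1)
{
    const unsigned int mul0 = 1u << amt0; // loop-invariant
    const unsigned int mul1 = 1u << amt1; // loop-invariant
    for (int i = 0; i < count; ++i)
    {
        // Unsigned multiply by a power of two wraps exactly like the shift.
        arr[i] = arr[i] * (control[i] ? mul1 : mul0);
    }
}
```

Both functions should leave arr in the same state for any in-range shift
amounts; the second form just makes the select operate on the multipliers
instead of the shift amounts.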

There's a more general pattern here where it might be beneficial to transform

    temp = select(cond, loop_invariant_a, loop_invariant_b)
    temp2 = pure_func(temp) // single use of temp
    result = op(var, temp2)

into

    // outside loop:
    func_of_a = pure_func(loop_invariant_a)
    func_of_b = pure_func(loop_invariant_b)

    // inside loop:
    temp = select(cond, func_of_a, func_of_b)
    result = op(var, temp)

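
The same transform can be sketched on a concrete instance. Everything below is
illustrative: pure_func here is an arbitrary side-effect-free function, op is
taken to be addition, and the names apply_select_inside and
apply_select_hoisted are made up for this example.

```cpp
// Hypothetical stand-in for pure_func: no side effects, cannot trap.
static unsigned int pure_func(unsigned int x)
{
    return x * x + 7u;
}

// Before: select between the loop-invariant inputs, then apply pure_func
// on every iteration.
void apply_select_inside(unsigned int *out, const unsigned int *var,
                         const bool *cond, int count,
                         unsigned int a, unsigned int b)
{
    for (int i = 0; i < count; ++i)
    {
        unsigned int temp = cond[i] ? a : b;
        out[i] = var[i] + pure_func(temp);
    }
}

// After: pure_func is applied to both inputs once, outside the loop, and
// the loop selects between the two precomputed results.
void apply_select_hoisted(unsigned int *out, const unsigned int *var,
                          const bool *cond, int count,
                          unsigned int a, unsigned int b)
{
    const unsigned int func_of_a = pure_func(a);
    const unsigned int func_of_b = pure_func(b);
    for (int i = 0; i < count; ++i)
    {
        out[i] = var[i] + (cond[i] ? func_of_a : func_of_b);
    }
}
```

The rewrite evaluates pure_func on both inputs even if only one of them is
ever selected (or the loop runs zero times), so it presumably only pays off
when the function is cheap relative to the trip count and is safe to
evaluate speculatively.
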
This is particularly helpful when the values being selected between are not
just loop-invariant, but constants (e.g. the above loop with amt0 and amt1
replaced with two literals).</pre>
</div>
</p>
</body>
</html>