<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [x86] Vector shift left generates sub-optimal code for shift by "select" from two constants or loop-invariant values"
   href="https://bugs.llvm.org/show_bug.cgi?id=37428">37428</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[x86] Vector shift left generates sub-optimal code for shift by "select" from two constants or loop-invariant values
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>6.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Windows NT
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>fabiang@radgametools.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
<pre>void variable_shift_left_loop(unsigned int *arr, const bool *control,
                              int count, int amt0, int amt1)
{
    for (int i = 0; i < count; ++i)
    {
        int amt = control[i] ? amt1 : amt0;
        arr[i] = arr[i] << amt;
    }
}

Compiled with Clang 6.0 targeting x86-64, using "-O2 -msse4.1".

The generated code uses the following instruction sequence to produce
(1 << amt), which is then used as the input to a multiply that performs the
left shift (pre-AVX2 x86 does not have per-lane variable shift instructions):

  blendvps %xmm0, %xmm3, %xmm6
  pslld $23, %xmm6
  paddd %xmm9, %xmm6 # xmm9 = [0x3f800000 repeated 4 times]
  cvttps2dq %xmm6, %xmm0
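
For reference, here is a rough scalar model of what one lane of that sequence
computes. It is only a sketch: pow2_via_float is a made-up name, the shift
amount is assumed to satisfy 0 <= amt <= 30 so the result fits in a signed
int, and __builtin_memcpy is the Clang/GCC builtin (used here just to
reinterpret the bits without needing a header).

```cpp
// Scalar model of one lane of pslld / paddd / cvttps2dq.
// Assumption: 0 <= amt <= 30.
static int pow2_via_float(int amt)
{
    // pslld $23: place amt in the exponent field of an IEEE-754 float.
    unsigned int bits = (unsigned int)amt << 23;
    // paddd: add 0x3f800000, the bit pattern of 1.0f (exponent bias 127).
    bits += 0x3f800000u;
    // The bits now encode the float 2^amt; reinterpret them as a float.
    float f;
    __builtin_memcpy(&f, &bits, sizeof f);
    // cvttps2dq: truncating float-to-int conversion, giving 1 << amt.
    return (int)f;
}
```

For example, amt = 5 gives the bit pattern 0x42000000, which is 32.0f.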

amt0 and amt1 are loop-invariant, so it would be possible to compute
(1 << amt0) and (1 << amt1) once outside the loop, and then perform the vector
select between the two constants, saving 3 instructions for every 4-vector of
integers processed.
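
At the source level, the hoisted form would look roughly like the sketch
below. The function name and the mul0/mul1 variables are made up for
illustration, and amt0 and amt1 are assumed to lie in the range 0 to 31. The
multiply stands in for the shift, which is what the vectorized code does
anyway.

```cpp
void variable_shift_left_loop_hoisted(unsigned int *arr, const bool *control,
                                      int count, int amt0, int amt1)
{
    // Computed once, outside the loop (assumes 0 <= amt0, amt1 <= 31).
    const unsigned int mul0 = 1u << amt0;
    const unsigned int mul1 = 1u << amt1;

    for (int i = 0; i < count; ++i)
    {
        // Select between the two precomputed multipliers.
        unsigned int mul = control[i] ? mul1 : mul0;
        // Unsigned multiply by 2^amt equals the left shift modulo 2^32.
        arr[i] = arr[i] * mul;
    }
}
```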

There's a more general pattern here where it might be beneficial to transform

   temp = select(cond, loop_invariant_a, loop_invariant_b)
   temp2 = pure_func(temp) // single use of temp
   result = op(var, temp2)

into

   // outside loop:
   func_of_a = pure_func(loop_invariant_a)
   func_of_b = pure_func(loop_invariant_b)

   // inside loop:
   temp = select(cond, func_of_a, func_of_b)
   result = op(var, temp)
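
A small C++ rendering of the two forms, purely as an illustration. Everything
named here (PureFunc, pow2, select_then_func, func_then_select) is
hypothetical, and op is taken to be an unsigned multiply. The two functions
should produce identical output as long as f has no side effects.

```cpp
typedef unsigned int (*PureFunc)(unsigned int);

// Hypothetical stand-in for pure_func (assumes x is at most 31).
static unsigned int pow2(unsigned int x) { return 1u << x; }

// Original form: select first, then apply f on every iteration.
void select_then_func(unsigned int *out, const unsigned int *var,
                      const bool *cond, int count,
                      unsigned int a, unsigned int b, PureFunc f)
{
    for (int i = 0; i < count; ++i)
    {
        unsigned int temp = cond[i] ? a : b;
        out[i] = var[i] * f(temp);
    }
}

// Transformed form: apply f twice outside the loop, select inside it.
void func_then_select(unsigned int *out, const unsigned int *var,
                      const bool *cond, int count,
                      unsigned int a, unsigned int b, PureFunc f)
{
    const unsigned int func_of_a = f(a);
    const unsigned int func_of_b = f(b);
    for (int i = 0; i < count; ++i)
    {
        unsigned int temp = cond[i] ? func_of_a : func_of_b;
        out[i] = var[i] * temp;
    }
}
```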

This transformation is particularly helpful when the values being selected
between are not just loop-invariant, but constants (e.g. the above loop with
amt0 and amt1 replaced with two literals).</pre>
        </div>
      </p>


    </body>
</html>