[llvm-bugs] [Bug 37428] New: [x86] Vector shift left generates sub-optimal code for shift by "select" from two constants or loop-invariant values

via llvm-bugs llvm-bugs at lists.llvm.org
Fri May 11 15:07:26 PDT 2018


https://bugs.llvm.org/show_bug.cgi?id=37428

            Bug ID: 37428
           Summary: [x86] Vector shift left generates sub-optimal code for
                    shift by "select" from two constants or loop-invariant
                    values
           Product: libraries
           Version: 6.0
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: fabiang at radgametools.com
                CC: llvm-bugs at lists.llvm.org

void variable_shift_left_loop(unsigned int *arr, const bool *control,
                              int count, int amt0, int amt1)
{
    for (int i = 0; i < count; ++i)
    {
        int amt = control[i] ? amt1 : amt0;
        arr[i] = arr[i] << amt;
    }
}

Compiled with Clang 6.0 targeting x86-64, using "-O2 -msse4.1".

The generated code uses the following instruction sequence to produce
(1 << amt), which is then used as an input to a multiply that performs the left
shift (since pre-AVX2 x86 has no per-lane variable shift instructions):

  blendvps %xmm0, %xmm3, %xmm6
  pslld $23, %xmm6
  paddd %xmm9, %xmm6 # xmm9 = [0x3f800000 repeated 4 times]
  cvttps2dq %xmm6, %xmm0
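
For readers unfamiliar with the trick, here is a scalar sketch of what one lane
of the pslld/paddd/cvttps2dq part of that sequence computes. The helper name
pow2_via_float is hypothetical, and the sketch assumes IEEE-754 single-precision
floats and restricts amt to 0..30 so that the C++ float-to-int conversion stays
well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Scalar model of one vector lane (hypothetical helper, not from the report).
// Restricted to 0 <= amt <= 30 so the final conversion is defined in C++.
inline std::int32_t pow2_via_float(std::int32_t amt)
{
    assert(amt >= 0 && amt <= 30);
    // pslld $23: move amt into the exponent field of a single-precision float.
    std::uint32_t bits = static_cast<std::uint32_t>(amt) << 23;
    // paddd 0x3f800000: add the bit pattern of 1.0f, which yields the float 2^amt.
    bits += 0x3f800000u;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    // cvttps2dq: truncating float-to-int conversion, which yields 1 << amt.
    return static_cast<std::int32_t>(f);
}
```

The blendvps in front of this sequence is the select between the two shift
amounts, so all three of the modelled instructions operate on a value that can
only ever be one of two loop-invariant inputs.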

amt0 and amt1 are loop-invariant, so it would be possible to compute (1<<amt0)
and (1<<amt1) once outside the loop and then perform the vector select between
the two precomputed values, saving 3 instructions for every 4-vector of
integers processed.
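
A source-level sketch of the hoisted form, written by hand to show the intended
result rather than anything the compiler currently emits. The function name
variable_shift_left_loop_hoisted is hypothetical, and the sketch assumes a
32-bit unsigned int and shift amounts in 0..31 (which the original shifts
already require).

```cpp
#include <cassert>

// Hand-hoisted variant of the loop from the report (hypothetical name).
// Assumes 32-bit unsigned int and 0 <= amt0, amt1 <= 31.
void variable_shift_left_loop_hoisted(unsigned int *arr, const bool *control,
                                      int count, int amt0, int amt1)
{
    assert(amt0 >= 0 && amt0 < 32 && amt1 >= 0 && amt1 < 32);
    // Computed once, outside the loop.
    const unsigned int mul0 = 1u << amt0;
    const unsigned int mul1 = 1u << amt1;
    for (int i = 0; i < count; ++i)
    {
        // For unsigned x, x << amt equals x * (1 << amt) modulo 2^32.
        const unsigned int mul = control[i] ? mul1 : mul0;
        arr[i] = arr[i] * mul;
    }
}
```

The optimizer is free to turn the multiply back into a shift, so whether this
form actually avoids the per-iteration sequence would need to be checked in the
generated assembly.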

There's a more general pattern here where it might be beneficial to transform

   temp = select(cond, loop_invariant_a, loop_invariant_b)
   temp2 = pure_func(temp) // single use of temp
   result = op(var, temp2)

into

   // outside loop:
   func_of_a = pure_func(loop_invariant_a)
   func_of_b = pure_func(loop_invariant_b)

   // inside loop:
   temp = select(cond, func_of_a, func_of_b)
   result = op(var, temp)

This is particularly helpful when the values being selected between are not
just loop-invariant, but constants (e.g. the above loop with amt0 and amt1
replaced with two literals).
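
The general pattern can be sketched as two equivalent loops. The names
select_then_apply and apply_then_select are hypothetical, and the equivalence
assumes that pure_func has no side effects and is safe to evaluate on both
inputs even if one of them is never selected.

```cpp
#include <cassert>
#include <cstddef>

// Original form: pure_func runs on the selected value in every iteration.
template <typename T, typename PureFunc, typename Op>
void select_then_apply(T *var, const bool *cond, std::size_t n,
                       T a, T b, PureFunc pure_func, Op op)
{
    assert((var && cond) || n == 0);
    for (std::size_t i = 0; i < n; ++i)
    {
        const T temp = cond[i] ? a : b;
        var[i] = op(var[i], pure_func(temp));
    }
}

// Transformed form: pure_func runs twice before the loop, and the loop
// only selects between the two precomputed results.
template <typename T, typename PureFunc, typename Op>
void apply_then_select(T *var, const bool *cond, std::size_t n,
                       T a, T b, PureFunc pure_func, Op op)
{
    assert((var && cond) || n == 0);
    const auto func_of_a = pure_func(a);
    const auto func_of_b = pure_func(b);
    for (std::size_t i = 0; i < n; ++i)
    {
        var[i] = op(var[i], cond[i] ? func_of_a : func_of_b);
    }
}
```

With pure_func as (1 << x) and op as multiplication, this reduces to the shift
case described above.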
