<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW "
title="NEW --- - The vectorization costs of SHL/SRL/SRA need to be adjusted to vectorize the loop in the testcase"
href="https://llvm.org/bugs/show_bug.cgi?id=23582">23582</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>The vectorization costs of SHL/SRL/SRA need to be adjusted to vectorize the loop in the testcase
</td>
</tr>
<tr>
<th>Product</th>
<td>libraries
</td>
</tr>
<tr>
<th>Version</th>
<td>trunk
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Linux
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>Loop Optimizer
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>wmi@google.com
</td>
</tr>
<tr>
<th>CC</th>
<td>aschwaighofer@apple.com, llvmbugs@cs.uiuc.edu, nrotem@apple.com
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>Created <span class=""><a href="attachment.cgi?id=14344" name="attach_14344" title="testcase 1.cc">attachment 14344</a> <a href="attachment.cgi?id=14344&action=edit" title="testcase 1.cc">[details]</a></span>
testcase 1.cc
For the attached testcase 1.cc, the kernel loop is not vectorized because the
vectorization costs of SHL/SRL/SRA are set to high values in the SSE2CostTable
in X86TTIImpl::getArithmeticInstrCost:
  { ISD::SHL, MVT::v8i16, 8*10 }, // Scalarized.
  { ISD::SHL, MVT::v4i32, 2*5  }, // We optimized this using mul.
  { ISD::SHL, MVT::v2i64, 2*10 }, // Scalarized.
  { ISD::SRL, MVT::v8i16, 8*10 }, // Scalarized.
  { ISD::SRL, MVT::v4i32, 4*10 }, // Scalarized.
  { ISD::SRL, MVT::v2i64, 2*10 }, // Scalarized.
  { ISD::SRA, MVT::v8i16, 8*10 }, // Scalarized.
  { ISD::SRA, MVT::v4i32, 4*10 }, // Scalarized.
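For context, the lookup over this table works roughly as follows. This is a
paraphrased, standalone sketch with stand-in types of my own (the real
CostTblEntry and MVT live in the LLVM tree), not the exact source:

// Standalone sketch of the cost-table lookup pattern used by
// X86TTIImpl::getArithmeticInstrCost (paraphrased; stand-in types).
#include <cstdio>

enum SimpleISD { SHL, SRL, SRA };
enum SimpleVT { v8i16, v4i32, v2i64 };

struct CostTblEntry {
  SimpleISD ISD;
  SimpleVT Type;
  unsigned Cost; // in "simple instruction" units
};

static const CostTblEntry SSE2CostTable[] = {
  { SHL, v8i16, 8*10 }, { SHL, v4i32, 2*5  }, { SHL, v2i64, 2*10 },
  { SRL, v8i16, 8*10 }, { SRL, v4i32, 4*10 }, { SRL, v2i64, 2*10 },
  { SRA, v8i16, 8*10 }, { SRA, v4i32, 4*10 },
};

// First matching entry wins. The vectorizer weighs this against the
// scalar cost, so an entry like 8*10 makes a v8i16 shift look
// prohibitively expensive and the loop stays scalar.
unsigned getShiftCost(SimpleISD ISD, SimpleVT VT) {
  for (const CostTblEntry &E : SSE2CostTable)
    if (E.ISD == ISD && E.Type == VT)
      return E.Cost;
  return 1; // assumed default for this sketch
}

int main() {
  std::printf("cost(SRA, v4i32) = %u\n", getShiftCost(SRA, v4i32)); // 40
}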
But x86 supports psllw/pslld/psllq, psrlw/psrld/psrlq, and psraw/psrad, so I
don't understand why those costs need to be set so high. It looks related to
<a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170439.html">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170439.html</a>,
but I could not find a testcase there.
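To illustrate that these instructions handle a whole vector in one shot when
the shift count is uniform, here is a minimal SSE2 intrinsics example (my own
illustration, not part of the testcase); psrad with an immediate count maps to
_mm_srai_epi32, and psrad with the count in a register maps to _mm_sra_epi32:

// Minimal illustration: SSE2 arithmetic right shifts of <4 x i32>
// with a uniform count are single psrad instructions.
#include <emmintrin.h>
#include <cstdio>

int main() {
  __m128i v = _mm_set_epi32(-64, 64, -16, 16);
  __m128i by_imm = _mm_srai_epi32(v, 2);                   // psrad with immediate
  __m128i by_reg = _mm_sra_epi32(v, _mm_cvtsi32_si128(2)); // psrad with xmm count
  int out[4];
  _mm_storeu_si128((__m128i *)out, by_imm);
  std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]); // 4 -4 16 -16
  _mm_storeu_si128((__m128i *)out, by_reg);
  std::printf("%d %d %d %d\n", out[0], out[1], out[2], out[3]); // same result
}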
The kernel loop in 1.cc:
  for (; k < f; k++) {
    m = h[k].ival;
    m += h[k + 1].ival;
    i[k].ival -= (l + n * m) >> j;
  }
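For reference, a self-contained reduction of the kernel (my reconstruction,
not the attached 1.cc; the int16_t fields and int parameters are guesses read
off the movswl/movzwl/movw in the assembly below):

// Reconstruction of the kernel with assumed types; the real testcase
// is in attachment 14344.
#include <cstdint>

struct Elem { int16_t ival; };

void kernel(Elem *h, Elem *i, int k, int f, int l, int n, int j) {
  for (; k < f; k++) {
    int m = h[k].ival;               // sign-extending 16-bit load (movswl)
    m += h[k + 1].ival;
    i[k].ival -= (l + n * m) >> j;   // uniform, loop-invariant shift amount
  }
}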
~/workarea/llvm-r237614/build/bin/clang++ -std=c++11 -O2 -S 1.cc
The assembly:
.LBB0_19:                               # %for.body
        movswl  -2(%rdi), %ebp
        movswl  (%rdi), %ebx
        addl    %ebp, %ebx
        imull   %edx, %ebx
        addl    %esi, %ebx
        sarl    %cl, %ebx
        movzwl  (%rax), %ebp
        subl    %ebx, %ebp
        movw    %bp, (%rax)
        addq    $2, %rdi
        addq    $2, %rax
        decl    %r12d
        jne     .LBB0_19
For 1.cc, if we adjust SSE2CostTable to be:
  { ISD::SHL, MVT::v8i16, 1 }, // Scalarized.
  { ISD::SHL, MVT::v4i32, 1 }, // We optimized this using mul.
  { ISD::SHL, MVT::v2i64, 1 }, // Scalarized.
  { ISD::SRL, MVT::v8i16, 1 }, // Scalarized.
  { ISD::SRL, MVT::v4i32, 1 }, // Scalarized.
  { ISD::SRL, MVT::v2i64, 1 }, // Scalarized.
  { ISD::SRA, MVT::v8i16, 1 }, // Scalarized.
  { ISD::SRA, MVT::v4i32, 1 }, // Scalarized.
then the kernel loop in 1.cc is vectorized very well. (For LLVM after
r235455, the patch at <a href="http://reviews.llvm.org/D9865">http://reviews.llvm.org/D9865</a> is needed to
generate good vectorization code.)
.LBB0_19:                               # %vector.body
        xorps     %xmm3, %xmm3
        movss     %xmm2, %xmm3          # xmm3 = xmm2[0],xmm3[1,2,3]
        movq      -2(%rdx), %xmm4       # xmm4 = mem[0],zero
        punpcklwd %xmm4, %xmm4          # xmm4 = xmm4[0,0,1,1,2,2,3,3]
        psrad     $16, %xmm4
        movq      (%rdx), %xmm5         # xmm5 = mem[0],zero
        punpcklwd %xmm5, %xmm5          # xmm5 = xmm5[0,0,1,1,2,2,3,3]
        psrad     $16, %xmm5
        paddd     %xmm4, %xmm5
        pshufd    $245, %xmm5, %xmm4    # xmm4 = xmm5[1,1,3,3]
        pmuludq   %xmm0, %xmm5
        pshufd    $232, %xmm5, %xmm5    # xmm5 = xmm5[0,2,2,3]
        pshufd    $245, %xmm0, %xmm6    # xmm6 = xmm0[1,1,3,3]
        pmuludq   %xmm4, %xmm6
        pshufd    $232, %xmm6, %xmm4    # xmm4 = xmm6[0,2,2,3]
        punpckldq %xmm4, %xmm5          # xmm5 = xmm5[0],xmm4[0],xmm5[1],xmm4[1]
        paddd     %xmm1, %xmm5
        psrad     %xmm3, %xmm5
        movq      (%rsi), %xmm3         # xmm3 = mem[0],zero
        punpcklwd %xmm7, %xmm3          # xmm3 = xmm3[0],xmm7[0],xmm3[1],xmm7[1],xmm3[2],xmm7[2],xmm3[3],xmm7[3]
        psubw     %xmm5, %xmm3
        pshuflw   $232, %xmm3, %xmm3    # xmm3 = xmm3[0,2,2,3,4,5,6,7]
        pshufhw   $232, %xmm3, %xmm3    # xmm3 = xmm3[0,1,2,3,4,6,6,7]
        pshufd    $232, %xmm3, %xmm3    # xmm3 = xmm3[0,2,2,3]
        movq      %xmm3, (%rsi)
        addq      $8, %rdx
        addq      $8, %rsi
        addq      $-4, %rdi
        jne       .LBB0_19
The adjustment above improved one of our benchmarks by 4%.
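Note that the loop's shift amount j is loop-invariant, which is exactly the
uniform-count case that psraw/psrad handle in one instruction. If a blanket
cost of 1 is too aggressive in general, a narrower adjustment might key the
cheap cost off the shift count being uniform. A sketch of that idea
(standalone stand-in types of my own, not a patch against X86TTIImpl):

// Sketch only: charge the cheap cost when the shift count is uniform,
// since the SSE2 shift instructions shift all lanes by one shared count.
enum OperandValueKind { OK_AnyValue, OK_UniformValue, OK_UniformConstantValue };

unsigned sse2ShiftCost(OperandValueKind CountKind, unsigned NumElts) {
  if (CountKind == OK_UniformValue || CountKind == OK_UniformConstantValue)
    return 1;          // one psraw/psrad/psrlw/... covers the whole vector
  return NumElts * 10; // non-uniform counts still scalarize on SSE2
}</pre>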
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are on the CC list for the bug.</li>
</ul>
</body>
</html>