<html>
    <head>
      <base href="https://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - The vectorization costs of SHL/SRL/SRA need to be adjusted to vectorize the loop in the testcase"
   href="https://llvm.org/bugs/show_bug.cgi?id=23582">23582</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>The vectorization costs of SHL/SRL/SRA need to be adjusted to vectorize the loop in the testcase
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>wmi@google.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>aschwaighofer@apple.com, llvmbugs@cs.uiuc.edu, nrotem@apple.com
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=14344" name="attach_14344" title="testcase 1.cc">attachment 14344</a> <a href="attachment.cgi?id=14344&action=edit" title="testcase 1.cc">[details]</a></span>
testcase 1.cc

For the attached testcase 1.cc, the kernel loop is not vectorized because the
vectorization costs of SHL/SRL/SRA are set to high values in SSE2CostTable
in X86TTIImpl::getArithmeticInstrCost:

    { ISD::SHL,  MVT::v8i16,  8*10 }, // Scalarized.
    { ISD::SHL,  MVT::v4i32,  2*5 }, // We optimized this using mul.
    { ISD::SHL,  MVT::v2i64,  2*10 }, // Scalarized.

    { ISD::SRL,  MVT::v8i16,  8*10 }, // Scalarized.
    { ISD::SRL,  MVT::v4i32,  4*10 }, // Scalarized.
    { ISD::SRL,  MVT::v2i64,  2*10 }, // Scalarized.

    { ISD::SRA,  MVT::v8i16,  8*10 }, // Scalarized.
    { ISD::SRA,  MVT::v4i32,  4*10 }, // Scalarized.
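
Read as data, each entry prices the whole vector shift at roughly
"number of lanes * 10", as if every element were shifted by its own scalar
instruction. Only SHL v4i32 (2*5, lowered via a multiply per its comment) is
cheaper. Below is a hypothetical, simplified model of these rows that makes the
per-lane cost explicit. It is not LLVM's actual cost-table code, and the names
Op, VT, CostEntry, lookupCost and perLaneCost are made up for illustration.

```cpp
#include "cassert"

// Hypothetical, simplified model of the quoted SSE2CostTable shift rows.
// This is not LLVM's CostTblEntry/CostTableLookup code.
enum class Op { SHL, SRL, SRA };
enum class VT { v8i16, v4i32, v2i64 };

struct CostEntry {
  Op op;
  VT type;
  int cost;
};

// Same values as the table quoted above.
const CostEntry kSSE2ShiftCosts[] = {
    {Op::SHL, VT::v8i16, 8 * 10}, {Op::SHL, VT::v4i32, 2 * 5},
    {Op::SHL, VT::v2i64, 2 * 10},
    {Op::SRL, VT::v8i16, 8 * 10}, {Op::SRL, VT::v4i32, 4 * 10},
    {Op::SRL, VT::v2i64, 2 * 10},
    {Op::SRA, VT::v8i16, 8 * 10}, {Op::SRA, VT::v4i32, 4 * 10},
};

// Number of elements in each vector type.
int lanes(VT type) {
  switch (type) {
    case VT::v8i16: return 8;
    case VT::v4i32: return 4;
    case VT::v2i64: return 2;
  }
  return 0;
}

// Table cost for (op, type), or -1 when there is no entry
// (e.g. SRA v2i64: SSE2 has no psraq).
int lookupCost(Op op, VT type) {
  for (const CostEntry &entry : kSSE2ShiftCosts)
    if (entry.op == op && entry.type == type) return entry.cost;
  return -1;
}

// Cost charged per vector element (integer division).
int perLaneCost(Op op, VT type) {
  int cost = lookupCost(op, type);
  assert(cost >= 0 && "no table entry for this shift");
  return cost / lanes(type);
}
```

In this model perLaneCost(Op::SRA, VT::v4i32) is 40 / 4 = 10, the same
per-element charge as the v8i16 rows (80 / 8) and the v2i64 rows (20 / 2).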

But x86 supports psllw/pslld/psllq, psrlw/psrld/psrlq and psraw/psrad, so I
don't understand why these costs need to be so high. The costs seem to be
related to
<a href="http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170439.html">http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20130401/170439.html</a>,
but I could not find a testcase there.
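
One plausible explanation, which is an interpretation and is not taken from the
linked thread: the SSE2 shifts listed above move every lane by the same count.
That count comes either from an immediate or from the low 64 bits of an xmm
register. SSE2 has no per-lane variable shift (AVX2 later added
vpsllvd/vpsrlvd/vpsravd), so a shift whose count differs per element must be
scalarized, which matches the "Scalarized" comments. In 1.cc, j is
loop-invariant, so the shift is uniform and a single psrad covers all four
lanes. The SSE2-intrinsics sketch below contrasts the two cases. The helper
names sra_uniform, sra_per_lane and lane are hypothetical.

```cpp
#include "cassert"
#include "cstdint"
#include "emmintrin.h"  // SSE2 intrinsics

// Uniform shift: all four int32 lanes are shifted right arithmetically by the
// same runtime count. psrad reads the count from the low 64 bits of an xmm
// register; _mm_cvtsi32_si128 puts count in lane 0 and zeroes the rest.
__m128i sra_uniform(__m128i v, int count) {
  return _mm_sra_epi32(v, _mm_cvtsi32_si128(count));
}

// Non-uniform shift: each lane has its own count. SSE2 has no per-lane
// variable shift, so this sketch scalarizes: spill, shift each lane, reload.
__m128i sra_per_lane(__m128i v, __m128i counts) {
  std::int32_t vals[4];
  std::int32_t cnts[4];
  _mm_storeu_si128((__m128i *)vals, v);
  _mm_storeu_si128((__m128i *)cnts, counts);
  for (int idx = 0; idx < 4; ++idx) {
    assert(cnts[idx] >= 0 && cnts[idx] < 32);
    vals[idx] >>= cnts[idx];  // arithmetic for negatives (guaranteed in C++20)
  }
  return _mm_loadu_si128((const __m128i *)vals);
}

// Reads lane idx (0..3) of v, for inspecting results.
std::int32_t lane(__m128i v, int idx) {
  std::int32_t vals[4];
  _mm_storeu_si128((__m128i *)vals, v);
  return vals[idx];
}
```

On an SSE2-only target, sra_uniform needs one psrad plus the count setup. That
setup is the same xorps/movss pattern used to build %xmm3 in the vectorized
loop. sra_per_lane needs four scalar shifts and the spills around them.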

The kernel loop in 1.cc:
        for (; k < f; k++) {
          m = h[k].ival;
          m += h[k + 1].ival;
          i[k].ival -= (l + n * m) >> j;
        }

~/workarea/llvm-r237614/build/bin/clang++ -std=c++11 -O2 -S 1.cc
The assembly:
.LBB0_19:                               # %for.body
        movswl  -2(%rdi), %ebp
        movswl  (%rdi), %ebx
        addl    %ebp, %ebx
        imull   %edx, %ebx
        addl    %esi, %ebx
        sarl    %cl, %ebx
        movzwl  (%rax), %ebp
        subl    %ebx, %ebp
        movw    %bp, (%rax)
        addq    $2, %rdi
        addq    $2, %rax
        decl    %r12d
        jne     .LBB0_19
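
Because 1.cc is only available as an attachment, here is a minimal
self-contained reconstruction of the kernel for experimenting with the cost
tables. It assumes ival is a signed 16-bit field: the scalar code above loads
it with movswl, stores it with movw and advances 2 bytes per iteration. It also
assumes that m, l, n and j are int. Elem and kernel are hypothetical names, and
the real testcase may differ.

```cpp
#include "cassert"
#include "cstdint"

// Hypothetical element type: a struct holding one signed 16-bit value.
struct Elem {
  std::int16_t ival;
};

// Scalar kernel from 1.cc (reconstructed). h must hold at least f + 1
// elements; the subtraction is done in int and truncated back to 16 bits.
void kernel(Elem *i, const Elem *h, int k, int f, int l, int n, int j) {
  assert(j >= 0 && j < 32);  // keep the shift count in range for int
  for (; k < f; k++) {
    int m = h[k].ival;
    m += h[k + 1].ival;
    i[k].ival -= (l + n * m) >> j;
  }
}
```

Compiling this with the clang++ command above and adding
-Rpass=loop-vectorize -Rpass-missed=loop-vectorize makes clang report whether
the loop vectorizer transformed the loop.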

For 1.cc, if we adjust SSE2CostTable to be:
    { ISD::SHL,  MVT::v8i16,  1 }, // Scalarized.
    { ISD::SHL,  MVT::v4i32,  1 }, // We optimized this using mul.
    { ISD::SHL,  MVT::v2i64,  1 }, // Scalarized.

    { ISD::SRL,  MVT::v8i16,  1 }, // Scalarized.
    { ISD::SRL,  MVT::v4i32,  1 }, // Scalarized.
    { ISD::SRL,  MVT::v2i64,  1 }, // Scalarized.

    { ISD::SRA,  MVT::v8i16,  1 }, // Scalarized.
    { ISD::SRA,  MVT::v4i32,  1 }, // Scalarized.

then the kernel loop in 1.cc is vectorized well. (For LLVM after r235455, the
patch at <a href="http://reviews.llvm.org/D9865">http://reviews.llvm.org/D9865</a> is also needed to generate
good vectorized code.)

.LBB0_19:                               # %vector.body
        xorps   %xmm3, %xmm3
        movss   %xmm2, %xmm3            # xmm3 = xmm2[0],xmm3[1,2,3]
        movq    -2(%rdx), %xmm4         # xmm4 = mem[0],zero
        punpcklwd       %xmm4, %xmm4    # xmm4 = xmm4[0,0,1,1,2,2,3,3]
        psrad   $16, %xmm4
        movq    (%rdx), %xmm5           # xmm5 = mem[0],zero
        punpcklwd       %xmm5, %xmm5    # xmm5 = xmm5[0,0,1,1,2,2,3,3]
        psrad   $16, %xmm5
        paddd   %xmm4, %xmm5
        pshufd  $245, %xmm5, %xmm4      # xmm4 = xmm5[1,1,3,3]
        pmuludq %xmm0, %xmm5
        pshufd  $232, %xmm5, %xmm5      # xmm5 = xmm5[0,2,2,3]
        pshufd  $245, %xmm0, %xmm6      # xmm6 = xmm0[1,1,3,3]
        pmuludq %xmm4, %xmm6
        pshufd  $232, %xmm6, %xmm4      # xmm4 = xmm6[0,2,2,3]
        punpckldq       %xmm4, %xmm5    # xmm5 = xmm5[0],xmm4[0],xmm5[1],xmm4[1]
        paddd   %xmm1, %xmm5
        psrad   %xmm3, %xmm5
        movq    (%rsi), %xmm3           # xmm3 = mem[0],zero
        punpcklwd       %xmm7, %xmm3    # xmm3 = xmm3[0],xmm7[0],xmm3[1],xmm7[1],xmm3[2],xmm7[2],xmm3[3],xmm7[3]
        psubw   %xmm5, %xmm3
        pshuflw $232, %xmm3, %xmm3      # xmm3 = xmm3[0,2,2,3,4,5,6,7]
        pshufhw $232, %xmm3, %xmm3      # xmm3 = xmm3[0,1,2,3,4,6,6,7]
        pshufd  $232, %xmm3, %xmm3      # xmm3 = xmm3[0,2,2,3]
        movq    %xmm3, (%rsi)
        addq    $8, %rdx
        addq    $8, %rsi
        addq    $-4, %rdi
        jne     .LBB0_19
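
The vectorized loop can be read back as SSE2 intrinsics that mirror its
instruction sequence:
 - punpcklwd + psrad $16 sign-extend the 16-bit inputs.
 - pmuludq/pshufd form the 32-bit products, since SSE2 has no pmulld.
 - psrad with an xmm count performs the uniform >> j.
 - pshuflw/pshufhw/pshufd gather the low 16-bit halves before the 8-byte movq
   store.
This is a sketch, not the code clang emits. It uses flat int16_t arrays in
place of the ival structs and adds a scalar epilogue for leftover iterations.
Its function names are hypothetical.

```cpp
#include "cassert"
#include "cstdint"
#include "emmintrin.h"  // SSE2 intrinsics

// movq + punpcklwd + psrad $16: load four int16 values, sign-extend to int32.
static __m128i load4_sext16(const std::int16_t *p) {
  __m128i x = _mm_loadl_epi64((const __m128i *)p);
  x = _mm_unpacklo_epi16(x, x);  // each word now fills a dword's high half
  return _mm_srai_epi32(x, 16);
}

// Low 32 bits of a * b per lane using pmuludq, since SSE2 lacks pmulld.
static __m128i mullo_epi32_sse2(__m128i a, __m128i b) {
  __m128i even = _mm_mul_epu32(a, b);                      // lanes 0 and 2
  __m128i odd = _mm_mul_epu32(_mm_shuffle_epi32(a, 0xF5),  // [1,1,3,3]
                              _mm_shuffle_epi32(b, 0xF5)); // lanes 1 and 3
  even = _mm_shuffle_epi32(even, 0xE8);  // [0,2,2,3]: keep low dwords
  odd = _mm_shuffle_epi32(odd, 0xE8);
  return _mm_unpacklo_epi32(even, odd);  // interleave back to lanes 0..3
}

// out[k] -= (l + n * (h[k] + h[k + 1])) >> j for k in [0, count).
// h must hold count + 1 elements.
void kernel_sse2(std::int16_t *out, const std::int16_t *h, int count, int l,
                 int n, int j) {
  assert(j >= 0 && j < 32);
  const __m128i vn = _mm_set1_epi32(n);
  const __m128i vl = _mm_set1_epi32(l);
  const __m128i vj = _mm_cvtsi32_si128(j);  // uniform count for psrad
  const __m128i zero = _mm_setzero_si128();
  int k = 0;
  for (; k + 4 <= count; k += 4) {
    __m128i m = _mm_add_epi32(load4_sext16(h + k), load4_sext16(h + k + 1));
    __m128i t = _mm_add_epi32(mullo_epi32_sse2(m, vn), vl);
    t = _mm_sra_epi32(t, vj);  // one psrad shifts all four lanes by j
    // Zero-extend out[k..k+3]; only the low word of each dword matters below.
    __m128i o = _mm_unpacklo_epi16(
        _mm_loadl_epi64((const __m128i *)(out + k)), zero);
    o = _mm_sub_epi16(o, t);
    // Gather words 0, 2, 4, 6 into the low 64 bits and store them.
    o = _mm_shufflelo_epi16(o, 0xE8);
    o = _mm_shufflehi_epi16(o, 0xE8);
    o = _mm_shuffle_epi32(o, 0xE8);
    _mm_storel_epi64((__m128i *)(out + k), o);
  }
  for (; k < count; k++) {  // scalar epilogue for the remaining iterations
    int m = h[k] + h[k + 1];
    out[k] = (std::int16_t)(out[k] - ((l + n * m) >> j));
  }
}
```

Here is one example. Take h = {1, 2, ..., 7}, start every out[k] at 100, and
call with count = 6, l = 10, n = 2 and j = 1. The call leaves out as
{92, 90, 88, 86, 84, 82}. The first four values come from the vector body and
the last two from the epilogue.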

The adjustment above improves one of our benchmarks by 4%.</pre>
        </div>
      </p>
    </body>
</html>