<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - [AVX/AVX2] Inefficient vector shuffle lowering: 5 instructions instead of one movhps|d"

   href="http://llvm.org/bugs/show_bug.cgi?id=21943">21943</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[AVX/AVX2] Inefficient vector shuffle lowering: 5 instructions instead of one movhps|d

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>qcolombet@apple.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=13558" name="attach_13558" title="IR to reproduce the problem.">attachment 13558</a> <a href="attachment.cgi?id=13558&action=edit" title="IR to reproduce the problem.">[details]</a></span>

IR to reproduce the problem.

Tested with trunk r224470.

In the attached IR, we fail to recognize a movhps/movhpd pattern, i.e.,
res = input1[0,1], input2[0,1].
Instead, we generate a long sequence of vector shuffles to produce the desired
output.

Interestingly, this problem happens only when AVX and/or AVX2 are enabled. The
SSE lowering works just fine.
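
To make the notation concrete, here is a minimal Python sketch. It is not part
of the attached IR, and the helper names movq_load, movhpd_load and expected are
hypothetical. It models an XMM register as a list of four 32-bit lanes. It only
shows that the target result, input1[0,1] followed by input2[0,1], is exactly
what a movq load followed by a movhpd load produces. At lane granularity, movhps
behaves the same way, since both load 64 bits into the upper half.

```python
# Hypothetical lane-level model, not LLVM code: an XMM register is a list of
# four 32-bit lanes, and a 64-bit memory operand is a list of two lanes.

def movq_load(mem):
    # movq (mem), %xmm: lanes 0-1 come from memory, lanes 2-3 are zeroed.
    return [mem[0], mem[1], 0, 0]

def movhpd_load(mem, reg):
    # movhpd (mem), %xmm: lanes 0-1 are kept, lanes 2-3 come from memory.
    return [reg[0], reg[1], mem[0], mem[1]]

def expected(input1, input2):
    # The pattern from the report: res = input1[0,1], input2[0,1].
    return [input1[0], input1[1], input2[0], input2[1]]

input1 = ["a0", "a1"]  # lanes loaded from (%rdi) in _bar
input2 = ["b0", "b1"]  # lanes loaded from (%rsi) in _bar
res = movhpd_load(input2, movq_load(input1))
print(res)  # ['a0', 'a1', 'b0', 'b1']
assert res == expected(input1, input2)
```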

** To Reproduce **

llc -mtriple=x86_64-apple-macosx avx_movhps.ll -o - -mattr=+avx

(use -mattr=+avx2 instead to test AVX2)

** Result **

The output gives the lowering of three different functions that do exactly the
same thing, expressed with different IR. If run through opt, both the second
(baz) and the third (bar) functions are canonicalized to the IR of the second
one.

Right now, llc produces the expected vmovq + vmovhpd lowering only for the third
one (i.e., the non-canonicalized one).

_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rsi), %xmm0
    vmovq    (%rdi), %xmm1
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps    $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps    $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
    vinsertps    $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
    retq
    .cfi_endproc
    .globl    _baz
    .align    4, 0x90
_baz:                                   ## @baz
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rsi), %xmm0
    vmovq    (%rdi), %xmm1
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps    $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps    $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
    vinsertps    $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
    retq
    .cfi_endproc
    .globl    _bar
    .align    4, 0x90
_bar:                                   ## @bar
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rdi), %xmm0
    vmovhpd    (%rsi), %xmm0, %xmm0
    retq
    .cfi_endproc
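
The following Python sketch replays the AVX sequence emitted for _foo and _baz
on a hypothetical lane-level model. The model and the helper names are
illustrative assumptions, not LLVM code. The immediates are decoded as follows:
- vpermilps: each 2-bit field selects a source lane.
- vinsertps: bits 7:6 pick the source lane, bits 5:4 the destination lane, and
  bits 3:0 form a zero mask.

The sketch shows that the five shuffles compute the same lanes as the
vmovq + vmovhpd pair emitted for _bar.

```python
# Hypothetical lane-level model, not LLVM code: an XMM register is a list of
# four 32-bit lanes; immediates are the 8-bit values from the listing.

def vmovq_load(mem):
    # vmovq (mem), %xmm: lanes 0-1 from memory, lanes 2-3 zeroed.
    return [mem[0], mem[1], 0, 0]

def vpermilps(imm, src):
    # Result lane i = src[imm bits 2i+1:2i].
    imm &= 0xFF  # the listing prints the immediate as signed (-27 == 0xE5)
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]

def vinsertps(imm, ins, base):
    # vinsertps $imm, %ins, %base, %dst (operand order as in the listing):
    # copy base, overwrite lane imm[5:4] with ins[imm[7:6]],
    # then zero every lane whose bit is set in imm[3:0].
    imm &= 0xFF
    out = list(base)
    out[(imm >> 4) & 3] = ins[(imm >> 6) & 3]
    for i in range(4):
        if (imm >> i) & 1:
            out[i] = 0
    return out

def foo_avx(rdi, rsi):
    # The five-shuffle sequence emitted for _foo and _baz with AVX/AVX2.
    xmm0 = vmovq_load(rsi)
    xmm1 = vmovq_load(rdi)
    xmm2 = vpermilps(-27, xmm1)
    xmm1 = vinsertps(16, xmm2, xmm1)
    xmm1 = vinsertps(32, xmm0, xmm1)
    xmm0 = vpermilps(-27, xmm0)
    xmm0 = vinsertps(48, xmm0, xmm1)
    return xmm0

def bar_avx(rdi, rsi):
    # The two-instruction sequence emitted for _bar: vmovq + vmovhpd.
    xmm0 = vmovq_load(rdi)
    return [xmm0[0], xmm0[1], rsi[0], rsi[1]]

a = ["a0", "a1"]
b = ["b0", "b1"]
print(foo_avx(a, b))  # ['a0', 'a1', 'b0', 'b1']
assert foo_avx(a, b) == bar_avx(a, b)
```

Both sequences yield a0, a1, b0, b1. The AVX lowering of foo and baz is
therefore correct, just needlessly long.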

For the record, here is the assembly without -mattr:

_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc
    .globl    _baz
    .align    4, 0x90
_baz:                                   ## @baz
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc
    .globl    _bar
    .align    4, 0x90
_bar:                                   ## @bar
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc

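** Note **

Interestingly, the first two shuffle instructions of the current AVX lowering
leave xmm1 unchanged. vpermilps copies xmm1[1] into xmm2[0], and vinsertps then
writes xmm2[0] back into lane 1 of xmm1, which already holds that value:

    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]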
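
One way to check this claim is to replay just these two instructions on a
hypothetical lane-level Python model. The model, in which a register is a list of
four 32-bit lanes, and the helper names are illustrative assumptions, not LLVM
code. The immediates are decoded per the vpermilps and vinsertps encodings. The
check holds for arbitrary lane contents, not only for the zero-extended value
produced by vmovq.

```python
# Hypothetical lane-level model, not LLVM code: a register is a list of
# four 32-bit lanes.

def vpermilps(imm, src):
    # Result lane i = src[imm bits 2i+1:2i]; the listing prints 0xE5 as -27.
    imm &= 0xFF
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]

def vinsertps(imm, ins, base):
    # Copy base, put ins[imm bits 7:6] into lane imm bits 5:4,
    # then zero the lanes selected by imm bits 3:0.
    imm &= 0xFF
    out = list(base)
    out[(imm >> 4) & 3] = ins[(imm >> 6) & 3]
    for i in range(4):
        if (imm >> i) & 1:
            out[i] = 0
    return out

def first_two(xmm1):
    xmm2 = vpermilps(-27, xmm1)       # xmm2 = xmm1[1,1,2,3]
    return vinsertps(16, xmm2, xmm1)  # xmm1[0],xmm2[0],xmm1[2,3]

for xmm1 in (["a0", "a1", 0, 0], ["w", "x", "y", "z"]):
    assert first_two(xmm1) == xmm1
print("the two instructions leave xmm1 unchanged")
```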

In other words, we should not emit them even when we fail to match the
movhpd/movhps pattern.</pre>

        </div>

      </p>


    </body>

</html>