[LLVMbugs] [Bug 21943] New: [AVX/AVX2] Inefficient vector shuffle lowering: 5 instructions instead of one movhps|d

bugzilla-daemon at llvm.org bugzilla-daemon at llvm.org
Wed Dec 17 15:32:18 PST 2014


http://llvm.org/bugs/show_bug.cgi?id=21943

            Bug ID: 21943
           Summary: [AVX/AVX2] Inefficient vector shuffle lowering: 5
                    instructions instead of one movhps|d
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: All
            Status: NEW
          Severity: normal
          Priority: P
         Component: Backend: X86
          Assignee: unassignedbugs at nondot.org
          Reporter: qcolombet at apple.com
                CC: llvmbugs at cs.uiuc.edu
    Classification: Unclassified

Created attachment 13558
  --> http://llvm.org/bugs/attachment.cgi?id=13558&action=edit
IR to reproduce the problem.

Tested with trunk r224470.

In the attached IR we fail to recognize a movhps/movhpd pattern, i.e., res =
input1[0,1], input2[0,1].
Instead, we generate a long sequence of vector shuffles to produce the desired
output.
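The pattern can be modeled in a few lines of Python. This is a minimal sketch with some assumptions. The attached IR is not reproduced here, so it assumes the result is written as a shufflevector of two <4 x float> inputs with mask <0, 1, 4, 5>. The helpers shufflevector, MOVHP_MASK and movq_movhpd are hypothetical. The model shows that the result is the low 64 bits (two floats) of input1 followed by the low 64 bits of input2. That is what a movq load followed by a movhpd load computes.

```python
# Hypothetical model of an LLVM shufflevector over two <4 x float> inputs.
# Assumption: the "movhp" pattern is the mask <0, 1, 4, 5>.
def shufflevector(v1, v2, mask):
    """Indices 0..3 select lanes of v1; indices 4..7 select lanes of v2."""
    both = list(v1) + list(v2)
    return [both[i] for i in mask]

MOVHP_MASK = [0, 1, 4, 5]

def movq_movhpd(lo_src, hi_src):
    # movq loads the low 64 bits (two floats) and zeroes the upper half;
    # movhpd then overwrites the upper 64 bits with two floats from hi_src.
    reg = list(lo_src[:2]) + [0.0, 0.0]
    reg[2:4] = hi_src[:2]
    return reg

a = [1.0, 2.0, 3.0, 4.0]  # input1
b = [5.0, 6.0, 7.0, 8.0]  # input2
print(shufflevector(a, b, MOVHP_MASK))  # [1.0, 2.0, 5.0, 6.0]
print(shufflevector(a, b, MOVHP_MASK) == movq_movhpd(a, b))  # True
```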

Interestingly, this problem happens only when AVX and/or AVX2 are enabled. SSE
lowering works just fine.


** To Reproduce **

llc -mtriple=x86_64-apple-macosx avx_movhps.ll -o - -mattr=+avx
llc -mtriple=x86_64-apple-macosx avx_movhps.ll -o - -mattr=+avx2


** Result **

The output contains the lowering of three functions that do exactly the same
thing, each with different IR. If run through opt, both the second (baz) and
third (bar) functions are canonicalized to the IR of the second function.
Currently, llc produces the expected code only for the third one (i.e., the
non-canonicalized one).

_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rsi), %xmm0
    vmovq    (%rdi), %xmm1
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps    $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps    $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
    vinsertps    $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
    retq
    .cfi_endproc

    .globl    _baz
    .align    4, 0x90
_baz:                                   ## @baz
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rsi), %xmm0
    vmovq    (%rdi), %xmm1
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    vinsertps    $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    vpermilps    $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
    vinsertps    $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
    retq
    .cfi_endproc

    .globl    _bar
    .align    4, 0x90
_bar:                                   ## @bar
    .cfi_startproc
## BB#0:                                ## %for.body
    vmovq    (%rdi), %xmm0
    vmovhpd    (%rsi), %xmm0, %xmm0
    retq
    .cfi_endproc
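This Python model checks that the five-instruction AVX sequence for _foo/_baz computes the same value as the vmovq + vmovhpd pair in _bar. It is a sketch under these assumptions:
- Each xmm register is a list of four float lanes.
- Each 64-bit memory operand is two floats.
- Only the register forms of vpermilps/vinsertps with the immediates used above are modeled. The immediate layout follows the Intel manual: vinsertps COUNT_S in bits 7:6, COUNT_D in bits 5:4, ZMASK in bits 3:0.

All helper names are hypothetical.

```python
def vpermilps(src, imm):
    imm &= 0xFF  # the assembler prints the byte as signed: $-27 == 0xE5
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]

def vinsertps(src1, src2, imm):
    imm &= 0xFF
    count_s = (imm >> 6) & 3  # source lane in src2 (register form)
    count_d = (imm >> 4) & 3  # destination lane
    zmask = imm & 0xF         # lanes zeroed afterwards
    dst = list(src1)
    dst[count_d] = src2[count_s]
    return [0.0 if (zmask >> i) & 1 else x for i, x in enumerate(dst)]

def vmovq_load(mem):
    # Loads 64 bits (two floats) and zeroes the upper half.
    return [mem[0], mem[1], 0.0, 0.0]

def foo_avx(rdi, rsi):
    # Mirrors the _foo/_baz listing; AT&T "op $imm, src2, src1, dst".
    xmm0 = vmovq_load(rsi)
    xmm1 = vmovq_load(rdi)
    xmm2 = vpermilps(xmm1, -27)       # xmm2 = xmm1[1,1,2,3]
    xmm1 = vinsertps(xmm1, xmm2, 16)  # xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
    xmm1 = vinsertps(xmm1, xmm0, 32)  # xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
    xmm0 = vpermilps(xmm0, -27)       # xmm0 = xmm0[1,1,2,3]
    return vinsertps(xmm1, xmm0, 48)  # xmm0 = xmm1[0,1,2],xmm0[0]

def bar_avx(rdi, rsi):
    xmm0 = vmovq_load(rdi)
    # vmovhpd (%rsi), %xmm0, %xmm0: keep the low 64 bits, load the high 64.
    return xmm0[:2] + [rsi[0], rsi[1]]

print(foo_avx([1.0, 2.0], [5.0, 6.0]))  # [1.0, 2.0, 5.0, 6.0]
print(bar_avx([1.0, 2.0], [5.0, 6.0]))  # [1.0, 2.0, 5.0, 6.0]
```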

For the record, here is the assembly without -mattr:
_foo:                                   ## @foo
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc

    .globl    _baz
    .align    4, 0x90
_baz:                                   ## @baz
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc

    .globl    _bar
    .align    4, 0x90
_bar:                                   ## @bar
    .cfi_startproc
## BB#0:                                ## %for.body
    movq    (%rdi), %xmm0
    movhpd    (%rsi), %xmm0
    retq
    .cfi_endproc

** Note **

Interestingly, the first two shuffle instructions of the current lowering (the
ones right after the two vmovq loads) compute the identity:
    vpermilps    $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
    vinsertps    $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
In other words, we should not emit them even if we fail to match the
movhpd/movhps pattern.
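The identity can also be checked symbolically, without concrete values, by composing lane masks. This is in the spirit of a shuffle combine, not a description of the actual X86 backend code. The sketch uses the shufflevector convention: lanes 0-3 come from the first source and lanes 4-7 from the second. It turns each immediate into a mask and folds the vinsertps mask over the vpermilps mask. The helper names are hypothetical, and vinsertps zeroing (zmask) is deliberately not modeled.

```python
def vpermilps_mask(imm):
    """Single-input lane mask selected by a vpermilps immediate."""
    imm &= 0xFF  # $-27 is the signed spelling of 0xE5
    return [(imm >> (2 * i)) & 3 for i in range(4)]

def vinsertps_mask(imm):
    """Two-input mask: 0..3 pick src1 lanes, 4..7 pick src2 lanes."""
    imm &= 0xFF
    count_s, count_d, zmask = (imm >> 6) & 3, (imm >> 4) & 3, imm & 0xF
    assert zmask == 0, "zeroing is not modeled in this sketch"
    mask = [0, 1, 2, 3]
    mask[count_d] = 4 + count_s
    return mask

def compose(outer, src1_mask, src2_mask):
    """Rewrite a two-input mask in terms of the one register that both
    inputs were derived from."""
    return [src1_mask[m] if m < 4 else src2_mask[m - 4] for m in outer]

identity = [0, 1, 2, 3]
perm = vpermilps_mask(-27)  # xmm2 = xmm1[1,1,2,3]
ins = vinsertps_mask(16)    # xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
print(perm, ins, compose(ins, identity, perm))
# [1, 1, 2, 3] [0, 4, 2, 3] [0, 1, 2, 3]
```

A composed mask of [0, 1, 2, 3] over xmm1 means the two instructions leave xmm1 unchanged and could be dropped.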
