[LLVMbugs] [Bug 21943] New: [AVX/AVX2] Inefficient vector shuffle lowering: 5 instructions instead of one movhps|d
bugzilla-daemon at llvm.org
Wed Dec 17 15:32:18 PST 2014
http://llvm.org/bugs/show_bug.cgi?id=21943
Bug ID: 21943
Summary: [AVX/AVX2] Inefficient vector shuffle lowering: 5
instructions instead of one movhps|d
Product: libraries
Version: trunk
Hardware: PC
OS: All
Status: NEW
Severity: normal
Priority: P
Component: Backend: X86
Assignee: unassignedbugs at nondot.org
Reporter: qcolombet at apple.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
Created attachment 13558
--> http://llvm.org/bugs/attachment.cgi?id=13558&action=edit
IR to reproduce the problem.
Tested with trunk r224470.
In the attached IR we fail to recognize a movhp pattern, i.e., res =
input1[0,1], input2[0,1].
Instead we generate a long sequence of vector shuffles to produce the desired
output.
Interestingly, this problem happens only when AVX and/or AVX2 are enabled. SSE
lowering works just fine.
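As a rough illustration of the pattern res = input1[0,1], input2[0,1], the
following Python sketch models an xmm register as a list of four 32-bit lanes.
The helper names (movq_load, movhps_load, movhp_pattern) are hypothetical and
exist only for this illustration; they are not taken from the attached IR.

```python
def movq_load(mem):
    """Model of movq (mem), %xmm: load 64 bits into lanes 0-1, zero lanes 2-3."""
    return [mem[0], mem[1], 0.0, 0.0]


def movhps_load(mem, dst):
    """Model of movhps (mem), %xmm: keep lanes 0-1, load 64 bits into lanes 2-3."""
    return [dst[0], dst[1], mem[0], mem[1]]


def movhp_pattern(input1, input2):
    """res = input1[0,1], input2[0,1], using one low load and one high load."""
    return movhps_load(input2, movq_load(input1))
```

Under this model, movhp_pattern([a0, a1], [b0, b1]) yields [a0, a1, b0, b1],
which is the result the two-instruction sequence is expected to produce.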
** To Reproduce **
llc -mtriple=x86_64-apple-macosx avx_movhps.ll -o - -mattr=+avx[2]
** Result **
The output gives the lowering of 3 different functions that do the exact same
thing but with different IR. When run through opt, both the second (baz) and
third (bar) functions are canonicalized to the IR of the second function.
Right now, llc gives the expected lowering only for the third one (i.e., the
non-canonicalized one).
_foo: ## @foo
.cfi_startproc
## BB#0: ## %for.body
vmovq (%rsi), %xmm0
vmovq (%rdi), %xmm1
vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
retq
.cfi_endproc
.globl _baz
.align 4, 0x90
_baz: ## @baz
.cfi_startproc
## BB#0: ## %for.body
vmovq (%rsi), %xmm0
vmovq (%rdi), %xmm1
vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
vinsertps $32, %xmm0, %xmm1, %xmm1 ## xmm1 = xmm1[0,1],xmm0[0],xmm1[3]
vpermilps $-27, %xmm0, %xmm0 ## xmm0 = xmm0[1,1,2,3]
vinsertps $48, %xmm0, %xmm1, %xmm0 ## xmm0 = xmm1[0,1,2],xmm0[0]
retq
.cfi_endproc
.globl _bar
.align 4, 0x90
_bar: ## @bar
.cfi_startproc
## BB#0: ## %for.body
vmovq (%rdi), %xmm0
vmovhpd (%rsi), %xmm0, %xmm0
retq
.cfi_endproc
For the record, here is the assembly without -mattr:
_foo: ## @foo
.cfi_startproc
## BB#0: ## %for.body
movq (%rdi), %xmm0
movhpd (%rsi), %xmm0
retq
.cfi_endproc
.globl _baz
.align 4, 0x90
_baz: ## @baz
.cfi_startproc
## BB#0: ## %for.body
movq (%rdi), %xmm0
movhpd (%rsi), %xmm0
retq
.cfi_endproc
.globl _bar
.align 4, 0x90
_bar: ## @bar
.cfi_startproc
## BB#0: ## %for.body
movq (%rdi), %xmm0
movhpd (%rsi), %xmm0
retq
.cfi_endproc
** Note **
Interestingly, the first two shuffle instructions of the current lowering
amount to the identity:
vpermilps $-27, %xmm1, %xmm2 ## xmm2 = xmm1[1,1,2,3]
vinsertps $16, %xmm2, %xmm1, %xmm1 ## xmm1 = xmm1[0],xmm2[0],xmm1[2,3]
In other words, we shouldn't emit them even when we fail to match the
movhps|d pattern.
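The identity claim can be checked with a small Python model of the two shuffle
instructions. This is only a sketch: it assumes the usual immediate encodings
(vpermilps selects a source lane per 2-bit field; vinsertps uses bits 7:6 for
the source lane, bits 5:4 for the destination lane, and bits 3:0 as a zero
mask), and the function names are made up for this example.

```python
def vpermilps(imm, src):
    """128-bit vpermilps with an immediate: lane i takes src[(imm >> 2i) & 3]."""
    imm &= 0xFF  # $-27 is the byte 0xE5
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


def vinsertps(imm, elt_src, base):
    """vinsertps in AT&T operand order: immediate, element source, base."""
    imm &= 0xFF
    src_lane = (imm >> 6) & 3   # which lane of elt_src to insert
    dst_lane = (imm >> 4) & 3   # which lane of base to overwrite
    zero_mask = imm & 0xF       # lanes forced to zero afterwards
    out = list(base)
    out[dst_lane] = elt_src[src_lane]
    return [0.0 if (zero_mask >> i) & 1 else out[i] for i in range(4)]


def first_two_shuffles(xmm1):
    """vpermilps $-27 followed by vinsertps $16, as in the note."""
    xmm2 = vpermilps(-27, xmm1)       # xmm2 = xmm1[1,1,2,3]
    return vinsertps(16, xmm2, xmm1)  # xmm1[0],xmm2[0],xmm1[2,3]


def current_avx_lowering(mem_rdi, mem_rsi):
    """The full sequence currently emitted for foo and baz with AVX."""
    xmm0 = [mem_rsi[0], mem_rsi[1], 0.0, 0.0]  # vmovq (%rsi), %xmm0
    xmm1 = [mem_rdi[0], mem_rdi[1], 0.0, 0.0]  # vmovq (%rdi), %xmm1
    xmm1 = first_two_shuffles(xmm1)
    xmm1 = vinsertps(32, xmm0, xmm1)           # xmm1[0,1],xmm0[0],xmm1[3]
    xmm0 = vpermilps(-27, xmm0)                # xmm0[1,1,2,3]
    return vinsertps(48, xmm0, xmm1)           # xmm1[0,1,2],xmm0[0]
```

In this model, first_two_shuffles returns its input unchanged for any four
lanes, and current_avx_lowering([a0, a1], [b0, b1]) yields [a0, a1, b0, b1],
the same value the vmovq + vmovhpd pair emitted for bar computes.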